Migration to TimescaleDB complete!

Yesterday I posted about why we decided to make the switch to TimescaleDB and then later in the day we actually made the switch!

We are now running a copy of InfluxDB and a copy of TimescaleDB at the same time — in case we find problems with the new TimescaleDB database, we can revert to the InfluxDB database.

In the process of migrating we got rid of a pile of nasty duplicates that used to be created by importing from last.fm. We also got rid of some bad data (timestamp 0 listens) that were pretty much useless and were cluttering the data. If you find that you are missing some data besides some duplicates, please open a ticket.

The move to TimescaleDB allows us to create new features such as deleting a listen (which should be released later this summer) and various other features, because the underlying DB is much more flexible than InfluxDB. However, right this second there are no real new features for end users. More new features are coming soon, we promise!

Thank you to shivam-kapila, iliekcomputers and ishaanshah for helping with this rather large, long-running project!

ListenBrainz moves to TimescaleDB

The ListenBrainz team has been working hard on moving our primary listen store from InfluxDB to TimescaleDB, and today at 16:00 UTC we’re going to make the switch.

We were asked on Twitter why we’re making the switch, and in the interest of giving a real-world use case for switching, I’m writing this post. The reasons are numerous:

Openness: InfluxDB seems on a path that will make it less open over time. TimescaleDB and its dependence on Postgres makes us feel much safer in this regard.

Existing use: We’ve been using Postgres for about 18 years now and it has been a reliable workhorse for us. Our team thinks in terms of Postgres and InfluxDB always felt like a round peg in a square hole for us.

Data structure: InfluxDB was clearly designed to store server event info. We’re storing listen information, which has a slightly different usage pattern, but this slight difference is enough for us to hit a brick wall with far fewer users in our DB than we ever anticipated. InfluxDB is simply not flexible enough for our needs.

Query syntax and measurement names: The syntax to query InfluxDB is weird and obfuscated. We made the mistake of trying to have a measurement map to a user, but escaping measurement names correctly nearly drove one of our team members to the loony bin.

Existing data: If you ever write bad data to a measurement in InfluxDB, there is no way to change it. I realize that this is a common Big Data usage pattern, but for us it posed significant challenges and seriously restricted our ability to put simple features for our users into place. With TimescaleDB we can make the very occasional UPDATE or DELETE and move on.

Scalability: Even though we attempted to read as much as possible in order to design a scalable schema, we still failed and got it wrong. (I don’t think that the docs to calculate scalability even existed when we first started using InfluxDB.) Unless you are using InfluxDB in exactly the way it was meant to be used, there is a chance you’ll hit this problem as well. For us, one day insert speed dropped to a ridiculously low number per second, backing up our systems. Digging into the problem, we realized that our schema design had a fatal flaw and that we would have to drastically change the schema to something even less intuitive in order to fix it. This was the straw that broke the camel’s back, and I started searching for alternatives.

In moving to TimescaleDB we were able to delete a ton of complicated code and embrace a DB that we know and love. We know how Postgres scales, we know how to put it into production and we know its caveats. TimescaleDB allows us to be flexible with the data, and the amazing queries that can be performed on the data are pure Postgres love. While TimescaleDB still requires some careful thinking beyond plain Postgres, it is far less than what is required when using InfluxDB. TimescaleDB also gives us a clear scaling path forward, even while TimescaleDB is still working on its own scaling roadmap. If TimescaleDB evolves anything like Postgres has, I can’t wait to see this evolution.
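
To make the "very occasional UPDATE or DELETE" point a bit more concrete, here is a minimal sketch of what removing a single listen could look like from the shell. The table name (listen), its columns (user_name, listened_at) and the function name delete_listen are assumptions made up for illustration, not the actual ListenBrainz schema or code.

```shell
#!/bin/sh
# Assumed schema: a "listen" table with "user_name" and "listened_at"
# (Unix timestamp) columns. These names are illustrative only.
delete_listen() {
    db=$1 user=$2 ts=$3
    # Only accept a plain integer timestamp, since it is interpolated unquoted.
    case $ts in
        ''|*[!0-9]*) echo "timestamp must be an integer" >&2; return 1 ;;
    esac
    # psql substitutes :variables in SQL read from stdin or a file, not in -c,
    # so the statement is passed as a here-document.
    psql -d "$db" -v user_name="$user" -v listened_at="$ts" <<'SQL'
DELETE FROM listen
 WHERE user_name = :'user_name'
   AND listened_at = :listened_at;
SQL
}
```

With a database called, say, listenbrainz (again a placeholder), this would be invoked as `delete_listen listenbrainz alice 1593446400`. Because a TimescaleDB hypertable is queried like any other Postgres table, nothing beyond an ordinary DELETE is needed.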

Big big thanks to the Postgres and TimescaleDB teams!

MusicBrainz Server update, 2020-06-29

The React conversion is back with this release. We’ve also fixed a regression that listed unrelated “recording of” relationship edits in the history of artists, recordings, and releases, and that made a lot of people quite frustrated during the last two weeks (sorry about that!). New edits won’t be wrongly listed anymore, and the existing wrongly-listed edits will be progressively unlisted over the following days. Finally, two new data reports have been created, and small display improvements have been made.

A new release of MusicBrainz Docker is also available that matches this update of MusicBrainz Server. See the release notes for update instructions.

Thanks to loujin for contributing code. Thanks to DjSlash, hibiscuskazeneko, insolite, jesus2099, and mavit for having reported bugs and suggested improvements. Thanks to Atsushi Nakamura, kellnerd and salorock for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-06-29.

Bug

  • [MBS-10825] – Normalization of some Muziekweb links make them invalid
  • [MBS-10908] – Unrelated “recording of” relationship edits for works show up in history of artists, recordings, and releases

New Feature

  • [MBS-10770] – Report on relations with dates in the future
  • [MBS-10895] – Report for mediums with conflicting discID

Improvement

  • [MBS-10736] – Add autoselect + sidebar for Napster URLs
  • [MBS-10802] – Right-align columns of Reorder Relationship table
  • [MBS-10894] – Show label code after selecting label in the release editor
  • [MBS-10903] – Update Operabase cleanup and validation
  • [MBS-10907] – Define background color for header and footer too

React Conversion Task

  • [MBS-10777] – Convert Add/Remove Relationship edits to React
  • [MBS-10801] – Convert Reorder Relationships edit to React

MusicBrainz Server update, 2020-06-15

Today’s release focuses on stability, with almost half of the changes being bugfixes. It also provides a fair number of small improvements and new features. On a side note, the third party affiliated Magic Tagger is now known as AudioRanger.

A new release of MusicBrainz Docker is also available that matches this update of MusicBrainz Server and fixes a few issues in both the standard setup and the development setup. See the release notes for update instructions.

Thanks to Cyna, Freso, and loujin for contributing code. Thanks to chaban, chiark, jesus2099, kellnerd, nikki, otringal, navap, wmorg, and yeeeargh for having reported bugs and suggested improvements. Thanks to kellnerd and salorock for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-06-15.

Bug

  • [MBS-6532] – Work edits not in release edit history
  • [MBS-10186] – “Set track lengths” edit is stuck
  • [MBS-10381] – Duplicate work entity lyrics languages during edit
  • [MBS-10821] – Edit changing medium tracklist and format is stuck
  • [MBS-10836] – My Collections edit link doesn’t show edit page
  • [MBS-10856] – Relationship credit with whitespace causes error
  • [MBS-10867] – uncaught exception: entity not in the entities array
  • [MBS-10873] – Artist not always shown for annotation edits
  • [MBS-10882] – ISE when filtering RGs
  • [MBS-10885] – BadAmazonURLs: TypeError: Cannot read property ‘href_url’ of null
  • [MBS-10887] – CSS misalignment of country/date text with other columns
  • [MBS-10890] – Work language displayed multiple times in edit listings
  • [MBS-10892] – Collaborators can’t add releases to private collection

New Feature

  • [MBS-1736] – Block setting format on too early releases
  • [MBS-10862] – Report for releases with catalog numbers that look like label codes

Improvement

  • [MBS-4644] – Indicate which releases have CAA art in listings
  • [MBS-9340] – Don’t allow more languages if [No lyrics] is selected
  • [MBS-9931] – Fail gracefully when trying to remove a relationship which is in use as an example
  • [MBS-10469] – Show releases more likely to have a processed cover art on front page
  • [MBS-10893] – Add help text to select the appropriate artist from CD lookup
  • [MBS-10897] – Block always more smart links

React Conversion Task

  • [MBS-10393] – Convert Add Standalone Recording edit to React

Other Task

  • [MBS-6864] – Remove PUID edits
  • [MBS-10771] – Block tagging for unverified users
  • [MBS-10891] – Replace Magic Tagger link with AudioRanger

MusicBrainz Server update, 2020-06-02

Now that PostgreSQL has been upgraded to version 12 (see earlier instructions), regular improvements, bugfixes, and React/JSX template refactoring are back on the menu. The most noticeable improvement is probably that we are now able to display more specific error messages in the URL relationship editor when a link is not allowed, instead of always giving a generic “not good”-style message. There are even new features, for admins only, that will allow them to spot and delete sock-puppets and to temporarily disable edit notes from editors who continue to be disrespectful to others without having to delete their accounts outright.

In news not directly connected to the MusicBrainz website, the public search endpoint has been moved from the old search server to the Solr-based search server. This can be used by slave servers if they do not need, or can not afford, to host their own search indexes.

And while talking about slave servers, a new release of MusicBrainz Docker is also available. It follows this update of MusicBrainz Server and fixes a regression that affects servers with live indexing. See the release notes for update instructions.

Thanks to Cyna for converting more edits’ display to React, to KamranMackey and navap for updating external links’ icons. Thanks to alastairp, BestSteve, chaban, chirlu, cyberskull, Freso, hibiscuskazeneko, jesus2099, and Kid Devine for having reported bugs and suggested improvements. Thanks to eduardomariohs, kellnerd, mfmeulenbelt, salorock, and stich94 for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-06-02.

Bug

  • [MBS-9010] – Alerts about new edit notes are separate for production and beta
  • [MBS-10613] – Version info footer is shown on some pages in production
  • [MBS-10732] – Internal Server Error: Adding a collaborator to a collection without selecting from autocomplete dropdown
  • [MBS-10839] – “Add selected recordings for merging” missing in standalone-only overview
  • [MBS-10849] – Add release group preview on RE shows “This entity has been removed”
  • [MBS-10850] – Bad Amazon URLs report doesn’t ignore “streaming page” relationship
  • [MBS-10855] – Historic track edit involving since-removed entity doesn’t show recording
  • [MBS-10863] – “Edit barcode” edit doesn’t show barcode

New Feature

  • [MBS-10834] – New account flag for disabling ability to write edit notes
  • [MBS-10845] – Tool to allow account admins to look up accounts by e-mail

Improvement

  • [MBS #1516] – Update Facebook, Google Play, and Spotify icons
  • [MBS-7822] – Update OverClocked ReMix favicon
  • [MBS-8412] – Align number of edits per page
  • [MBS-9516] – Display specific error messages depending on URL validation rules
  • [MBS-9963] – Update Genius link format and logo
  • [MBS-10412] – Update URL cleanup for Niconico URLs, specifically channel links for artists
  • [MBS-10727] – Deny kasi-time.com URLs for lyrics
  • [MBS-10789] – Add validation for Genius links
  • [MBS-10813] – Update the Bandcamp logo used in the sidebar
  • [MBS-10831] – Allow niconico channel links for other entities than artist
  • [MBS-10840] – Capitalise “in key” info correctly in English guess case
  • [MBS-10841] – Add “Guess case” per-medium
  • [MBS-10842] – Remove report user link from deleted editor profiles
  • [MBS-10853] – Link to overview page for edits by subscribed editors in subscription email

React Conversion Task

  • [MBS-10397] – Convert Edit Event edit to React
  • [MBS-10399] – Convert Edit Recording edit to React
  • [MBS-10793] – Convert historic Move Release edit to React
  • [MBS-10799] – Convert historic Move Release to RG edit to React
  • [MBS-10817] – Convert Edit Label edit to React

Other Task

  • [MBS-7781] – Merge duplicate artist credits
  • [MBS-10785] – Remove link to FreeDB Gateway documentation
  • [MBS-10822] – Change tableColumns tables to use named parameters
  • [MBS-10860] – Merge the production and beta Redis stores
  • [MBS-10878] – Convey search queries from slave servers to Solr

PostgreSQL 12 Upgrade Instructions for MusicBrainz Server

Thanks to everyone for your patience during our downtime today. As promised, here are steps to follow to upgrade your own PG instance to v12. (Confused? See the previous blog post on this subject.)

If you’re already running v12, there are still some instructions you must follow!

For MusicBrainz Docker

If you’re running the new MusicBrainz Docker setup, an upgrade script exists for you to use. See the release notes for specific – hopefully brief – instructions.

For a Manual Setup (INSTALL.md Based)

If you aren’t using Docker but rather set up musicbrainz-server by hand following INSTALL.md, see the steps below.

Know that as an alternative, you can always import new data dumps from scratch (again following the steps in INSTALL.md) into a new PG 12 cluster. Just make sure you’re on the v-2020-05-18-postgres12 tag of musicbrainz-server while doing so.

If on the other hand you don’t mind getting your hands a bit dirty, you can use the quicker method below. Like INSTALL.md, this assumes you’re using Ubuntu/Debian and their postgresql-common cluster management tools.

If you’re already running v12, you should still follow these steps; however, you can skip the ones involving apt-get, pg_dropcluster, and pg_upgradecluster. The main steps you need to follow in this case are running the 20200518-pg12-before-upgrade.sql and 20200518-pg12-after-upgrade.sql scripts in that order.

On distros other than Debian/Ubuntu where the postgresql-common tools aren’t available, you’ll have to manage with initdb and pg_upgrade on your own.
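
As a rough sketch of what that could look like, the snippet below only prints a pg_upgrade command line that would stand in for step 6 below. The four paths are placeholders in the style of the PGDG RPM layout and are assumptions; substitute whatever your distro uses. It also assumes the new data directory has already been created with the v12 initdb (using the same encoding and locale as the old cluster) and that both servers are stopped.

```shell
#!/bin/sh
# Placeholder paths: binary and data directories differ between distros.
OLD_BIN="${OLD_BIN:-/usr/pgsql-9.5/bin}"
NEW_BIN="${NEW_BIN:-/usr/pgsql-12/bin}"
OLD_DATA="${OLD_DATA:-/var/lib/pgsql/9.5/data}"
NEW_DATA="${NEW_DATA:-/var/lib/pgsql/12/data}"

# Print (rather than run) the pg_upgrade command line. Passing --check asks
# pg_upgrade for a compatibility check only, without changing any data.
pg_upgrade_cmd() {
    printf '%s/pg_upgrade -b %s -B %s -d %s -D %s%s\n' \
        "$NEW_BIN" "$OLD_BIN" "$NEW_BIN" "$OLD_DATA" "$NEW_DATA" "${1:+ $1}"
}

pg_upgrade_cmd --check
```

Running the printed command with --check first is a cheap way to find out about incompatibilities before committing to the real upgrade. The “pre-upgrade” and “post-upgrade” SQL scripts from the steps below still need to be run around it.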

  1. First take down the web server running MusicBrainz (stop plackup) to prevent database access.
  2. Turn off any cron jobs updating or accessing the database (e.g. for the live data feed/replication packets).
  3. Switch to the latest musicbrainz-server code with:
    git fetch origin && \
    git checkout v-2020-05-18-postgres12
  4. With PG 9.5 (or whatever version you’re using) still running, run the following “pre-upgrade” script:
    psql -U postgres -d musicbrainz_db \
    -f admin/sql/updates/20200518-pg12-before-upgrade.sql

    This assumes that “postgres” is the name of your PG superuser, and “musicbrainz_db” is the name of your database. If you see a few messages about things not existing, that’s normal.

  5. Install packages for PostgreSQL 12. On Ubuntu/Debian you can obtain them from the PGDG apt repo.
    apt-get update && \
    apt-get install postgresql-12 postgresql-server-dev-12

    If you’re installing postgresql-12 for the first time, this will automatically create a new cluster at /var/lib/postgresql/12/main. Remove that empty cluster. Don’t run this if you already had v12 installed and have data there!

    pg_dropcluster --stop 12 main

    If you did already have v12 installed with musicbrainz_db running there, leave the cluster alone and skip the next step involving pg_upgradecluster.

    In the unlikely event that you already have a v12 cluster, but also have musicbrainz_db running in a separate, older cluster, these instructions won’t work for you. We recommend importing fresh data dumps into the v12 cluster and dropping the old one.

  6. Upgrade the old cluster. This assumes it’s version 9.5; if you’re using version 10 or 11, make sure to replace 9.5 below with 10 or 11. If you have other databases in your old cluster besides musicbrainz_db, be aware that this will upgrade all of them to PG 12.
     pg_upgradecluster -v 12 9.5 main
  7. If all goes well, the new cluster should be up and running. (You can drop the old one if you like; the output of the pg_upgradecluster command will tell you how.) Now run the following “post-upgrade” script on the database:
    psql -U postgres -d musicbrainz_db -f \
    admin/sql/updates/20200518-pg12-after-upgrade.sql
    This may take a bit, as it has to recreate some indexes.
  8. The upgrade is complete. You can turn cron jobs back on, if applicable.
  9. Restart the MusicBrainz web server / plackup, if applicable. If you’re accessing the server in a web browser, the usual release upgrade steps apply, like running ./script/compile_resources.sh again.
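
For those who prefer to see the database part in one place, steps 3, 4, 6 and 7 above could be strung together roughly as follows. This is only a sketch: the variable names (OLD_VERSION, PG_USER, DB_NAME, DRY_RUN) and the run helper are made up for this example, the package installation and pg_dropcluster from step 5 are deliberately left out because they depend on your situation, and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: condenses steps 3, 4, 6 and 7 above into one function.
# OLD_VERSION, PG_USER, DB_NAME and DRY_RUN are illustrative variable names.
OLD_VERSION="${OLD_VERSION:-9.5}"     # use 10 or 11 if that is your old cluster
PG_USER="${PG_USER:-postgres}"        # your PG superuser
DB_NAME="${DB_NAME:-musicbrainz_db}"  # your database name
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands, anything else = run them

# Either print the command prefixed with "+ " or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stops at the first command that fails.
upgrade_steps() {
    run git fetch origin &&
    run git checkout v-2020-05-18-postgres12 &&
    run psql -U "$PG_USER" -d "$DB_NAME" \
        -f admin/sql/updates/20200518-pg12-before-upgrade.sql &&
    run pg_upgradecluster -v 12 "$OLD_VERSION" main &&
    run psql -U "$PG_USER" -d "$DB_NAME" \
        -f admin/sql/updates/20200518-pg12-after-upgrade.sql
}

upgrade_steps
```

Keeping the dry run as the default means you can review the exact commands (and adjust OLD_VERSION) before setting DRY_RUN=0. Stopping the web server and cron jobs beforehand, and restarting them afterwards, is still up to you.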

If you run into any trouble following the above, please let us know and we’ll try to help resolve your issue as soon as possible!

MusicBrainz Server update, 2020-05-11

This is the last update before upgrading to Postgres 12. It is mainly focused on React conversion but also carries ten small bugfixes and improvements.

Thanks to navap for hacking the user interface. Thanks to admiy, chaban, fabe56, Freso, jesus2099, and zas for having reported bugs and suggested improvements. Thanks to kellnerd, mfmeulenbelt, and salorock for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-05-11.

Bug

Improvement

  • [MBS-10737] – Allow thesession.org URLs for Places
  • [MBS-10761] – Disallow YouTube links at wrong levels
  • [MBS-10804] – Remove redundant user header from edit page
  • [MBS-10805] – Support for Amazon.AE/NL/SG/TR ASINs

React Conversion Task

  • [MBS-9910] – Convert wikidocs transclusion admin templates to React/JSX
  • [MBS-10748] – Convert the relationship doc page to React
  • [MBS-10760] – Convert Remove Track edit to React
  • [MBS-10762] – Convert historic Remove Release/Releases edits to React
  • [MBS-10764] – Convert historic Remove Label Alias edits to React
  • [MBS-10765] – Convert historic Add, Move and Remove DiscID edits to React
  • [MBS-10773] – Convert historic Change Release Quality edit to React
  • [MBS-10775] – Convert historic Add/Remove Relationship edits to React
  • [MBS-10790] – Convert historic MAC/SAC edits to React
  • [MBS-10791] – Convert historic Change RG edit to React
  • [MBS-10792] – Convert historic Change Artist Quality edit to React
  • [MBS-10811] – Convert historic Edit Relationship edit to React

Reminder: Upgrading to PostgreSQL 12 on May 18, 2020

As we announced in February, in two weeks’ time (May 18, 2020) we’ll be upgrading our production database server to PostgreSQL v12 (from v9.5). At the same time, v12 will become the minimum supported version for MusicBrainz Server, so we ask that you upgrade afterwards as soon as possible! If you’re still unsure, a Q&A is below.

When do I need to upgrade my postgres by?

As soon as possible after May 18 if you’d like to keep your musicbrainz-server code up to date.

How do I perform the upgrade?

We’ll provide instructions closer to May 18. It’s recommended that you don’t upgrade until then, since we’ll be providing scripts to resolve some issues.

Will the live data feed (replication packets) stop working right away if I don’t upgrade?

No, as long as you keep your musicbrainz-server code checkout on the v-2020-05-11 tag (which will be the final release before May 18) or earlier. Future releases may work for a while too.

This is not a schema change release, so replication will continue to work smoothly until you upgrade. No tables or views will change.

However, to make the upgrade process smoother we’ll be dropping the musicbrainz-collate and musicbrainz-unaccent extensions, instead using PG’s builtin collation support for the former and replacing the latter with the unaccent extension from postgresql-contrib. A few SQL functions are being added to enable this, and some indexes need to be rebuilt. This will all happen as part of upgrade scripts we provide (or you can import from scratch). Some features of musicbrainz-server that use these old extensions may cease to work if you don’t apply them.

The extension changes above don’t actually make use of any new PG 12 features. We’ll avoid using such features for at least 1 month.

If I’m already running PostgreSQL 12, do I need to do anything?

Yes, but things will be easier for you. As mentioned in the previous answer, we’ll be dropping the musicbrainz-collate and musicbrainz-unaccent extensions to make the upgrade process smoother for pre-v12 instances. So you’ll only have to run some upgrade scripts we provide to replace those extensions and rebuild some indexes.

My host/distribution doesn’t have PostgreSQL 12 yet!

If you’re running Debian or Ubuntu, the PGDG maintains an APT repository with the latest versions. These are the same packages MetaBrainz uses in production.

Amazon RDS supports PostgreSQL 12 since March 31.

I absolutely cannot upgrade yet! What should I do?

You can stay on the v-2020-05-11 release of musicbrainz-server or earlier for now. Replication packets (i.e. the live data feed) will continue to work on that tag until the next schema change, but you’ll have upgraded to v12 by then, right?

Instead of performing a pg_upgrade and running these upgrade scripts you mentioned, can I just import fresh data dumps into a new v12 cluster?

Of course. Just make sure your musicbrainz-server git checkout is on the v-2020-05-18 tag (once that’s released) or later before performing the import. And keep in mind it may be slower than a direct upgrade.

MusicBrainz Server update, 2020-04-27

A large variety of issue types have been addressed in today’s release!

As a new feature, search indexes are now dumped and made available along with database dumps under the FTP directory search-indexes. They are mainly intended to be loaded on a MusicBrainz slave server to start a mirror with search.

Among the improvements, a noticeable one lightens the area overview page, which was heavily crowded with all sorts of relationships; these have now been moved to more specific tabs.

As for bugfixes, the major one secures user/admin forms against CSRF attacks.

Thanks to atj for contributing code to support Traxsource URLs. Thanks to alex_s7, chaban, danbloo, Lotheric, murdos, and Skeebadoo for reporting issues. Thanks to kellnerd, Jormangeud, mfmeulenbelt, and salorock for updating the translations in German, Finnish, Dutch, and Italian, respectively. And thanks to all others who tested the beta version!

The git tag is v-2020-04-27.

Bug

  • [MBS-10359] – Guess feat. artists from track titles do not give expected result
  • [MBS-10677] – Place type shown as null on WS event place rels
  • [MBS-10717] – Cookie attributes must be adjusted to work with new behavior in browsers
  • [MBS-10719] – “remove PUID” edit doesn’t load
  • [MBS-10742] – “Show more” country miscount
  • [MBS-10756] – Inconsistent default sort order for recordings on Works page
  • [MBS-10778] – User/admin forms are prone to CSRF attacks

New Feature

  • [MBS-10546] – Dump MB Solr data along with MB DB full export

Improvement

  • [MBS-1921] – Display edit link under annotations
  • [MBS-9086] – Move most relationships away from area overview
  • [MBS-10666] – Collapse work artists when there are too many on merge pages
  • [MBS-10741] – Make “relationship [attribute] in use” pages consistent
  • [MBS-10755] – Add entity type restrictions for musik-sammler.de URLs
  • [MBS-10781] – Add support for Traxsource URLs

React Conversion Task

  • [MBS-10740] – Convert /relationship static pages to React
  • [MBS-10751] – Convert Remove PUID edits to React

MusicBrainz Server update, 2020-04-13

No Easter egg in today’s update but rather a dozen or so small bugfixes and convenient improvements.

Thanks to Rotab for the pair of bugfixes he submitted, to CatQuest, chaban, chirlu, FSpy, HibiscusKazeneko, JesseW, KRSCuan, MichelV, and wcw1966 for the issues they reported, to salorock for the Italian translation he updated, and to all others who tested the beta version!

The git tag is v-2020-04-13.

Bug

  • [MBS-7465] – Tag cloud isn’t updated
  • [MBS-9169] – Inconsistent locale identifiers
  • [MBS-9728] – Recently-added Unicode emojis can’t be used in titles
  • [MBS-9894] – Timeline shows future date
  • [MBS-10360] – Whitelist User-Agent header in CORS Preflight requests
  • [MBS-10640] – Incorrect donation status in “Donation Check” tab
  • [MBS-10688] – Attaching a CDTOC that already exists on the medium gives a cryptic error
  • [MBS-10718] – Duplicate series “part of” relationships which got grouped are harder to detect
  • [MBS-10730] – Recording is displayed twice in artist overview when credited multiple times

Task

  • [MBS-10735] – Remove (discontinued) CD Baby links from the sidebar

Improvement

  • [MBS-5641] – Show release language/script in reports ReleasesWithUnlikelyLanguageScript, NoLanguage and NoScript
  • [MBS-10679] – Link to the JSON WS from the Details tab
  • [MBS-10680] – Link to WS docs from the Details tab
  • [MBS-10724] – Make sorting options (area, date, artist) consistent
  • [MBS-10747] – Change wording/phrasing of status description for open edits
  • [MBS-10753] – Use artist sort names for artist collection ordering