Steps to fix missing genre and tag data on MusicBrainz mirror servers

If you recently updated your mirror server to the 2022-05-16 schema change release, we’re sorry to say that a bug in our upgrade script caused aggregate genre and tag data (if you had imported any) to be deleted. If you need this data, it can be re-imported from a recent dump, and we’ve written a script to help automate that.

You can safely ignore this post if

To restore the genre and tag data, follow these steps:

  1. Ensure you’ve replicated up to the most recent replication packet available. If you’re not sure, run ./admin/replication/LoadReplicationChanges. If you’re up-to-date, it should log “Replication packet … is not available.”
  2. Run git checkout production && git pull origin production.
  3. Turn off any cron jobs that update the database, including for replication.
  4. Run ./admin/sql/updates/20220720-mbs-12508.sh.
  5. Restart any cron jobs that you disabled.

You can verify that this process worked by checking the number of tags in the database: echo 'SELECT count(*) FROM tag' | ./admin/psql READWRITE. It should be over 200,000.

Sorry for the inconvenience, and let us know if you encounter any further issues.

MusicBrainz schema change release, 2022-05-16 (with upgrade instructions)

We’re happy to announce the release of our May 2022 schema change today! Thanks to all who were patient during today’s downtime as we released everything to our production servers, and thanks to ikerm2003, mfmeulenbelt rinmon and salo.rock for updating the translations.

This is once again a fairly minor release as far as schema changes go, but please do report any issues that you come across, especially related to the propagation of ratings and tags.

New, user-facing changes with this release include withdrawn-only release groups showing in the official overview again (MBS-12208) and the final disappearance of Amazon cover art (MBS-12200). To this regard, the report of releases that have Amazon cover art but no Cover Art Archive front cover will stay available for editors to check their subscribed entities.

Additionally, several small changes will allow, in the next couple of releases, to store more information about genres (including URL relationships) and to recognize and special-case mood tags (MBS-12190). Another new feature that will start to be used in the API and artist credit pages is the addition of MusicBrainz IDs for artist credits, which allow referring to them with a unique and more persistent ID (MBS-11456). Finally, a few more under-the-hood only changes are made, which should ensure better performance for finding artists, events, etc. for all areas contained in a given one, and less bugs when adding and changing tags and ratings.

The area containment changes make use of a new materialized table. Like the ones we added last year, this table isn’t dumped nor replicated, since it is derived entirely from primary table data. Rather, it will be created during migration (or, in a new install, by running the admin/BuildMaterializedTables script) and triggers will be added to keep it up-to-date once it has been built. These triggers are created on replicated servers, too.

The accompanying new version of the search index rebuilder brings performance improvements for both the main server and mirrors, and simplifies maintenance. See the release notes for details.

A new release of MusicBrainz Docker is also available that matches this update of MusicBrainz Server. See the release notes for update instructions.

Now, on to the instructions.

Schema Change Upgrade Instructions

Note: Importing the latest data dump is always a valid alternative to running ./upgrade.sh on an existing database, if you’d prefer to also get new data in one go. Just follow the relevant instructions in INSTALL.md. The git tag is v-2022-05-16.1-schema-change. The rest of the instructions here assume an in-place upgrade.

  1. Make sure DB_SCHEMA_SEQUENCE is set to 26 in lib/DBDefs.pm.
  2. If you’re using the live data feed (your REPLICATION_TYPE is set to RT_SLAVE), ensure you’ve replicated up to the most recent replication packet available with the old schema. If you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you; if you’re ready to upgrade, it should say “This replication packet matches schema sequence #27, but the database is currently at #26.”
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you’re automatically updating the database via cron jobs.
  5. If you’re using the live search indexing, stop it and, assuming sir is under the same directory as musicbrainz-server, run cd ../sir && python2.7 -m sir triggers && cd - && ./admin/psql < ../sir/sql/DropTriggers.sql && ./admin/psql < ../sir/sql/DropFunctions.sql
  6. Switch to the new code with git fetch origin followed by git checkout v-2022-05-16.1-schema-change.
  7. Run cpanm --installdeps --notest . (note the dot at the end) to ensure your perl-based dependencies are up to date.
  8. Run ./upgrade.sh (it may take a while to vacuum at the end).
  9. Set DB_SCHEMA_SEQUENCE to 27 in lib/DBDefs.pm as instructed by the output of ./upgrade.sh.
  10. If your REPLICATION_TYPE is set to RT_SLAVE, change it to RT_MIRROR. (The previous terminology will work for the time being, but is now deprecated.)
  11. If you’re using the live search indexing, assuming again that sir is under the same directory as musicbrainz-server, run cd ../sir && git fetch origin && git checkout v3.0.1 && python2.7 -m sir triggers && cd - && ./admin/psql < ../sir/sql/CreateFunctions.sql && ./admin/psql < ../sir/sql/CreateTriggers.sql and rebuild indexes which takes hours (by running cd ../sir && python2.7 -m sir reindex && cd -) then start it in watch mode (with cd ../sir && git fetch origin && git checkout v3.0.1 && python2.7 -m sir amqp_watch)
  12. Turn cron jobs back on, if applicable.
  13. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart Redis. If you’re accessing your MusicBrainz server in a web browser, run ./script/compile_resources.sh.

Here’s the list of resolved tickets:

Fixed Bug

  • [MBS-5359] – *_tag tables are corrupt and need to be regenerated
  • [MBS-11760] – Removing the last use of a tag does not always remove the tag
  • [MBS-12369] – Standalone databases may be missing foreign keys for the documentation schema

New Feature

  • [MBS-12190] – Add Mood support in the database

Improvement

  • [MBS-11456] – Add MBIDs to artist credits in the database with merge
  • [MBS-12141] – Block tag names that are empty or have uncontrolled whitespace with database constraints
  • [MBS-12224] – Keep tags’ ref_count and aggregate vote counts updated with triggers
  • [MBS-12249] – Add a materialized area_containment table kept up-to-date with triggers
  • [MBS-12256] – Keep rating and rating_count column on *_meta tables up-to-date with triggers
  • [MBS-12313] – Clarify item naming in the Search drop down menu

Database Schema Change Task

  • [MBS-11457] – Drop the series ordering_attribute column
  • [MBS-11755] – Remove unused tags
  • [MBS-12157] – Remove support for Amazon cover art
  • [MBS-12200] – Drop schema objects related to Amazon cover art support
  • [MBS-12225] – Rename “slave” to “mirror” (inclusive language update)
  • [MBS-12250] – Create dbmirror2 schema on production and mirror servers
  • [MBS-12252] – Add edit_genre table
  • [MBS-12253] – Add relationship tables for genres
  • [MBS-12254] – Add genre_annotation table
  • [MBS-12255] – Add genre_alias_type table and make genre_alias consistent

Schema change release: May 16, 2022

Today we’re announcing a MusicBrainz database schema change release planned for May 16, 2022. The majority of these changes follow the theme of improving data integrity and consistency, performance, or just cleaning up old cruft. Others relate to new features for genres and artist credits. We’re also introducing a new entity based on tags, like genres that came before it: Mood. See below for more details, including information on how these changes affect the schema or existing data. We expect people will encounter zero breaking changes, but it doesn’t hurt to double check, especially if you have a specific or non-standard use of the database!

Here’s our list of tickets for the Spring 2022 schema change:

Schema changes

  • MBS-12256: Keep rating and rating_count column on *_meta tables up-to-date with triggers. An internal-only change to help us keep aggregate rating information (i.e. an average rating and count of ratings for each entity) up-to-date more easily, and help keep these values accurate. This change affects master/standalone databases, but should have no impact on mirror servers, where such triggers are not created. It’s possible that some existing aggregate ratings data was out of sync and will be updated with this change.
  • MBS-12224: Keep tags’ ref_count and aggregate vote counts updated with triggers. Like the change above for ratings, this is primarily internal-only and intended to help us keep tag counts in sync, though an adjacent goal is to make features like MBS-960 easier to implement. This also revives the tag.ref_count column, which hasn’t been updated for years, in order to provide a faster way of sorting tags/genres by usage count. Like above, this change should have no impact on mirror servers schema-wise, but will fix some existing corrupt tag counts.
  • MBS-12249: Add a materialized area_containment table kept up-to-date with triggers. Pages that make use of area containments, e.g. the list of artists from an area, which are expected to account for sub-areas, are currently quite slow and we’d like to improve upon this. The slowness is related to the recursive queries we use to get contained sub-areas – these queries are uncached and calculated on-the-fly. This ticket addresses these performance issues by caching area containment information in a new aptly-named area_containment table. Consistent with the tag and rating tickets above, this table will also be kept up-to-date with triggers. This change should have no impact on mirror servers except to make certain area requests faster; it does not affect existing data.
  • MBS-12250: Create dbmirror2 schema on production and mirror servers. The dbmirror extension we use to generate our replication packets a.k.a Live Data Feed is a 20 year old tool. It has issues and limitations that are difficult to fix, and we aim to replace it with something more maintainable. We wrote dbmirror2 to do that, but still have the task of getting it deployed to mirrors seamlessly. This will happen invisibly without any changes needed on mirrors! The action for this ticket is to simply create the schema for dbmirror2; it’s not actually used for replication yet. We’ll first have a testing phase and make sure external projects like mbdata work with the new replication packet format.
  • MBS-12200: Drop schema objects related to Amazon cover art support. For a long while, releases with Amazon URLs would be checked for cover art on Amazon, and if found, a link to the image would be cached for display. Unfortunately Amazon’s API to do this changed, and we haven’t synced artwork from them in years. We still have many old images cached and we still display those, but they aren’t guaranteed to be in sync. Last year we decided to drop support for displaying these, while giving time for users to upload any correct images to the Cover Art Archive. To help with this, we have a report of releases with Amazon cover art but no Cover Art Archive front cover.

    The schema change here involves dropping the release_coverart table (which was private and non-replicated) and the release_meta.amazon_store column (which was completely empty and unused). This change should have no impact on mirror servers, unless you were using this table or column for your own purposes, because they should be otherwise empty.
  • MBS-12141: Block tag names that are empty or have uncontrolled whitespace with database constraints. Recently we discovered a bug where empty or blank tag names could be submitted in the web service. This has since been fixed, but we’d also like to prevent such empty tags by adding a database constraint. This change has no effect on mirror servers schema-wise, where such constraints are not created. A few blank tags have already been deleted from the production database, but otherwise existing data is not affected. If any such tags exist in your standalone database, they’ll be deleted.
  • MBS-12252: Add edit_genre table. A requirement to start storing edit history for genres (right now changes leave no trail). This will add the empty table to mirror servers as well, but will not affect any existing data.
  • MBS-12253: Add relationship tables for genres. A requirement to be able to relate genres to other entities (such as URLs for the equivalent Wikipedia or Rate Your Music pages). This will add the empty tables (and accompanying example tables in the documentation namespace) to mirror servers as well, but will not affect any existing data.
  • MBS-12254: Add genre_annotation table. A requirement to make it possible to eventually add annotations to genres. This will add the empty table to mirror servers as well, but will not affect any existing data.
  • MBS-12255: Add genre_alias_type table and make genre_alias consistent. Originally the (as yet unused) genre_alias table was designed as a heavily simplified version of the alias tables for other entities. In retrospect, this was not a good decision, since it would make it harder to just use the generic implementation of our alias code for genres. As such, we’re adding a genre_alias_type table (originally genre aliases had no types) and replacing the genre_alias table with one having the extra columns matching other entities’ alias tables. These changes will also happen to mirror servers, but since the genre_alias table was completely unused it should not cause any issues. In case any standalone (not mirrored) servers were using genre_alias, we will ensure any existing data is transferred to the new version of it.
  • MBS-12241: Drop the whitespace_collapsed database constraint. We’ve had a constraint for years that tries to ensure that columns like entity names do not contain multiple consecutive spacing characters (disallowing names such as “This    Title”). In retrospect, this was overreaching, since there are several cases in which a specific number of spaces in a title can be shown to be artist intent. Additionally, we recently discovered that we had some very old data in the database that actually violated this constraint (causing issues when importing data to a standalone server). The data seemed to actually be correct (i.e. some of the aforementioned edge cases) so rather than amending it, we’re removing the constraint. This will have no effect on mirrors since they don’t run constraints.
  • MBS-12225: Rename “slave” to “mirror” (inclusive language update). We recently got a request from a long-time supporting organization to pick a different term for what we’ve historically called “slave server”. Since we were already sometimes using “mirror server” to mean the exact same thing, we are just changing the official name and will use “mirror server” in the future. RT_SLAVE will still work in DBDefs.pm, at least for now, but we’d suggest changing to the new (and equivalent) RT_MIRROR in your mirror servers’ DBDefs.pm. We’re likely to eventually drop support for RT_SLAVE in a future schema change, so we’ll remind you about changing it in future upgrade instructions.
  • MBS-12190: Add Mood support. Music mood was originally meant to be automatically calculated by the now-discontinued AcousticBrainz project. That’s obviously no longer in the cards, but this data would still be quite useful for ListenBrainz. As such, we’re planning to add basic support for mood tags in MusicBrainz in the same way we currently do genres. This can then be leveraged by ListenBrainz to collect the information directly from users playing music through their BrainzPlayer, and it can also of course be entered directly from MusicBrainz in the same way other tags (including genre tags) can. This will add new mood tables to mirrors and will detect some previously generic tags as mood tags, but it won’t cause any changes to the underlying data.
  • MBS-11760: Expand the database triggers which remove empty tags to all entities. When the last use of a tag (up- or downvote) is removed, we use triggers to completely remove the tag from the database. Some of the relevant triggers (for events, places, recordings and releases) were never created though, so if the last use was on an entity of one of those types the empty tag would not get purged. This doesn’t affect mirrors directly since the triggers don’t run on mirrors (but they should no longer get unused tags coming through replication).
  • MBS-11457: Drop series ordering_attribute. This column was added back when we were expecting to have different types of ordering attributes for series, but we have never used it. We planned to remove it last year already, but that required equivalent changes on the search server that couldn’t be made at the time. We are planning to just drop the column.
  • MBS-11456: Add MBIDs and redirect tables for artist credits. Adds a gid column to the artist_credit table, and a new artist_credit_gid_redirect table. It generates MBIDs for existing artist credits that will be replicated to mirrors.

    Artist credit MBIDs will be mainly exposed through the web service at first. The MBIDs will allow public identification of artist credits outside of MusicBrainz, and open the possibility of some more features in the future.
  • MBS-12208: Show withdrawn release groups in the official artist overview. The Withdrawn release status was added recently. Release groups containing only releases with this status were meant to be shown in the main (official) artist overview, but the way this is implemented means a small edit to the get_artist_release_group_rows function was needed. Mirrors using the materialized tables will need to update the data; more info will be provided as part of the upgrade instructions.

We’ll post upgrade instructions for standalone/mirror servers on the day of the release. If you have any questions, feel free to comment below, or on the linked JIRA tickets if relevant there!

NB: This post was updated on 21 March to include a ticket (MBS-12208) that we forgot to list originally

MusicBrainz schema change release, 2021-05-17 (with upgrade instructions)

We’re happy to announce the release of our May 2021 schema change today! Thanks to all who were patient during today’s downtime as we released everything to our production servers.

This is a fairly minor release as far as schema changes go, but please do report any issues that you come across, especially related to the display of recordings, releases and release groups on artist and release group pages.

New, user-facing changes with this release are limited to the new ability to merge collections (MBS-10208) and the addition of ratings for places (MBS-11451). Additionally, MBS-11463 adds a new view that is used to fix a couple small requests related to disc IDs (MBS-11268) and release length calculation (MBS-11349). Two other changes – adding a first-release-date field to recordings (MBS-1424) and support for PKCE in OAuth (MBS-11097) are more or less end-user affecting but were already released on the main MusicBrainz servers a while ago. All other changes are under the hood only.

We ran into a few complications while working on this schema change update, so we decided to postpone two changes to our October schema change to ensure only stuff we are more confident on is released. Those are MBS-11457, which involves dropping the ordering_attribute column for series and would have had no direct effect on user experience, and MBS-11456, which would have added MBIDs for artist credits.

A few of the released new features and improvements — namely the first-release-date field for recordings, and the performance improvements to artist pages — make use of new materialized tables. These tables aren’t dumped, nor are they replicated, since they’re derived entirely from primary table data. Rather, we’ve added a new script to build them (admin/BuildMaterializedTables, included in the upgrade instructions below), and triggers to keep them up-to-date once they’re built. These triggers are created on replicated servers, too. If you use the web interface or web service at all, just note the extra step of running BuildMaterializedTables after upgrade.sh below!

A new release of MusicBrainz Docker is also available that solves an issue for live indexing and matches this update of MusicBrainz Server. See the release notes for update instructions.

Now, on to the instructions.

Schema Change Upgrade Instructions

Note: Importing the latest data dump is always a valid alternative to running ./upgrade.sh on an existing database, if you’d prefer to also get new data in one go. Just follow the relevant instructions in INSTALL.md. The git tag is v-2021-05-19-hotfixes. The rest of the instructions here assume an in-place upgrade.

  1. Make sure DB_SCHEMA_SEQUENCE is set to 25 in lib/DBDefs.pm.
  2. If you’re using the live data feed (your REPLICATION_TYPE is set to RT_SLAVE), ensure you’ve replicated up to the most recent replication packet available with the old schema. If you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you; if you’re ready to upgrade, it should say “This replication packet matches schema sequence #26, but the database is currently at #25.”
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you’re automatically updating the database via cron jobs.
  5. If you’re using the live search indexing, stop it and, assuming sir is under the same directory as musicbrainz-server, run cd ../sir && python2.7 -m sir triggers && cd - && ./admin/psql < ../sir/sql/DropTriggers.sql && ./admin/psql < ../sir/sql/DropFunctions.sql
  6. Switch to the new code with git fetch origin followed by git checkout v-2021-05-19-hotfixes.
  7. Install newer dependencies Perl 5.30 or later and NodeJS 16 according to install prerequisites.
  8. Run cpanm --installdeps --notest . (note the dot at the end) to ensure your perl-based dependencies are up to date.
  9. Run ./upgrade.sh (it may take a while to vacuum at the end).
  10. Set DB_SCHEMA_SEQUENCE to 26 in lib/DBDefs.pm as instructed by the output of ./upgrade.sh.
  11. If you’re using the web interface or web service, run ./admin/BuildMaterializedTables --database=MAINTENANCE all to build new materialized tables. These will take several additional gigabytes of spaces and be kept up-to-date automatically via triggers. For more information, see INSTALL.md.
  12. If you’re using the live search indexing, assuming sir is under the same directory as musicbrainz-server, run cd ../sir && git fetch origin && git checkout v2.1.0 && python2.7 -m sir triggers && cd - && ./admin/psql < ../sir/sql/CreateFunctions.sql && ./admin/psql < ../sir/sql/CreateTriggers.sql and rebuild indexes (by running cd ../sir && python2.7 -m sir reindex && cd -) then start it in watch mode (with cd ../sir && git fetch origin && git checkout v2.1.0 && python2.7 -m sir amqp_watch)
  13. Turn cron jobs back on, if applicable.
  14. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart Redis. If you’re accessing your MusicBrainz server in a web browser, run ./script/compile_resources.sh.

Here’s the list of resolved tickets:

New Feature

  • [MBS-10208] – Allow merging collections
  • [MBS-11451] – Support ratings for places
  • [MBS-11463] – Add view to easily access medium track lengths
  • [MBS-11652] – Add support for artist series (hotfixed)

Improvement

  • [MBS-10962] – Speed up listing artist’s releases
  • [MBS-11268] – Show “Set track durations” on release/discids page
  • [MBS-11460] – Add materialized tables to fetch release groups by artist or track artist

Database Schema Change Task

  • [MBS-10647] – Add [no label] to b_del_label_special trigger for labels
  • [MBS-11453] – Change entity0_cardinality, entity1_cardinality to SMALLINT
  • [MBS-11459] – Create the edit_data_type_info function on mirrors
  • [MBS-11464] – Drop table statistics.log_statistic
  • [MBS-11466] – Change language.frequency and script.frequency to SMALLINT

Previously Released Changes

  • [MBS-1424] – Add a ‘First release date’ field to recordings
  • [MBS-10821] – Edit changing medium tracklist and format is stuck
  • [MBS-11097] – Support PKCE (Proof Key for Code Exchange) by OAuth clients
  • [MBS-11431] – Speed up /ws/js/check_duplicates

Schema change release: May 17, 2021

We’re having a schema change release on May 17, mostly to make small changes that will make our queries more efficient, ensure better constraints, and make some hardcoded options editable without schema changes in the future. We are also upping the required versions of both Perl and Node.js to 5.30 and 16.0 respectively (see the “Minimum version requirements” section below.)

Here’s our list of tickets for the Spring 2021 schema change, with descriptions of what’s being changed:

Schema changes

  • MBS-1424: Add a “first release date” field to recordings. A very popular request for years, this allows requesting the date of the first ever release a recording appeared on. This adds materialized tables recording_first_release_date and release_first_release_date which are updated via triggers whenever the earliest date changes. The change was released as an optional extension to the main MusicBrainz server schema on Dec 16, 2020, but it will be added to the main schema during this schema change.
  • MBS-10208: Allow merging collections. Users who decide two of their collections should be joined into a larger one should be able to do so without having to move all the entities in the collection manually. This requires adding a editor_collection_gid_redirect table (equivalent to other existing x_gid_redirect tables) to ensure the old collection links redirect to the one they have been merged into.
  • MBS-10566: Convert allowed_series_entity_type and allowed_collection_entity_type to tables to allow for additions without schema changes. The constraints allowed_series_entity_type and allowed_collection_entity_type specify which types of entity can be used in series and collections, respectively. As such, if we want to add the possibility to create series of artists, we need to modify the constraint during a schema change. For ease of use, we are moving the constraints to be their own tables instead, allowing us to update them as needed in the future outside of a schema change release.
  • MBS-10647: Add [no label] to b_del_label_special trigger for labels. The b_del_label_special trigger ensures that any attempt to remove a special purpose label fails. Currently it only checks the special case “Deleted label”, but since “[no label]” is also a special purpose label that should never be deleted, we will add its ID to the trigger check.
  • MBS-10821: Remove orphaned recordings from collections for deletion. Replaces a single function, delete_orphaned_recordings(), to add a new clause that makes it so that recordings referenced only in collections (but not linked to anything else in the database) can be deleted as orphans. This was released on the main MusicBrainz servers on June 15, 2020, but it will be added to the main schema during this schema change.
  • MBS-10962 / MBS-11460Add materialized tables and indexes to fetch releases and release groups by artist or track artist. These tickets will address performance issues on our current artist pages. They do not modify any existing tables, but as mentioned, add some new tables (to be updated via triggers) and indexes.
  • MBS-11097: Support PKCE (Proof Key for Code Exchange) by OAuth clients. Adds two new columns to the editor_oauth_token table. This feature is opt-in, but allows public OAuth2 clients to mitigate auth code interception attacks. The change was released as an optional extension to the main MusicBrainz server schema on Sep 21, 2020, but it will be added to the main schema during this schema change.
  • MBS-11431: Speed up /ws/js/check_duplicates. Adds new indexes only (on the artist, label, place, and series tables, plus their respective alias tables). Improves some slow queries in the editing interface related to duplicate checking, i.e. finding other entities with the same name. Since this is a non-breaking change, it was released on the main MusicBrainz servers on Mar 15, 2021, but it will be added to the main schema during this schema change.
  • MBS-11451: Support ratings for places. Places can be reviewed in CritiqueBrainz yet cannot be rated in MusicBrainz. This is a strange state of affairs: clearly, if they are worth reviewing in depth, they also deserve the option to rate them. As such, we will add place_rating_raw and place_meta tables, in the same way we have for other ratable entities.
  • MBS-11453: Change entity0_cardinality, entity1_cardinality to SMALLINT. The cardinality columns of the link_type table are used to indicate whether the entity on each side of the relationship is expected to have only a few uses of the relationship type in question, or many (too many to comfortably display/edit). This is what stops every single recording of a work showing up in an edit work page, for example. At the moment we allow only two values for cardinality, 0/1. While it is possible that we will want to allow a few other values in between what is effectively “do not show anywhere ever” and “show all the time”, it is clear we will not need more than 32.000 values. As such, we are moving these columns from INTEGER to SMALLINT to reduce table size.
  • MBS-11456: Add MBIDs and redirect tables for artist credits. Adds a gid column to the artist_credit table, and a new artist_credit_gid_redirect table. The MBIDs will allow public identification of artist credits outside of MusicBrainz, and open the door to useful features in the future.
  • MBS-11457: Drop series ordering_attribute. This column was added back when we were expecting to have different types of ordering attributes for series, but we have never used it. We are planning to just drop the column.
  • MBS-11459: Add a script to create edit_data_type_info. edit_data_type_info is a function used for development that we added in 2020 and we will now add to mirrors as well, both for consistency and for anyone who uses their mirror for development and just wants to use it.
  • MBS-11463: Add view to easily access medium track lengths. Our current way of finding the length of a medium is to either load all its tracks, then sum the durations, or to use the duration of its disc ID when present. The first of these requires a lot more processing than just getting the durations straight from the database, while the second ignores any data tracks not on the CD TOC. This adds a medium_track_durations view that allows easy access to the duration of the track lengths for any medium.
  • MBS-11464: Drop statistics.log_statistic. We added the statistics.log_statistic table for a Google Summer of Code project back during a 2012 schema change, but the work never got finished and implemented. We are not planning to implement it anymore, so this table is useless and we will be dropping it.
  • MBS-11466: Change language.frequency and script.frequency to SMALLINT. The frequency column of a language or script indicates how often it’s used and lets us sort the most frequently used entries at the top of our lists. But we don’t store an exact count, just a number from 0-4 indicating “frequently used,” “hidden,” or “other.” Like MBS-11453 above, we don’t expect to need a full INTEGER type to store these columns at any point in the future, so can safely move them to SMALLINT.

Minimum version requirements

We’re raising the minimum required version for Node.js to v16 (which is the next LTS release coming in April 2021). Our current required version, v10, is hitting the end of its life cycle in April, and given there shouldn’t be a particular difficulty installing the current Node.js LTS on any system, it makes sense to just upgrade to the most recent LTS by the time of the schema change day.

We’re also raising the minimum required version for Perl to 5.30. The latest version is 5.32, but the current Ubuntu LTS (20.04), which is likely to be the next base image for our Docker containers, only provides 5.30. Our current required version, 5.18, was released all the way back in 2013, so moving to 5.30 is already a fairly significant improvement.

We’ll post final details about this release just prior to the release and shortly after we complete it, including instructions on how you can update your own copy. If you have any questions, please do leave a comment below or on the linked JIRA tickets!

Reminder: Upgrading to PostgreSQL 12 on May 18, 2020

As we announced in February, in two weeks time (May 18, 2020) we’ll be upgrading our production database server to PostgreSQL v12 (from v9.5). At the same time, v12 will become the minimum supported version for MusicBrainz Server, so we ask that you upgrade afterwards as soon as possible! If you’re still unsure, a Q&A is below.

When do I need to upgrade my postgres by?

As soon as possible after May 18 if you’d like to keep your musicbrainz-server code up to date.

How do I perform the upgrade?

We’ll provide instructions closer to May 18. It’s recommended that you don’t upgrade until then, since we’ll be providing scripts to resolve some issues.

Will the live data feed (replication packets) stop working right away if I don’t upgrade?

No, as long as you keep your musicbrainz-server code checkout on the v-2020-05-11 tag (which will be the final release before May 18) or earlier. Future releases may work for a while too.

This is not a schema change release, so replication will continue to work smoothly until you upgrade. No tables or views will change.

However, to make the upgrade process smoother we’ll be dropping the musicbrainz-collate and musicbrainz-unaccent extensions, instead using PG’s builtin collation support for the former and replacing the latter with the unaccent extension from postgresql-contrib. A few SQL functions are being added to enable this, and some indexes need to be rebuilt. This will all happen as part of upgrade scripts we provide (or you can import from scratch). Some features of musicbrainz-server that use these old extensions may cease to work if you don’t apply them.

The extension changes above don’t actually make use of any new PG 12 features. We’ll avoid using such features for at least 1 month.

If I’m already running PostgreSQL 12, do I need to do anything?

Yes, but things will be easier for you. As mentioned in the previous answer, we’ll be dropping the musicbrainz-collate and musicbrainz-unaccent extensions to make the upgrade process smoother for pre-v12 instances. So you’ll only have to run some upgrade scripts we provide to replace those extensions and rebuild some indexes.

My host/distribution doesn’t have PostgreSQL 12 yet!

If you’re running Debian or Ubuntu, the PGDG maintains an APT repository with the latest versions. These are the same packages MetaBrainz uses in production.

Amazon RDS supports PostgreSQL 12 since March 31.

I absolutely cannot upgrade yet! What should I do?

You can stay on the v-2020-05-11 release of musicbrainz-server or earlier until then. Replication packets (i.e. the live data feed) will continue to work until the next schema change on that tag, but you’ll have upgraded to v12 by then, right?

Instead of performing a pg_upgrade and running these upgrade scripts you mentioned, can I just import fresh data dumps into a new v12 cluster?

Of course. Just make sure your musicbrainz-server git checkout is on the v-2020-05-18 tag (once that’s released) or later before performing the import. And keep in mind it may be slower than a direct upgrade.

Upgrading Postgres instead of schema change: 18 May, 2020

Hello!

We’ve long procrastinated upgrading our production Postgres installation and we’ve decided to forego a schema change upgrade and instead upgrade Postgres to version 12.x. (We will migrate to whatever the latest stable version in the 12.x series will be).

This means that on 18 May we will not make any changes to the MusicBrainz schema, but  we will have some amount of down-time and/or read-only time while we upgrade Postgres on our production servers. We haven’t sorted out all of the exact details of how we will carry out this database upgrade, but the date is now confirmed.

If you operate a replicated instance of the MusicBrainz database we STRONGLY urge you to upgrade your installation shortly after we upgrade the production servers. After this release our team may start using Postgres features not available in Postgres 9.5.x, which is our current production version.

As usual for our releases that impact our downstream users, we will post many more details closer to the date and once the migration is complete, we will post detailed instructions on how you can upgrade your own installation.

Please post any questions you may have!

Thanks!

MusicBrainz schema change release, 2019-05-13 (with upgrade instructions)

We’re happy to announce the release of our May 2019 schema change today! Thanks to all who were patient during today’s downtime as we released everything to our production servers.

This is a fairly minor release as far as schema changes go, but please do report any issues that you come across, especially any related to genres and collections.

Visible changes with this release are limited to an indication if a specific artist credit is being edited (MBS-5387). Work on some of the changes to collections and genres is quite advanced, and we’re hoping to release some of the new features onto beta already in a week or so from now, while others might take a while longer.

Now, on to the instructions.

Schema Change Upgrade Instructions

Note: Importing the latest data dump is always a valid alternative to running ./upgrade.sh on an existing database, if you’d prefer to also get new data in one go. Just follow the relevant instructions in INSTALL.md. The git tag is v-2019-05-13-schema-change. The rest of the instructions here assume an in-place upgrade.

  1. Make sure DB_SCHEMA_SEQUENCE is set to 24 in lib/DBDefs.pm.
  2. If you’re using the live data feed (your REPLICATION_TYPE is set to RT_SLAVE), ensure you’ve replicated up to the most recent replication packet available with the old schema. If you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you; if you’re ready to upgrade, it should say “This replication packet matches schema sequence #25, but the database is currently at #24.”
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you’re automatically updating the database via cron jobs.
  5. Switch to the new code with git fetch origin followed by git checkout v-2019-05-13-schema-change.
  6. Install newer dependencies Yarn and NodeJS 8 or later according to install prerequisites.
  7. Run cpanm --installdeps --notest . (note the dot at the end) to ensure your perl-based dependencies are up to date.
  8. Run ./upgrade.sh (it may take a while to vacuum at the end).
  9. Set DB_SCHEMA_SEQUENCE to 25 in lib/DBDefs.pm as instructed by the output of ./upgrade.sh.
  10. Turn cron jobs back on, if applicable.
  11. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart redis. If you’re accessing your MusicBrainz server in a web browser, run ./script/compile_resources.sh.

Here’s the list of resolved tickets:

Bug

  • [MBS-5387] – ACs being edited aren’t marked as having pending edits on the aliases tab
  • [MBS-9365] – event_meta_fk_id was never created as part of any upgrade script
  • [MBS-9462] – Standalone databases created before schema 21 are missing some l_event_url triggers
  • [MBS-10146] – Regression: ISE on Remove DiscID page
  • [MBS-10149] – Swap track titles with artist credits fails to update both fields properly
  • [MBS-10150] – Regression: The link to the release group reviews in the release page is broken

Improvement

  • [MBS-9664] – Add database constraints to disallow loop relationship
  • [MBS-10044] – Add place area to place lists

Database Schema Change Task

  • [MBS-10052] – Add new schema for the event art archive
  • [MBS-10173] – Create a genre table in the DB and populate it with existing genres
  • [MBS-10174] – Create an addition timestamp in the DB for new collection items
  • [MBS-10175] – Create a position integer in the DB for collection items
  • [MBS-10176] – Create a comment text field in the DB for collection items
  • [MBS-10177] – Create an editor_collection_collaborator table for collaborative collections
  • [MBS-10178] – Create a genre_alias table
  • [MBS-10181] – Create filesize for cover art and each thumb in the DB

React Conversion Task

  • [MBS-9925] – Convert collection pages to React
  • [MBS-10179] – Convert all entity list components to React

MusicBrainz Schema change upgrade downtime: 17:00 UTC (10am PST, 1pm EST, 19:00CEST)

Hi!

At 17:00 UTC (10am PST, 1pm EST, 19:00CEST) we will start the process of our schema change release. The exact time that we plan to start the change will depend on how long it takes to finish our preparations, but we expect it to be shortly after 17:00UTC.

Once we start the process we will put a banner notification on musicbrainz.org and we will also post updates to the @MusicBrainz twitter account, so follow us there for more details.

After the release is complete, we will post instructions here on how to upgrade your replicated MusicBrainz instances.

No Spring 2018 schema change

We recently decided not to have a spring 2018 schema change release. As usual, we still have some bits left over to finish up from the last spring schema change. More importantly, we’re making a concerted effort to improve the user experience (UX) of the MusicBrainz site — more on that in a blog post later.

We may decide to do an autumn 2018 schema change, but this depends on how well our UX efforts progress over the course of winter and spring.