Today we’re announcing a MusicBrainz database schema change release planned for May 16, 2022. The majority of these changes follow the theme of improving data integrity and consistency, performance, or just cleaning up old cruft. Others relate to new features for genres and artist credits. We’re also introducing a new entity based on tags, like genres that came before it: Mood. See below for more details, including information on how these changes affect the schema or existing data. We expect people will encounter zero breaking changes, but it doesn’t hurt to double check, especially if you have a specific or non-standard use of the database!
Here’s our list of tickets for the Spring 2022 schema change:
- MBS-12256: Keep
*_metatables up-to-date with triggers. An internal-only change to help us keep aggregate rating information (i.e. an average rating and count of ratings for each entity) up-to-date more easily, and help keep these values accurate. This change affects master/standalone databases, but should have no impact on mirror servers, where such triggers are not created. It’s possible that some existing aggregate ratings data was out of sync and will be updated with this change.
- MBS-12224: Keep tags’
ref_countand aggregate vote counts updated with triggers. Like the change above for ratings, this is primarily internal-only and intended to help us keep tag counts in sync, though an adjacent goal is to make features like MBS-960 easier to implement. This also revives the
tag.ref_countcolumn, which hasn’t been updated for years, in order to provide a faster way of sorting tags/genres by usage count. Like above, this change should have no impact on mirror servers schema-wise, but will fix some existing corrupt tag counts.
- MBS-12249: Add a materialized
area_containmenttable kept up-to-date with triggers. Pages that make use of area containments, e.g. the list of artists from an area, which are expected to account for sub-areas, are currently quite slow and we’d like to improve upon this. The slowness is related to the recursive queries we use to get contained sub-areas – these queries are uncached and calculated on-the-fly. This ticket addresses these performance issues by caching area containment information in a new aptly-named
area_containmenttable. Consistent with the tag and rating tickets above, this table will also be kept up-to-date with triggers. This change should have no impact on mirror servers except to make certain area requests faster; it does not affect existing data.
- MBS-12250: Create dbmirror2 schema on production and mirror servers. The dbmirror extension we use to generate our replication packets a.k.a Live Data Feed is a 20 year old tool. It has issues and limitations that are difficult to fix, and we aim to replace it with something more maintainable. We wrote dbmirror2 to do that, but still have the task of getting it deployed to mirrors seamlessly. This will happen invisibly without any changes needed on mirrors! The action for this ticket is to simply create the schema for dbmirror2; it’s not actually used for replication yet. We’ll first have a testing phase and make sure external projects like mbdata work with the new replication packet format.
- MBS-12200: Drop schema objects related to Amazon cover art support. For a long while, releases with Amazon URLs would be checked for cover art on Amazon, and if found, a link to the image would be cached for display. Unfortunately Amazon’s API to do this changed, and we haven’t synced artwork from them in years. We still have many old images cached and we still display those, but they aren’t guaranteed to be in sync. Last year we decided to drop support for displaying these, while giving time for users to upload any correct images to the Cover Art Archive. To help with this, we have a report of releases with Amazon cover art but no Cover Art Archive front cover.
The schema change here involves dropping the
release_coverarttable (which was private and non-replicated) and the
release_meta.amazon_storecolumn (which was completely empty and unused). This change should have no impact on mirror servers, unless you were using this table or column for your own purposes, because they should be otherwise empty.
- MBS-12141: Block tag names that are empty or have uncontrolled whitespace with database constraints. Recently we discovered a bug where empty or blank tag names could be submitted in the web service. This has since been fixed, but we’d also like to prevent such empty tags by adding a database constraint. This change has no effect on mirror servers schema-wise, where such constraints are not created. A few blank tags have already been deleted from the production database, but otherwise existing data is not affected. If any such tags exist in your standalone database, they’ll be deleted.
- MBS-12252: Add
edit_genretable. A requirement to start storing edit history for genres (right now changes leave no trail). This will add the empty table to mirror servers as well, but will not affect any existing data.
- MBS-12253: Add relationship tables for genres. A requirement to be able to relate genres to other entities (such as URLs for the equivalent Wikipedia or Rate Your Music pages). This will add the empty tables (and accompanying example tables in the documentation namespace) to mirror servers as well, but will not affect any existing data.
- MBS-12254: Add
genre_annotationtable. A requirement to make it possible to eventually add annotations to genres. This will add the empty table to mirror servers as well, but will not affect any existing data.
- MBS-12255: Add
genre_alias_typetable and make
genre_aliasconsistent. Originally the (as yet unused)
genre_aliastable was designed as a heavily simplified version of the alias tables for other entities. In retrospect, this was not a good decision, since it would make it harder to just use the generic implementation of our alias code for genres. As such, we’re adding a
genre_alias_typetable (originally genre aliases had no types) and replacing the
genre_aliastable with one having the extra columns matching other entities’ alias tables. These changes will also happen to mirror servers, but since the
genre_aliastable was completely unused it should not cause any issues. In case any standalone (not mirrored) servers were using
genre_alias, we will ensure any existing data is transferred to the new version of it.
- MBS-12241: Drop the
whitespace_collapseddatabase constraint. We’ve had a constraint for years that tries to ensure that columns like entity names do not contain multiple consecutive spacing characters (disallowing names such as “This Title”). In retrospect, this was overreaching, since there are several cases in which a specific number of spaces in a title can be shown to be artist intent. Additionally, we recently discovered that we had some very old data in the database that actually violated this constraint (causing issues when importing data to a standalone server). The data seemed to actually be correct (i.e. some of the aforementioned edge cases) so rather than amending it, we’re removing the constraint. This will have no effect on mirrors since they don’t run constraints.
- MBS-12225: Rename “slave” to “mirror” (inclusive language update). We recently got a request from a long-time supporting organization to pick a different term for what we’ve historically called “slave server”. Since we were already sometimes using “mirror server” to mean the exact same thing, we are just changing the official name and will use “mirror server” in the future.
RT_SLAVEwill still work in DBDefs.pm, at least for now, but we’d suggest changing to the new (and equivalent)
RT_MIRRORin your mirror servers’ DBDefs.pm. We’re likely to eventually drop support for
RT_SLAVEin a future schema change, so we’ll remind you about changing it in future upgrade instructions.
- MBS-12190: Add Mood support. Music mood was originally meant to be automatically calculated by the now-discontinued AcousticBrainz project. That’s obviously no longer in the cards, but this data would still be quite useful for ListenBrainz. As such, we’re planning to add basic support for mood tags in MusicBrainz in the same way we currently do genres. This can then be leveraged by ListenBrainz to collect the information directly from users playing music through their BrainzPlayer, and it can also of course be entered directly from MusicBrainz in the same way other tags (including genre tags) can. This will add new mood tables to mirrors and will detect some previously generic tags as mood tags, but it won’t cause any changes to the underlying data.
- MBS-11760: Expand the database triggers which remove empty tags to all entities. When the last use of a tag (up- or downvote) is removed, we use triggers to completely remove the tag from the database. Some of the relevant triggers (for events, places, recordings and releases) were never created though, so if the last use was on an entity of one of those types the empty tag would not get purged. This doesn’t affect mirrors directly since the triggers don’t run on mirrors (but they should no longer get unused tags coming through replication).
- MBS-11457: Drop series
ordering_attribute. This column was added back when we were expecting to have different types of ordering attributes for series, but we have never used it. We planned to remove it last year already, but that required equivalent changes on the search server that couldn’t be made at the time. We are planning to just drop the column.
- MBS-11456: Add MBIDs and redirect tables for artist credits. Adds a
gidcolumn to the
artist_credittable, and a new
artist_credit_gid_redirecttable. It generates MBIDs for existing artist credits that will be replicated to mirrors.
Artist credit MBIDs will be mainly exposed through the web service at first. The MBIDs will allow public identification of artist credits outside of MusicBrainz, and open the possibility of some more features in the future.
- MBS-12208: Show withdrawn release groups in the official artist overview. The Withdrawn release status was added recently. Release groups containing only releases with this status were meant to be shown in the main (official) artist overview, but the way this is implemented means a small edit to the
get_artist_release_group_rowsfunction was needed. Mirrors using the materialized tables will need to update the data; more info will be provided as part of the upgrade instructions.
We’ll post upgrade instructions for standalone/mirror servers on the day of the release. If you have any questions, feel free to comment below, or on the linked JIRA tickets if relevant there!
NB: This post was updated on 21 March to include a ticket (MBS-12208) that we forgot to list originally