Today we’re announcing a MusicBrainz database schema change release planned for May 16, 2022. The majority of these changes follow the theme of improving data integrity and consistency, performance, or just cleaning up old cruft. Others relate to new features for genres and artist credits. We’re also introducing a new entity based on tags, like genres that came before it: Mood. See below for more details, including information on how these changes affect the schema or existing data. We expect people will encounter zero breaking changes, but it doesn’t hurt to double check, especially if you have a specific or non-standard use of the database!
Here’s our list of tickets for the Spring 2022 schema change:
Schema changes
- MBS-12256: Keep
rating
andrating_count
column on*_meta
tables up-to-date with triggers. An internal-only change to help us keep aggregate rating information (i.e. an average rating and count of ratings for each entity) up-to-date more easily, and help keep these values accurate. This change affects master/standalone databases, but should have no impact on mirror servers, where such triggers are not created. It’s possible that some existing aggregate ratings data was out of sync and will be updated with this change. - MBS-12224: Keep tags’
ref_count
and aggregate vote counts updated with triggers. Like the change above for ratings, this is primarily internal-only and intended to help us keep tag counts in sync, though an adjacent goal is to make features like MBS-960 easier to implement. This also revives thetag.ref_count
column, which hasn’t been updated for years, in order to provide a faster way of sorting tags/genres by usage count. Like above, this change should have no impact on mirror servers schema-wise, but will fix some existing corrupt tag counts. - MBS-12249: Add a materialized
area_containment
table kept up-to-date with triggers. Pages that make use of area containments, e.g. the list of artists from an area, which are expected to account for sub-areas, are currently quite slow and we’d like to improve upon this. The slowness is related to the recursive queries we use to get contained sub-areas – these queries are uncached and calculated on-the-fly. This ticket addresses these performance issues by caching area containment information in a new aptly-namedarea_containment
table. Consistent with the tag and rating tickets above, this table will also be kept up-to-date with triggers. This change should have no impact on mirror servers except to make certain area requests faster; it does not affect existing data. - MBS-12250: Create dbmirror2 schema on production and mirror servers. The dbmirror extension we use to generate our replication packets a.k.a Live Data Feed is a 20 year old tool. It has issues and limitations that are difficult to fix, and we aim to replace it with something more maintainable. We wrote dbmirror2 to do that, but still have the task of getting it deployed to mirrors seamlessly. This will happen invisibly without any changes needed on mirrors! The action for this ticket is to simply create the schema for dbmirror2; it’s not actually used for replication yet. We’ll first have a testing phase and make sure external projects like mbdata work with the new replication packet format.
- MBS-12200: Drop schema objects related to Amazon cover art support. For a long while, releases with Amazon URLs would be checked for cover art on Amazon, and if found, a link to the image would be cached for display. Unfortunately Amazon’s API to do this changed, and we haven’t synced artwork from them in years. We still have many old images cached and we still display those, but they aren’t guaranteed to be in sync. Last year we decided to drop support for displaying these, while giving time for users to upload any correct images to the Cover Art Archive. To help with this, we have a report of releases with Amazon cover art but no Cover Art Archive front cover.
The schema change here involves dropping therelease_coverart
table (which was private and non-replicated) and therelease_meta.amazon_store
column (which was completely empty and unused). This change should have no impact on mirror servers, unless you were using this table or column for your own purposes, because they should be otherwise empty. - MBS-12141: Block tag names that are empty or have uncontrolled whitespace with database constraints. Recently we discovered a bug where empty or blank tag names could be submitted in the web service. This has since been fixed, but we’d also like to prevent such empty tags by adding a database constraint. This change has no effect on mirror servers schema-wise, where such constraints are not created. A few blank tags have already been deleted from the production database, but otherwise existing data is not affected. If any such tags exist in your standalone database, they’ll be deleted.
- MBS-12252: Add
edit_genre
table. A requirement to start storing edit history for genres (right now changes leave no trail). This will add the empty table to mirror servers as well, but will not affect any existing data. - MBS-12253: Add relationship tables for genres. A requirement to be able to relate genres to other entities (such as URLs for the equivalent Wikipedia or Rate Your Music pages). This will add the empty tables (and accompanying example tables in the documentation namespace) to mirror servers as well, but will not affect any existing data.
- MBS-12254: Add
genre_annotation
table. A requirement to make it possible to eventually add annotations to genres. This will add the empty table to mirror servers as well, but will not affect any existing data. - MBS-12255: Add
genre_alias_type
table and makegenre_alias
consistent. Originally the (as yet unused)genre_alias
table was designed as a heavily simplified version of the alias tables for other entities. In retrospect, this was not a good decision, since it would make it harder to just use the generic implementation of our alias code for genres. As such, we’re adding agenre_alias_type
table (originally genre aliases had no types) and replacing thegenre_alias
table with one having the extra columns matching other entities’ alias tables. These changes will also happen to mirror servers, but since thegenre_alias
table was completely unused it should not cause any issues. In case any standalone (not mirrored) servers were usinggenre_alias
, we will ensure any existing data is transferred to the new version of it. - MBS-12241: Drop the
whitespace_collapsed
database constraint. We’ve had a constraint for years that tries to ensure that columns like entity names do not contain multiple consecutive spacing characters (disallowing names such as “This Title”). In retrospect, this was overreaching, since there are several cases in which a specific number of spaces in a title can be shown to be artist intent. Additionally, we recently discovered that we had some very old data in the database that actually violated this constraint (causing issues when importing data to a standalone server). The data seemed to actually be correct (i.e. some of the aforementioned edge cases) so rather than amending it, we’re removing the constraint. This will have no effect on mirrors since they don’t run constraints. - MBS-12225: Rename “slave” to “mirror” (inclusive language update). We recently got a request from a long-time supporting organization to pick a different term for what we’ve historically called “slave server”. Since we were already sometimes using “mirror server” to mean the exact same thing, we are just changing the official name and will use “mirror server” in the future.
RT_SLAVE
will still work in DBDefs.pm, at least for now, but we’d suggest changing to the new (and equivalent)RT_MIRROR
in your mirror servers’ DBDefs.pm. We’re likely to eventually drop support forRT_SLAVE
in a future schema change, so we’ll remind you about changing it in future upgrade instructions. - MBS-12190: Add Mood support. Music mood was originally meant to be automatically calculated by the now-discontinued AcousticBrainz project. That’s obviously no longer in the cards, but this data would still be quite useful for ListenBrainz. As such, we’re planning to add basic support for mood tags in MusicBrainz in the same way we currently do genres. This can then be leveraged by ListenBrainz to collect the information directly from users playing music through their BrainzPlayer, and it can also of course be entered directly from MusicBrainz in the same way other tags (including genre tags) can. This will add new mood tables to mirrors and will detect some previously generic tags as mood tags, but it won’t cause any changes to the underlying data.
- MBS-11760: Expand the database triggers which remove empty tags to all entities. When the last use of a tag (up- or downvote) is removed, we use triggers to completely remove the tag from the database. Some of the relevant triggers (for events, places, recordings and releases) were never created though, so if the last use was on an entity of one of those types the empty tag would not get purged. This doesn’t affect mirrors directly since the triggers don’t run on mirrors (but they should no longer get unused tags coming through replication).
- MBS-11457: Drop series
ordering_attribute
. This column was added back when we were expecting to have different types of ordering attributes for series, but we have never used it. We planned to remove it last year already, but that required equivalent changes on the search server that couldn’t be made at the time. We are planning to just drop the column. - MBS-11456: Add MBIDs and redirect tables for artist credits. Adds a
gid
column to theartist_credit
table, and a newartist_credit_gid_redirect
table. It generates MBIDs for existing artist credits that will be replicated to mirrors.
Artist credit MBIDs will be mainly exposed through the web service at first. The MBIDs will allow public identification of artist credits outside of MusicBrainz, and open the possibility of some more features in the future.
- MBS-12208: Show withdrawn release groups in the official artist overview. The Withdrawn release status was added recently. Release groups containing only releases with this status were meant to be shown in the main (official) artist overview, but the way this is implemented means a small edit to the
get_artist_release_group_rows
function was needed. Mirrors using the materialized tables will need to update the data; more info will be provided as part of the upgrade instructions.
We’ll post upgrade instructions for standalone/mirror servers on the day of the release. If you have any questions, feel free to comment below, or on the linked JIRA tickets if relevant there!
NB: This post was updated on 21 March to include a ticket (MBS-12208) that we forgot to list originally
Could you please tell us, which of this changes affect the API?
For example: With MBS-12249 can we get the complete street/city/region/area/country with one API call?
@InvisibleMan
MBS-12249 should allow us to return the complete area containment efficiently, and this is something I hope to implement once the schema change is in place (however I’m not sure it will be part of the initial release — right now this is more of a backend change).
Other items that affect the web service APIs are also about returning additional information:
“MBS-11456: Add MBIDs and redirect tables for artist credits”
– artist-credit nodes will have a new gid attribute.
“MBS-12190: Add Mood support”
– we should be adding mood-list, user-mood-list, etc. along with corresponding inc=moods and inc=user-moods options to entities; the implementation will mirror genres.
“MBS-12255: Add genre_alias_type table and make genre_alias consistent”
– we don’t use genre aliases yet, but eventually this will be outputted in the web service.
“MBS-12254: Add genre_annotation table”
– once annotations are editable for genres, we should start outputting them in /ws/2/genre lookup requests.