Breaking Changes – Page 5 – MetaBrainz Blog

May 14 schema change release time

The schema change release is approaching fast and we’re making good progress taking care of all the tasks that need to be completed. During yesterday’s meeting we agreed on the time to do this release: 14 May 1600UTC (9am PDT, 11am Chicago, 17h BST, 18h CEST, timezone map).

We will start the release process at this time — the exact time when the site will be down will depend on how long it will take us to walk through our pre-release checklist. We will be announcing the exact downtime via Twitter (@musicbrainz) and via a banner on the main MusicBrainz pages.

We expect our downtime to be somewhere between 15-30 minutes.

Thanks!

Schema change date change: Now 14 May, 2014

Due to a travel schedule snafu, we need to pull our schema change release day in by one day. The new release date for our schema change release is 14 May, 2014.

Sorry for having to make this change!

More details for our 2014-05-15 schema change release

As promised last week, here is a more detailed explanation of what is going to be changing during our May 15th schema change release:

[MBS-5978] – Replication feed is missing release_tag: This will not change any database tables but will finally include release_tag data in our Live Data Feed.
[MBS-6709] – "None" is no longer the last Packaging type after adding Book and Cassette Case: See below for more details.
[MBS-2410] – Label types not a tree anymore: This bug and MBS-6709 add “parent”, “child_order” and “description” columns to all of the yellow tables on this diagram except for language, script and script_language.
[MBS-2714] – Add support for Series: Details are provided in the ticket itself.
[MBS-6144] – Remove the apparently-unused script_language table: Exactly what it says on the tin.
[MBS-6602] – Remove sortnames from areas: Removes the sortname column from the area table.
[MBS-6603] – Remove sortnames from labels: Removes the sortname column from the label table.
[MBS-6651] – Make it possible to disable dates for relationship types: Adds a “has_dates” boolean column to link_type, to indicate whether a particular relationship type allows dates or not. The upgrade will set them all to “can have dates” for now and we will manually disable the appropriate ones later.
[MBS-7205] – Link types should track assumed cardinality: Details are provided in the ticket itself.
[MBS-3674] – Make instruments entities: This adds the following tables: instrument, instrument_type, instrument_gid_redirect, instrument_alias, instrument_alias_type, instrument_annotation, edit_instrument, l_*_instrument, l_instrument_* (the usual AR tables to other entities).

Next schema change release: May 15, 2014

As per our twice a year schedule of making schema changes that impact our Live Data Feed users, I’d like to announce the set of tickets that we’re going to implement for the next release.

We have quite a few clean-up tickets in this batch and a handful of improvements — some of the improvements are a lot of work, but won’t seem that way on the surface. However, the one major new feature we’re going to add are Series. Series will allow us to mark a set of releases as belonging to a series (e.g. Now this is what I call crappy music!). This has been a long long requested feature and we’re finally at a point where we can implement this.

The good news about this release is that a large number of the tickets are already implemented! The big tickets are still in progress, but the smaller ones are already in review. We’re hoping that we’ll have a less hasty push to the finish line this year with lots of time for testing. And we’re hoping for ponies too! 🙂

2013-10-14 schema change release update instructions

As promised, here are the instructions for updating any instances of MusicBrainz you might have. You will need to perform these steps to upgrade to the new version:

Take down the web server running MusicBrainz, if you’re running a web server.
Turn off cron jobs if you are automatically updating the database via cron jobs.
Make sure your REPLICATION_TYPE setting is RT_SLAVE
Switch to the new code with git fetch origin followed by git checkout v-2013-10-14
Run carton exec -Ilib — ./upgrade.sh (or simply ./upgrade.sh if you aren’t using carton, such as for VM users or fairly new installs).
Set DB_SCHEMA_SEQUENCE to 19 in lib/DBDefs.pm
Turn cron jobs back on, if needed.
Restart the MusicBrainz web server, if needed.

If your server fails to start or cron is having issues, it may be a missing perl module problem. To check for any missing perl modules, follow the instructions in INSTALL.md.

October 14 schema change complete

We’ve just finished rolling out the fall schema change release. Apart from the /place/create endpoint failing, the site is back up in full read-write mode now.

Read on for the changes in this release — our next blog post will give instructions on how to upgrade your instances of MusicBrainz.

Thanks to everyone who worked on this release!

Bug

[MBS-2301] – Attach TOC to new release – TOC/DiscId is lost when based on existing release/tracklist
[MBS-4453] – Duplicate artist credits
[MBS-5624] – Release groups don’t show the yellow removal warning when empty
[MBS-5647] – Release header of a different release appears on the release duplicates tab of the release editor
[MBS-6067] – Internal server error when using query parameter with a lookup in the webservice
[MBS-6211] – Aliases are missing the “ended” flag
[MBS-6373] – Area names being incorrectly translated
[MBS-6518] – Wikidocs pages keep 404ing
[MBS-6703] – Type displays as (none) on remove cover art edits.
[MBS-6715] – Internal server error looking up non-existing ISWC
[MBS-6717] – Some false reports of possible Artist collaborations
[MBS-6736] – ISE when giving a non-integer to entity/mbid/annotation/
[MBS-6748] – Internal server error loading some edit relationship edits
[MBS-6765] – Documentation search doesn’t work over https in recent Firefox/Chrome
[MBS-6782] – Artist overview page still displaying release date that was removed
[MBS-6787] – edit search merge works edits : ISWC are duplicated

Improvement

[MBS-631] – Add support for deprecating a relationship
[MBS-6068] – Remove the _name tables
[MBS-6069] – Track MBID webservice changes
[MBS-6182] – Deleted editors should be marked in a real way, not just designated by lack of password/well-known username
[MBS-6392] – Display ISNIs with spaces
[MBS-6543] – Highlight specified edit note when using edit note fragment in URL
[MBS-6564] – Add disambiguation comments to areas
[MBS-6706] – Improve the relationship type documentation display
[MBS-6713] – Ensure IMDB links are added at the right level
[MBS-6767] – Remove the Creative Commons download relationships report
[MBS-6779] – INSTALL.md doesn’t mention that you need to apt-get install cpanminus before trying to run cpanm.

New Feature

[MBS-5701] – Add a way to mark recordings as containing video
[MBS-6200] – Add a “place” entity
[MBS-6683] – Add autoselect for ReverbNation URLs

Task

[MBS-6046] – Remove PUID support
[MBS-6669] – Update the Allmusic logo used in the sidebar
[MBS-6732] – Add neyzen.com to score whitelist
[MBS-6766] – Delete unused root/release/full.tt

You can check out this release with the following tag: v-2013-10-14 .

Schema change release tomorrow at 1700UTC

Tomorrow, Monday 14 October, 2013, 17:00 UTC (10:00 PDT, 13:00 EDT, 18:00 BST, 19:00 CEST) we’re going to release our next round of schema changes!

As it is typical with our fall schema changes, this one is a little simpler and more focused on cleanup, rather than massive new changes. This gives me hope that we will have smoother release than we did in the spring. 🙂

We’re going to make the site read-only and run off our backup database server while we upgrade our primary database server. I suspect that we should be in read-only mode for about an hour. The exact start time is not quite known — we’ll start our release process at 17:00 UTC, but when we go to read-only is hard to tell. We’ll tweet and give a shout in IRC when we’re ready.

Schema 17/18 upgrade instructions

We’ve just completed our extra schema upgrade. The full instructions for upgrade follow:

Schema 16 to schema 17 upgrade

If you already ran the migration that was announced May 15th, or if you imported a data dump from May 15th or later, skip to the next section.

Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 16.
Take down the web server running MusicBrainz, if you’re running a web server.
Turn off cron jobs if you are automatically updating the database via cron jobs.
Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
Switch to the new code with git fetch origin followed by git checkout schema-16-to-17
Run carton install --deployment to install any new perl modules.
Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
Set DB_SCHEMA_SEQUENCE to 17 in lib/DBDefs.pm
Turn cron jobs back on, if needed.
Restart the MusicBrainz web server, if needed.

Schema 17 to schema 18 upgrade

Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 17.
Take down the web server running MusicBrainz, if you’re running a web server.
Turn off cron jobs if you are automatically updating the database via cron jobs.
Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
Switch to the new code with git fetch origin followed by git checkout v-2013-05-24
Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
Set DB_SCHEMA_SEQUENCE to 18 in lib/DBDefs.pm
Turn cron jobs back on, if needed.
Restart the MusicBrainz web server, if needed. EDIT: also restart memcached here, see http://tickets.musicbrainz.org/browse/MBS-6376

Note that the tags to check out for the two migrations are different.

Changes

For the list of changes in schema 17, see the former blog post. The changes for schema 18 are:

Fix the track table corruption that the schema 16-17 upgrade created, by importing a copy of the ‘track’ table from the production database.
Fix some indexes and constraints that should not be on slaves or which had bad names starting with ‘medium2013’ or ‘track2013’
Create a missing index on medium.release that dramatically improves performance.
Fix the ref_count column of the artist_credit table, which was not updated properly at the schema 16-17 upgrade.

Urgent schema update required

On Friday 24 May, 2013 at 15:00UTC we’re going to make an urgent schema update to fix a problem that occurred during our schema change last week. Please read this whole blog post carefully!

This update will not make any changes to the schema, but it will fix some data issues that have appeared on slave servers.

We apologize profusely for these problems — we’re working hard to rectify this problem and we’re going to improve our processes going forward to ensure that future releases will not encounter these problems.

What went wrong

Due to a misunderstanding of our database system, the ‘track’ table will be corrupted on the majority of replicated slave databases after the schema 17 migration. Specifically, depending on the internal choices of a given postgresql installation’s query planner and other system details, any particular server can end up with a variety of incompatible permutations of the track table, where ‘id’/’gid’ pairs will generally point to the incorrect track data. Unfortunately, this problem is compounded by replication, which is based on the ‘id’ column. Therefore, replication packets since the schema change are likely to have deleted and modified the incorrect rows of the ‘track’ table on slaves.

How are we fixing it

In order to ensure that no slaves continue to replicate incompatible changes, we are incrementing the schema number again to 18, which will force operators of slave servers to intervene appropriately. To ensure that slaves have a correct version of the ‘track’ table, we are providing an upgrade script that will download an exported snapshot of the production server’s ‘track’ table at a known point and import it, as well as correct some smaller issues. By importing this snapshot, slave servers will be reset to a correct version of this table and replication can continue.

Specific step by step instructions on running this upgrade will be in a separate blog post. Watch this space!

What problems may have arisen

In the unlikely case an external program directly references track row ID numbers, or if it uses the newly-added track MBID field (the ‘gid’ column), these will not be correct if they were taken from any server but the production server. If an application stores either of these identifiers in any way, that data should be rebuilt.
Due to the compounding problems from replication, some tracklists will have incorrect information — missing tracks, misnumbered tracks, links to the wrong recordings, wrong durations, and/or wrong track artist credits. Information of this sort that was derived from replicated slaves during the affected period should be regenerated after upgrading.

FAQ about this update

Q: We don’t use the track table, we use recordings. Am I affected?
A: You are not affected if you use recordings directly, i.e., looking up recording information by a stored recording MBID, except if you use track information linked to those recordings (for example, if you create a list of releases a given recording appears on). Since the link between the recording and the release tables is via the ‘track’ table, anything that connects these two entities is likely to be affected.

Q: How can I tell if any of the tracklists I am using are affected?
A: Due to the random permutation issue, it’s not completely possible to be 100% sure. However, it’s possible to know of tracklists that definitely have problems by two means: track counts, and sequence issues. The former can be tested with a fairly simple query: “

SELECT medium.id, medium.track_count, count(track.id) as track_track_count,
     medium.track_count  count(track.id) AS counts_differ
FROM medium join track on track.medium = medium.id
GROUP BY medium.id, medium.track_count
HAVING count(track.id)  medium.track_count;

Any medium that appears in that query has been affected and its tracklist should not be trusted (select ‘medium.release’ to get release IDs, if that’s your jam). Sequence issues are a more complex query:

SELECT distinct m.id FROM
  (SELECT DISTINCT medium.* FROM
    ( SELECT track.medium, min(track.position) AS first_track, max(track.position)
      AS last_track, count(track.position) AS track_count, sum(track.position)
      AS track_pos_acc
      FROM track
      GROUP BY track.medium) s
    JOIN medium ON medium.id = s.medium
    WHERE first_track != 1 OR last_track != s.track_count OR
        (s.track_count * (1 + s.track_count)) / 2  track_pos_acc
    ) m

(note: if you only get 10 rows for this query, you’re fine — they’re these ten, which are known problems)

For more safety, don’t trust anything in the track table that’s been updated since the schema change:

SELECT distinct medium
FROM track
WHERE last_updated > ‘2013-05-15’

If it’s possible in your application, it’s probably best to throw out any updates to tracklists since 2013-05-15.

Again, we’re sorry for the trouble this update may have caused you!

Issues with 2013-05-15 schema change and the 'track' table.

As a heads-up for anyone using postgresql 9.1 or later (9.0 is the only confirmed-correct version) anyone running a slave server, it appears that there’s an issue with the upgrade script which will result in an incorrect track table in most cases.

~~An ostensible fix that was previously mentioned here does not work. We’re still working on a fix and will update this post as we have more details.~~

There is a fix, see http://blog.musicbrainz.org/?p=1962 for instructions. Thanks for your patience!