Schema Change Release

October 14 schema change complete

We’ve just finished rolling out the fall schema change release. Apart from the /place/create endpoint failing, the site is back up in full read-write mode now.

Read on for the changes in this release — our next blog post will give instructions on how to upgrade your instances of MusicBrainz.

Thanks to everyone who worked on this release!

Bug

[MBS-2301] – Attach TOC to new release – TOC/DiscId is lost when based on existing release/tracklist
[MBS-4453] – Duplicate artist credits
[MBS-5624] – Release groups don’t show the yellow removal warning when empty
[MBS-5647] – Release header of a different release appears on the release duplicates tab of the release editor
[MBS-6067] – Internal server error when using query parameter with a lookup in the webservice
[MBS-6211] – Aliases are missing the “ended” flag
[MBS-6373] – Area names being incorrectly translated
[MBS-6518] – Wikidocs pages keep 404ing
[MBS-6703] – Type displays as (none) on remove cover art edits.
[MBS-6715] – Internal server error looking up non-existing ISWC
[MBS-6717] – Some false reports of possible Artist collaborations
[MBS-6736] – ISE when giving a non-integer to entity/mbid/annotation/
[MBS-6748] – Internal server error loading some edit relationship edits
[MBS-6765] – Documentation search doesn’t work over https in recent Firefox/Chrome
[MBS-6782] – Artist overview page still displaying release date that was removed
[MBS-6787] – edit search merge works edits : ISWC are duplicated

Improvement

[MBS-631] – Add support for deprecating a relationship
[MBS-6068] – Remove the _name tables
[MBS-6069] – Track MBID webservice changes
[MBS-6182] – Deleted editors should be marked in a real way, not just designated by lack of password/well-known username
[MBS-6392] – Display ISNIs with spaces
[MBS-6543] – Highlight specified edit note when using edit note fragment in URL
[MBS-6564] – Add disambiguation comments to areas
[MBS-6706] – Improve the relationship type documentation display
[MBS-6713] – Ensure IMDB links are added at the right level
[MBS-6767] – Remove the Creative Commons download relationships report
[MBS-6779] – INSTALL.md doesn’t mention that you need to apt-get install cpanminus before trying to run cpanm.

New Feature

[MBS-5701] – Add a way to mark recordings as containing video
[MBS-6200] – Add a “place” entity
[MBS-6683] – Add autoselect for ReverbNation URLs

Task

[MBS-6046] – Remove PUID support
[MBS-6669] – Update the Allmusic logo used in the sidebar
[MBS-6732] – Add neyzen.com to score whitelist
[MBS-6766] – Delete unused root/release/full.tt

You can check out this release with the following tag: v-2013-10-14 .

Schema change release tomorrow at 1700UTC

Tomorrow, Monday 14 October, 2013, 17:00 UTC (10:00 PDT, 13:00 EDT, 18:00 BST, 19:00 CEST) we’re going to release our next round of schema changes!

As is typical with our fall schema changes, this one is a little simpler and more focused on cleanup, rather than massive new changes. This gives me hope that we will have a smoother release than we did in the spring. 🙂

We’re going to make the site read-only and run off our backup database server while we upgrade our primary database server. I suspect that we should be in read-only mode for about an hour. The exact start time is not quite known — we’ll start our release process at 17:00 UTC, but when we go to read-only is hard to tell. We’ll tweet and give a shout in IRC when we’re ready.

Schema 17/18 upgrade instructions

We’ve just completed our extra schema upgrade. The full instructions for upgrade follow:

Schema 16 to schema 17 upgrade

If you already ran the migration that was announced May 15th, or if you imported a data dump from May 15th or later, skip to the next section.

Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 16.
Take down the web server running MusicBrainz, if you’re running a web server.
Turn off cron jobs if you are automatically updating the database via cron jobs.
Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
Switch to the new code with git fetch origin followed by git checkout schema-16-to-17
Run carton install --deployment to install any new perl modules.
Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
Set DB_SCHEMA_SEQUENCE to 17 in lib/DBDefs.pm
Turn cron jobs back on, if needed.
Restart the MusicBrainz web server, if needed.

Schema 17 to schema 18 upgrade

Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 17.
Take down the web server running MusicBrainz, if you’re running a web server.
Turn off cron jobs if you are automatically updating the database via cron jobs.
Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
Switch to the new code with git fetch origin followed by git checkout v-2013-05-24
Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
Set DB_SCHEMA_SEQUENCE to 18 in lib/DBDefs.pm
Turn cron jobs back on, if needed.
Restart the MusicBrainz web server, if needed. EDIT: also restart memcached here, see http://tickets.musicbrainz.org/browse/MBS-6376

Note that the tags to check out for the two migrations are different.
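
For anyone who prefers to script the command-line part of the schema 17 to schema 18 steps, here is a minimal sketch. The three commands are quoted from the list above; the `run_steps` helper and its `dry_run` flag are hypothetical names made up for this illustration, not part of the MusicBrainz code. The manual steps (draining replication, stopping the web server and cron jobs, editing lib/DBDefs.pm, restarting memcached) are deliberately left out and still have to be done by hand.

```python
import shlex
import subprocess

# Commands quoted from the schema 17 -> 18 steps above. For the 16 -> 17
# migration the tag differs (schema-16-to-17) and "carton install --deployment"
# is an extra step, so this list would need adjusting.
UPGRADE_COMMANDS = [
    "git fetch origin",
    "git checkout v-2013-05-24",
    "carton exec -Ilib -- ./upgrade.sh",
]


def run_steps(commands, dry_run=True, cwd=None):
    """Run each command in order, stopping at the first failure.

    With dry_run=True (the default) nothing is executed and the commands
    are only printed. Returns the commands that were printed or run.
    """
    done = []
    for command in commands:
        print(("would run: " if dry_run else "running: ") + command)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so a failed checkout never leads to running upgrade.sh.
            subprocess.run(shlex.split(command), cwd=cwd, check=True)
        done.append(command)
    return done
```

Calling `run_steps(UPGRADE_COMMANDS)` only prints the plan. Passing `dry_run=False` and `cwd` set to the top of the source directory would actually execute it, so it is worth reading the printed plan first.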

Changes

For the list of changes in schema 17, see the former blog post. The changes for schema 18 are:

Fix the track table corruption that the schema 16-17 upgrade created, by importing a copy of the ‘track’ table from the production database.
Fix some indexes and constraints that should not be on slaves or which had bad names starting with ‘medium2013’ or ‘track2013’
Create a missing index on medium.release that dramatically improves performance.
Fix the ref_count column of the artist_credit table, which was not updated properly at the schema 16-17 upgrade.

Urgent schema update required

On Friday 24 May, 2013 at 15:00UTC we’re going to make an urgent schema update to fix a problem that occurred during our schema change last week. Please read this whole blog post carefully!

This update will not make any changes to the schema, but it will fix some data issues that have appeared on slave servers.

We apologize profusely for these problems — we’re working hard to rectify this problem and we’re going to improve our processes going forward to ensure that future releases will not encounter these problems.

What went wrong

Due to a misunderstanding of our database system, the ‘track’ table will be corrupted on the majority of replicated slave databases after the schema 17 migration. Specifically, depending on the internal choices of a given postgresql installation’s query planner and other system details, any particular server can end up with a variety of incompatible permutations of the track table, where ‘id’/’gid’ pairs will generally point to the incorrect track data. Unfortunately, this problem is compounded by replication, which is based on the ‘id’ column. Therefore, replication packets since the schema change are likely to have deleted and modified the incorrect rows of the ‘track’ table on slaves.
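
To make the failure mode concrete, here is a toy sketch, not a model of the real replication code. Each "server" is just a Python dict from a track row id to a track title, and the titles, ids and the `apply_delete` helper are invented for illustration. It shows how the same id-based change removes different tracks once the id assignments have diverged.

```python
# Toy model only: each "server" maps a track row id to a track title.
production = {1: "Track A", 2: "Track B", 3: "Track C"}
# A slave whose migration assigned the same ids to different rows.
slave = {1: "Track B", 2: "Track C", 3: "Track A"}


def apply_delete(table, row_id):
    """Apply an id-based delete and return the resulting table."""
    remaining = dict(table)
    remaining.pop(row_id, None)
    return remaining


# Production deletes "Track B", which is row id 2 there. The same
# id-based change is then applied on the slave.
production_after = apply_delete(production, 2)
slave_after = apply_delete(slave, 2)

print(sorted(production_after.values()))  # ['Track A', 'Track C']
print(sorted(slave_after.values()))       # ['Track A', 'Track B']
```

The slave has lost "Track C" and kept "Track B", so the two servers now disagree about the tracklist even though both applied the same change.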

How are we fixing it

In order to ensure that no slaves continue to replicate incompatible changes, we are incrementing the schema number again to 18, which will force operators of slave servers to intervene appropriately. To ensure that slaves have a correct version of the ‘track’ table, we are providing an upgrade script that will download an exported snapshot of the production server’s ‘track’ table at a known point and import it, as well as correct some smaller issues. By importing this snapshot, slave servers will be reset to a correct version of this table and replication can continue.

Specific step by step instructions on running this upgrade will be in a separate blog post. Watch this space!

What problems may have arisen

In the unlikely case an external program directly references track row ID numbers, or if it uses the newly-added track MBID field (the ‘gid’ column), these will not be correct if they were taken from any server but the production server. If an application stores either of these identifiers in any way, that data should be rebuilt.
Due to the compounding problems from replication, some tracklists will have incorrect information — missing tracks, misnumbered tracks, links to the wrong recordings, wrong durations, and/or wrong track artist credits. Information of this sort that was derived from replicated slaves during the affected period should be regenerated after upgrading.

FAQ about this update

Q: We don’t use the track table, we use recordings. Am I affected?
A: You are not affected if you use recordings directly, i.e., looking up recording information by a stored recording MBID, except if you use track information linked to those recordings (for example, if you create a list of releases a given recording appears on). Since the link between the recording and the release tables is via the ‘track’ table, anything that connects these two entities is likely to be affected.

Q: How can I tell if any of the tracklists I am using are affected?
A: Due to the random permutation issue, it’s not possible to be 100% sure. However, it’s possible to identify tracklists that definitely have problems by two means: track counts and sequence issues. The former can be tested with a fairly simple query:

SELECT medium.id, medium.track_count, count(track.id) AS track_track_count,
     medium.track_count <> count(track.id) AS counts_differ
FROM medium JOIN track ON track.medium = medium.id
GROUP BY medium.id, medium.track_count
HAVING count(track.id) <> medium.track_count;

Any medium that appears in that query has been affected and its tracklist should not be trusted (select ‘medium.release’ to get release IDs, if that’s your jam). Sequence issues are a more complex query:

SELECT distinct m.id FROM
  (SELECT DISTINCT medium.* FROM
    ( SELECT track.medium, min(track.position) AS first_track, max(track.position)
      AS last_track, count(track.position) AS track_count, sum(track.position)
      AS track_pos_acc
      FROM track
      GROUP BY track.medium) s
    JOIN medium ON medium.id = s.medium
    WHERE first_track != 1 OR last_track != s.track_count OR
        (s.track_count * (1 + s.track_count)) / 2 <> track_pos_acc
    ) m

(note: if you only get 10 rows for this query, you’re fine — they’re these ten, which are known problems)
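
To see what the two checks above actually catch, here is a small sketch that runs them against toy data in an in-memory SQLite database. Several things here are assumptions: the two-column `medium` and three-column `track` tables are drastically simplified stand-ins for the real schema, the count query is trimmed to return only the medium id, the inequality operators are written as `<>`, and SQLite stands in for PostgreSQL purely so the example is self-contained.

```python
import sqlite3

# Track-count check: mediums whose stored track_count disagrees with
# the number of rows in the track table.
COUNT_CHECK = """
SELECT medium.id
FROM medium JOIN track ON track.medium = medium.id
GROUP BY medium.id, medium.track_count
HAVING count(track.id) <> medium.track_count
"""

# Sequence check: positions must start at 1, end at the track count, and
# sum to 1 + 2 + ... + n = n * (n + 1) / 2.
SEQUENCE_CHECK = """
SELECT DISTINCT m.id FROM
  (SELECT DISTINCT medium.* FROM
    (SELECT track.medium, min(track.position) AS first_track,
            max(track.position) AS last_track,
            count(track.position) AS track_count,
            sum(track.position) AS track_pos_acc
     FROM track
     GROUP BY track.medium) s
   JOIN medium ON medium.id = s.medium
   WHERE first_track != 1 OR last_track != s.track_count OR
         (s.track_count * (1 + s.track_count)) / 2 <> track_pos_acc
  ) m
"""


def flagged(conn, query):
    """Return the sorted medium ids reported by a check query."""
    return sorted(row[0] for row in conn.execute(query))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medium (id INTEGER PRIMARY KEY, track_count INTEGER);
CREATE TABLE track (id INTEGER PRIMARY KEY, medium INTEGER, position INTEGER);
INSERT INTO medium VALUES (1, 3), (2, 3), (3, 3);
INSERT INTO track (medium, position) VALUES
  (1, 1), (1, 2), (1, 3),   -- medium 1: healthy
  (2, 1), (2, 2),           -- medium 2: one track missing
  (3, 1), (3, 1), (3, 3);   -- medium 3: duplicated position
""")

print(flagged(conn, COUNT_CHECK))     # [2]
print(flagged(conn, SEQUENCE_CHECK))  # [3]
```

Medium 2 is only caught by the count check, because its two remaining tracks are still numbered 1 and 2. Medium 3 has the right number of tracks and the right first and last positions, so only the sum of positions (5 instead of 6) gives it away, which is why the sequence query carries that extra arithmetic clause.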

For more safety, don’t trust anything in the track table that’s been updated since the schema change:

SELECT distinct medium
FROM track
WHERE last_updated > '2013-05-15'

If it’s possible in your application, it’s probably best to throw out any updates to tracklists since 2013-05-15.

Again, we’re sorry for the trouble this update may have caused you!

Issues with 2013-05-15 schema change and the 'track' table.

As a heads-up for anyone running a slave server, and in particular anyone using postgresql 9.1 or later (9.0 is the only confirmed-correct version): it appears that there’s an issue with the upgrade script which will result in an incorrect track table in most cases.

~~An ostensible fix that was previously mentioned here does not work. We’re still working on a fix and will update this post as we have more details.~~

There is a fix, see http://blog.musicbrainz.org/?p=1962 for instructions. Thanks for your patience!

Schema change release, 2013-05-15

Today we released a schema change update for MusicBrainz. Schema change updates change the format of the underlying MusicBrainz database and allow us to store more information, or model information in a richer/more correct form.

Summary

This release specifically includes some exciting new features:

Areas

A new entity in this release is the “area” entity, which can track countries, subdivisions of countries, cities, and other such location entities (venues, however, will be another entity). While this release primarily only introduces the entity, migrating only our existing list of countries, it’s now also possible to add start and end locations to artists, and to mark works as anthems. Editing of areas, their aliases and annotations, and area-area and area-url relationships, is limited to a new class of “Location Editor”.

ISNI Codes

The International Standard Name Identifier (ISNI) is an ISO standard that identifies public identities of parties. We now have support for storing ISNI codes inside the MusicBrainz database, which will make it easier to cross-reference data in MusicBrainz with other databases.

Multiple Release Events

Releases can now have multiple date and country pairs, whereas previously they could only have one country and one date. This will allow us to more accurately store information about releases that occur in different areas at different dates, but are otherwise the same physical product.

Forthcoming Features

We also began work on the database support for some future MusicBrainz features:

Track MBIDs

All tracks on mediums now have unique identifiers. This will allow people to refer to a specific track in a release in a way that is more resilient to editing than just the track name or position. Currently we have database support for this, but track identifiers are not yet exposed in either the website or the web service.

Dynamic Work Attributes

Dynamic work attributes will let us introduce new attributes to describe works without schema changes.

Free Text Relationship Attribute Credits

This feature will let editors specify an alternative name for relationship attributes, for example to state exactly which model of guitar was used in a recording, rather than the current vague “electric guitar” attribute. Support for this feature is now in the database (in the link_attribute_credit table), but the UI to do this editing is still to be finished.

Support more formats in the Cover Art Archive

Uploading cover art to the cover art archive will soon support a few other image formats, starting with PNG.

Other Schema Changes

The remaining smaller schema changes are:

The wiki transclusion version mapping is now stored in the database, not a flat file.
The link_type.short_link_phrase was renamed to link_type.long_link_phrase.
The work.artist_credit column was dropped.
Collections can now have a description.

Upgrading

~~If you are currently running a slave database, then you will need to perform a few manual steps to upgrade to the new version:~~

~~Take down the web server running MusicBrainz, if you’re running a web server.~~
~~Turn off cron jobs if you are automatically updating the database via cron jobs.~~
~~Make sure your REPLICATION_TYPE setting is RT_SLAVE~~
~~Switch to the new code with git fetch origin followed by git checkout v-2013-05-15~~
~~Run carton install --deployment to install any new perl modules.~~
~~Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.~~
~~Set DB_SCHEMA_SEQUENCE to 17 in lib/DBDefs.pm~~
~~Turn cron jobs back on, if needed.~~
~~Restart the MusicBrainz web server, if needed.~~

Please see the instructions in our more recent blog post instead of these instructions.

Release Notes

This release wouldn’t have been possible without help from Alastair Porter, Michael Wiencek, Nicolás Tamargo and the rest of the MusicBrainz team. Thank you all for your hard work! As we missed the previous release, there are a few other changes in this release. Here are the full release notes:

Bug

[MBS-4703] – Add Medium edit does not correctly display the auto-edit note
[MBS-5834] – Users’ votes page mistitled as “edits”
[MBS-5851] – Instruments are missing from the JSON webservice responses
[MBS-5979] – Automatic redirect to beta clear release editor seeding
[MBS-6015] – Edit Artist Credit edits affecting track ACs don’t appear in related release edit histories
[MBS-6129] – Relationship editor isn’t correctly parsing attributes
[MBS-6149] – Entity merges silently dropping aliases with locales
[MBS-6178] – URL page headers are completely inconsistent
[MBS-6196] – work/edit_form.tt includes artist credit docs
[MBS-6249] – Add Event button broken on add release with cdtoc

Improvement

[MBS-1346] – New Report: Artists with 0 subscribers
[MBS-2229] – Allow multiple release events per release
[MBS-3626] – Display license logos in the sidebar
[MBS-3669] – Merge dated and undated relationships
[MBS-4115] – Cover art archive: Support .png SQL changes
[MBS-4294] – Add a “description” field to collections (SQL/UI)
[MBS-4756] – Move the wiki transclusion index to the database
[MBS-4866] – URL autoselect: ameblo.jp -> has blog at
[MBS-4867] – Timeline graph rate-of-change graph should change its vertical scaling depending on where it’s zoomed
[MBS-4925] – Add country of birth and country of death to Artist (person)
[MBS-5528] – Change short_link_phrase to long_link_phrase
[MBS-5772] – Generate relationship documentation (semi-)automatically
[MBS-5848] – Instrument credits (SQL)
[MBS-6023] – Track MBID UI changes
[MBS-6141] – Add discography page URL matching for universal-music.co.jp, lantis.jp, jvcmusic.co.jp, wmg.jp, avexnet.jp and kingrecords.co.jp
[MBS-6142] – Prevent Wikipedia links from being added as discography page relationships
[MBS-6188] – Remove rating from work merge page
[MBS-6189] – Show work languages on ISWC page
[MBS-6190] – Artist credit diffs per word, join phrase per char

New Feature

[MBS-799] – Location, venue and event support
[MBS-1839] – Track MBID SQL changes
[MBS-2417] – Support multiple countries/regions on a single release
[MBS-3296] – Add dynamic attributes
[MBS-3985] – Support multiple artist countries
[MBS-5272] – Create daily and weekly “rollup” replication packets
[MBS-5302] – Store International Standard Name Identifier (ISNI, ISO 27729) for artists and labels
[MBS-5861] – Dynamic work attributes (SQL)

Task

[MBS-5314] – Drop the work.artist_credit column
[MBS-6133] – Ensure JPEG uploads still work through CAA
[MBS-6167] – Add iTunes links to the sidebar
[MBS-6170] – Add Bandcamp links to the sidebar

Sub-task

[MBS-4217] – Spotify relationship under the External links section
[MBS-5809] – SQL changes for MBS-4294: Add a “description” field to collections
[MBS-5917] – transclusion to DB: SQL
[MBS-5918] – transclusion to DB: UI
[MBS-5919] – locations: SQL
[MBS-5920] – locations: UI
[MBS-5929] – Schema changes to store relationship type guidelines and example usages in the database
[MBS-5930] – Generate documentation automatically/allow managing guidelines in DB
[MBS-5933] – Schema changes to support multiple (date, country) pairs on releases
[MBS-6187] – UI changes to support multiple country/date pairs on releases

Schema change release tomorrow 15 May, 1300UTC

Things have been quiet at MusicBrainz in the past few weeks, but don’t let this fool you! We’ve been working hard on getting the schema change release put together. We’re on schedule for releasing this tomorrow at 13:00 UTC, 15 May, 2013.

We’re going to start the release process at 13:00 UTC, but the site may not go down right away. We’ll get started once we have all of our ducks in a row. To get more updates from us before we start, please follow @musicbrainz on Twitter or join us in IRC at #musicbrainz on irc.freenode.net.

Thanks, and wish us luck for a smooth schema change tomorrow!

Official schema change notification for 15 May, 2013

We’re nearly done implementing the SQL portions of our tickets for the upcoming schema change on 15 May, 2013. We’ve settled on the following tickets that we plan to release:

MBS-5861: Dynamic work attributes
MBS-3978: Support more than one barcode on same release
MBS-4756: Move the wiki transclusion index to the database
MBS-799: Location, venue and event support
MBS-3985: Support multiple artist countries
MBS-4925: Add country of birth and country of death to Artist (person)
MBS-4115: Cover art archive: Support .png SQL changes
MBS-1839: Track MBID SQL changes
MBS-5809: Add a “description” field to collections
MBS-5314: Drop the work.artist_credit column
MBS-5302: Store International Standard Name Identifier (ISNI, ISO 27729) for artists and labels
MBS-5528: Change short_link_phrase to long_link_phrase
MBS-2229: Allow multiple release events per release
MBS-2417: Support multiple countries/regions on a single release
MBS-5772: Generate relationship documentation (semi-)automatically
MBS-5848: Instrument credits

Each of the tickets above will give you a complete idea of how we plan to change our schema on May 15th. Questions? Post a question in the comments and we will answer it.

Finally, we are going to require Postgres 9.1 as the minimum version of Postgres going forward. I’ve spoken to many people about this and it seems that a large percentage of people are already using Postgres 9.1, so this should not be a major change.

Thanks!

Preparing for the May 15th schema change release

It is time for us to start the process towards the next schema change release. Starting today and for the next two weeks, we’re going to seek people to be the champion (sponsor) of a ticket. If you feel strongly about a schema change ticket getting taken care of, you should consider championing this ticket. Once you’ve decided to adopt a ticket, you should assign the ticket to yourself.

Then, over the next two weeks it will be up to you to do the following:

Drive consensus around the core concept of the ticket. If you go through the process of working up a ticket, but no one agrees with what you’re proposing, you’ve wasted your time. Make sure that you get buy-in from others in the community. For instance, if Nikki doesn’t like it, chances are it’s not going to fly. 🙂
Each schema change feature requires two tickets: 1) An SQL ticket that implements the actual changes to the database and defines the queries used to fetch the data. 2) A UI change ticket that implements the UI portions of the schema change ticket.
Ensure that the ticket clearly states what needs to be done to implement the ticket. The ticket should essentially become or link to a requirements document. This requirements document should explain what the new feature should do. It should not explain how it should be done — we should leave the how to our developers who are going to implement the feature.
Provide as much supporting documentation as you can. Mock-ups for UIs are deeply appreciated (even if they delve into the how realm of things) and very useful for meaningfully discussing these tickets.
Have the ticket reviewed by a developer for clarity and completeness, then address any issues said developer may raise.

On 15 February, we’re going to look at the list of tickets that people have taken on and choose the ones that are clear enough to move forward. If you’ve done all the work outlined above, the chances are good that your ticket will be chosen to move forward. If your ticket is chosen to move forward, there will be more questions that the developers will raise; hopefully those can be tackled in the space of a week. After that we will take all of the well-defined tickets and schedule them for implementation. All the other tickets that are not clear enough to implement will be rejected and will have to make another pass through this process in the autumn.

If you’re still interested, here is the list of schema change tickets that should be considered for this.

We’re going to follow this schedule:

1 Feb: Schema change ticket selection starts
15 Feb: Select schema change tickets for implementation, start making tickets fully actionable
1 March: Tickets must be fully actionable. Tickets that are not actionable will be dropped from the 15 May release.
15 March: SQL tickets must be fully implemented.
1 May: UI tickets must be fully implemented, start final ticket testing phase
15 May: Release day

All of these dates have been added to our new community calendar.

Updating MusicBrainz slave instances for 2012-10-15

If you have a replicated instance of MusicBrainz, please follow these instructions to get your server running on the new schema:

Take down the web server running MusicBrainz, if you’re running a web server.
Turn off cron jobs if you are automatically updating the database via cron jobs.
Make sure your REPLICATION_TYPE setting is RT_SLAVE
Switch to the new code with git fetch origin followed by git checkout v-2012-10-15-schema-change
Run carton install --deployment to install any new perl modules.
Run carton exec -- ./upgrade.sh from the top of the source directory.
Set DB_SCHEMA_SEQUENCE to 16 in lib/DBDefs.pm
Turn cron jobs back on, if needed.
Restart the MusicBrainz web server, if needed.

This upgrade requires quite a bit of disk space to execute; your slave may run into trouble if there is less than 10 GB of disk space free. If you’re on a machine with limited disk space, you may want to consider re-importing the data rather than upgrading in place. The next data dump should be available in about 14-16 hours from now.
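
If you want to check the free space before starting, a quick sketch along these lines may help. The 10 GB threshold comes from the note above (read here as gigabytes, which is an assumption); the `enough_free_space` helper is a hypothetical name, and the path to check is a placeholder that should point at wherever your PostgreSQL data actually lives.

```python
import shutil

# Roughly the 10 GB mentioned above, expressed in bytes.
REQUIRED_FREE_BYTES = 10 * 1024**3


def enough_free_space(path, required_bytes=REQUIRED_FREE_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


# "." is only a placeholder; use the directory that holds your database.
if not enough_free_space("."):
    print("Less than 10 GB free: consider re-importing instead of upgrading in place.")
```

This only tells you how much space is free at the moment you run it, so treat the result as a rough guide rather than a guarantee that the upgrade will fit.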