Schema 17/18 upgrade instructions

We’ve just completed our extra schema upgrade. The full instructions for upgrade follow:

Schema 16 to schema 17 upgrade

If you already ran the migration that was announced May 15th, or if you imported a data dump from May 15th or later, skip to the next section.

  1. Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 16.
  2. Take down the web server running MusicBrainz, if you’re running a web server.
  3. Turn off cron jobs if you are automatically updating the database via cron jobs.
  4. Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
  5. Switch to the new code with git fetch origin followed by git checkout schema-16-to-17
  6. Run carton install --deployment to install any new perl modules.
  7. Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
  8. Set DB_SCHEMA_SEQUENCE to 17 in lib/DBDefs.pm
  9. Turn cron jobs back on, if needed.
  10. Restart the MusicBrainz web server, if needed.

Schema 17 to schema 18 upgrade

  1. Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 17.
  2. Take down the web server running MusicBrainz, if you’re running a web server.
  3. Turn off cron jobs if you are automatically updating the database via cron jobs.
  4. Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
  5. Switch to the new code with git fetch origin followed by git checkout v-2013-05-24
  6. Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
  7. Set DB_SCHEMA_SEQUENCE to 18 in lib/DBDefs.pm
  8. Turn cron jobs back on, if needed.
  9. Restart the MusicBrainz web server, if needed. EDIT: also restart memcached here, see http://tickets.musicbrainz.org/browse/MBS-6376

Note that the tags to check out for the two migrations are different.
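
As a rough illustration, the two step lists above can be folded into one small helper. This is only a sketch under some assumptions: the `run` and `upgrade_slave` function names and the `DRY_RUN` variable are made up for this example and are not part of the MusicBrainz server code. Only the git and carton commands are taken from the instructions. Stopping the web server, stopping cron jobs and editing lib/DBDefs.pm differ per site, so they are printed as reminders rather than automated.

```shell
#!/usr/bin/env bash
# Sketch only: by default the commands are printed, not executed.
# Set DRY_RUN=0 to actually run them from the top of the source directory.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# upgrade_slave TAG SCHEMA INSTALL_MODULES
#   TAG             git tag or branch named in the step list
#   SCHEMA          value DB_SCHEMA_SEQUENCE must be set to afterwards
#   INSTALL_MODULES "yes" if the step list includes "carton install --deployment"
upgrade_slave() {
    local tag="$1" schema="$2" install_modules="$3"
    echo "# manual: replicate until no more packets apply, stop web server and cron, check REPLICATION_TYPE is RT_SLAVE"
    run git fetch origin || return 1
    run git checkout "$tag" || return 1
    if [ "$install_modules" = "yes" ]; then
        run carton install --deployment || return 1
    fi
    run carton exec -Ilib -- ./upgrade.sh || return 1
    echo "# manual: set DB_SCHEMA_SEQUENCE to $schema in lib/DBDefs.pm, then re-enable cron and restart the web server"
}

# The two migrations use different tags, as noted above.
upgrade_slave schema-16-to-17 17 yes
upgrade_slave v-2013-05-24 18 no
```

Running it as-is only prints the command sequence for both migrations, which makes it easy to compare against the numbered steps before executing anything.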

Changes

For the list of changes in schema 17, see the previous blog post. The changes for schema 18 are:

  1. Fix the track table corruption that the schema 16-17 upgrade created, by importing a copy of the ‘track’ table from the production database.
  2. Fix some indexes and constraints that should not be on slaves or which had bad names starting with ‘medium2013’ or ‘track2013’
  3. Create a missing index on medium.release that dramatically improves performance.
  4. Fix the ref_count column of the artist_credit table, which was not updated properly at the schema 16-17 upgrade.

Urgent schema update required

On Friday 24 May, 2013 at 15:00UTC we’re going to make an urgent schema update to fix a problem that occurred during our schema change last week. Please read this whole blog post carefully!

This update will not make any changes to the schema, but it will fix some data issues that have appeared on slave servers.

We apologize profusely for these problems. We’re working hard to rectify them, and we’re going to improve our processes going forward to ensure that future releases do not run into similar issues.

What went wrong

Due to a misunderstanding of our database system, the ‘track’ table will be corrupted on the majority of replicated slave databases after the schema 17 migration. Specifically, depending on the internal choices of a given PostgreSQL installation’s query planner and other system details, any particular server can end up with one of a variety of incompatible permutations of the track table, where ‘id’/‘gid’ pairs will generally point to the incorrect track data. Unfortunately, this problem is compounded by replication, which is based on the ‘id’ column. Therefore, replication packets since the schema change are likely to have deleted and modified the incorrect rows of the ‘track’ table on slaves.

How are we fixing it

In order to ensure that no slaves continue to replicate incompatible changes, we are incrementing the schema number again to 18, which will force operators of slave servers to intervene appropriately. To ensure that slaves have a correct version of the ‘track’ table, we are providing an upgrade script that will download an exported snapshot of the production server’s ‘track’ table at a known point and import it, as well as correct some smaller issues. By importing this snapshot, slave servers will be reset to a correct version of this table and replication can continue.

Specific step by step instructions on running this upgrade will be in a separate blog post. Watch this space!

What problems may have arisen

  1. In the unlikely case an external program directly references track row ID numbers, or if it uses the newly-added track MBID field (the ‘gid’ column), these will not be correct if they were taken from any server but the production server. If an application stores either of these identifiers in any way, that data should be rebuilt.
  2. Due to the compounding problems from replication, some tracklists will have incorrect information — missing tracks, misnumbered tracks, links to the wrong recordings, wrong durations, and/or wrong track artist credits. Information of this sort that was derived from replicated slaves during the affected period should be regenerated after upgrading.

FAQ about this update

Q: We don’t use the track table, we use recordings. Am I affected?
A: You are not affected if you use recordings directly, i.e., looking up recording information by a stored recording MBID, except if you use track information linked to those recordings (for example, if you create a list of releases a given recording appears on). Since the link between the recording and the release tables is via the ‘track’ table, anything that connects these two entities is likely to be affected.

Q: How can I tell if any of the tracklists I am using are affected?
A: Because of the random permutation issue, it’s not possible to be completely sure. However, you can identify tracklists that definitely have problems in two ways: track counts and sequence issues. The former can be tested with a fairly simple query:

SELECT medium.id, medium.track_count, count(track.id) AS track_track_count,
     medium.track_count <> count(track.id) AS counts_differ
FROM medium JOIN track ON track.medium = medium.id
GROUP BY medium.id, medium.track_count
HAVING count(track.id) <> medium.track_count;

Any medium that appears in that query has been affected and its tracklist should not be trusted (select ‘medium.release’ to get release IDs, if that’s your jam). Sequence issues are a more complex query:

SELECT distinct m.id FROM
  (SELECT DISTINCT medium.* FROM
    ( SELECT track.medium, min(track.position) AS first_track, max(track.position)
      AS last_track, count(track.position) AS track_count, sum(track.position)
      AS track_pos_acc
      FROM track
      GROUP BY track.medium) s
    JOIN medium ON medium.id = s.medium
    WHERE first_track != 1 OR last_track != s.track_count OR
        (s.track_count * (1 + s.track_count)) / 2 <> track_pos_acc
    ) m

(note: if you only get 10 rows for this query, you’re fine — they’re these ten, which are known problems)
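
To make the sequence test easier to follow, here is a small worked example of the same arithmetic outside the database. It is a sketch, not part of the upgrade tooling, and the `positions_ok` function name is made up for this illustration. Like the query, it accepts a medium only if the lowest position is 1, the highest position equals the number of tracks, and the positions add up to n * (n + 1) / 2.

```shell
#!/usr/bin/env bash
# positions_ok POS...
# Hypothetical helper: succeeds when the track positions pass the same three
# tests as the sequence query (first = 1, last = count, sum = n(n+1)/2).
positions_ok() {
    local count=$# min="" max="" sum=0 p
    [ "$count" -gt 0 ] || return 1
    for p in "$@"; do
        sum=$((sum + p))
        if [ -z "$min" ] || [ "$p" -lt "$min" ]; then min=$p; fi
        if [ -z "$max" ] || [ "$p" -gt "$max" ]; then max=$p; fi
    done
    [ "$min" -eq 1 ] &&
        [ "$max" -eq "$count" ] &&
        [ "$sum" -eq $(( count * (count + 1) / 2 )) ]
}

positions_ok 1 2 3 4 && echo "1 2 3 4: consistent"
positions_ok 1 2 4   || echo "1 2 4: flagged (last position is 4, but there are only 3 tracks)"
positions_ok 1 1 3   || echo "1 1 3: flagged (positions sum to 5, expected 6)"
```

Note that this test can only prove a tracklist is broken, not that it is correct: positions 1, 1, 4, 4 pass all three conditions despite the duplicates, which is one reason the answer above says you cannot be completely sure.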

For more safety, don’t trust anything in the track table that’s been updated since the schema change:

SELECT distinct medium
FROM track
WHERE last_updated > '2013-05-15';

If it’s possible in your application, it’s probably best to throw out any updates to tracklists since 2013-05-15.

Again, we’re sorry for the trouble this update may have caused you!

Search server regressions fixed

Yesterday we pushed out a new version of our search servers to fix some regressions introduced last week. Thanks to Paul Taylor for fixing these bugs so quickly.

Release Notes – MusicBrainz Search Server – Version 2013-05-20

Bug

  • [SEARCH-290] – REGRESSION WS2 RECORDING query returns cropped artist-credit
  • [SEARCH-294] – REGRESSION:Search results no longer include medium-list count attribute
  • [SEARCH-298] – REGRESSION:ws/1 release search seems broken

Improvement

  • [SEARCH-296] – Update README to point to up-to-date mmd-schema repository

Issues with 2013-05-15 schema change and the 'track' table.

As a heads-up for anyone running a slave server on PostgreSQL 9.1 or later (9.0 is the only confirmed-correct version): it appears that there’s an issue with the upgrade script which will result in an incorrect track table in most cases.

An ostensible fix that was previously mentioned here does not work. We’re still working on a fix and will update this post as we have more details.

There is a fix, see http://blog.musicbrainz.org/?p=1962 for instructions. Thanks for your patience!

Search server release: 2013-05-15

Coinciding with our main server release, we’ve updated our search servers. This version fixes some bugs from the last release and adds support for countries and track IDs.

Thanks for your hard work on this release, Paul!

Release Notes – MusicBrainz Search Server – Version 2013-05-15

Bug

  • [SEARCH-236] – Incomplete VA artist credit included for releases in recording search
  • [SEARCH-282] – REGRESSION:Johanne Sebastian Bach is not the first result when search for artist Bach
  • [SEARCH-283] – REGRESSION:"-" is returned instead of an empty list when there are no ISWCs for a work
  • [SEARCH-284] – REGRESSION:"-" is returned instead of an empty list when there are no ISRCs for a recording

Improvement

  • [SEARCH-219] – Include alias sortnames when searching labels or artists
  • [SEARCH-257] – entity search : entity name should have more weight than aliases and artist credits
  • [SEARCH-268] – Add extended alias info to the ws search results
  • [SEARCH-269] – WS searches don’t return aliases that match the artist name

Task

  • [SEARCH-274] – Support for changes to Countries in forthcoming Schema release
  • [SEARCH-285] – Support for TrackIds in forthcoming Schema Release

Schema change release, 2013-05-15

Today we released a schema change update for MusicBrainz. Schema change updates change the format of the underlying MusicBrainz database and allow us to store more information, or model information in a richer/more correct form.

Summary

This release specifically includes some exciting new features:

Areas

A new entity in this release is the “area” entity, which can track countries, subdivisions of countries, cities, and other such location entities (venues, however, will be another entity). While this release mainly just introduces the entity, migrating only our existing list of countries, it’s now also possible to add start and end locations to artists, and to mark works as anthems. Editing of areas, their aliases and annotations, and area-area and area-url relationships, is limited to a new class of “Location Editor”.

ISNI Codes

The International Standard Name Identifier (ISNI) is an ISO standard that identifies public identities of parties. We now have support for storing ISNI codes inside the MusicBrainz database, which will make it easier to cross-reference data in MusicBrainz with other databases.

Multiple Release Events

Releases can now have multiple date and country pairs, whereas previously they could only have one country and one date. This will allow us to more accurately store information about releases that occur in different areas at different dates, but are otherwise the same physical product.

Forthcoming Features

We also began work on the database support for some future MusicBrainz features:

Track MBIDs

All tracks on mediums now have unique identifiers. This will allow people to refer to a specific track in a release in a way that is more resilient to editing than just the track name or position. Currently we have database support for this, but track identifiers are not yet exposed in either the website or the web service.

Dynamic Work Attributes

Dynamic work attributes will let us introduce new attributes to describe works without schema changes.

Free Text Relationship Attribute Credits

This feature will let editors specify an alternative name for relationship attributes, for example to state exactly which model of guitar was used in a recording, rather than the current vague “electric guitar” attribute. Support for this feature is now in the database (in the link_attribute_credit table), but the UI to do this editing is still to be finished.

Support more formats in the Cover Art Archive

Uploading cover art to the cover art archive will soon support a few other image formats, starting with PNG.

Other Schema Changes

The remaining smaller schema changes are:

  • The wiki transclusion version mapping is now stored in the database, not a flat file.
  • The link_type.short_link_phrase was renamed to link_type.long_link_phrase.
  • The work.artist_credit column was dropped.
  • Collections can now have a description.

Upgrading

If you are currently running a slave database, then you will need to perform a few manual steps to upgrade to the new version:

  1. Take down the web server running MusicBrainz, if you’re running a web server.
  2. Turn off cron jobs if you are automatically updating the database via cron jobs.
  3. Make sure your REPLICATION_TYPE setting is RT_SLAVE
  4. Switch to the new code with git fetch origin followed by git checkout v-2013-05-15
  5. Run carton install --deployment to install any new perl modules.
  6. Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
  7. Set DB_SCHEMA_SEQUENCE to 17 in lib/DBDefs.pm
  8. Turn cron jobs back on, if needed.
  9. Restart the MusicBrainz web server, if needed.

Please see the instructions in our more recent blog post instead of these instructions.

Release Notes

This release wouldn’t have been possible without help from Alastair Porter, Michael Wiencek, Nicolás Tamargo and the rest of the MusicBrainz team – thank you all for your hard work! As we missed the previous release, there are a few other changes in this release. Here are the full release notes:

Bug

  • [MBS-4703] – Add Medium edit does not correctly display the auto-edit note
  • [MBS-5834] – Users’ votes page mistitled as “edits”
  • [MBS-5851] – Instruments are missing from the JSON webservice responses
  • [MBS-5979] – Automatic redirect to beta clear release editor seeding
  • [MBS-6015] – Edit Artist Credit edits affecting track ACs don’t appear in related release edit histories
  • [MBS-6129] – Relationship editor isn’t correctly parsing attributes
  • [MBS-6149] – Entity merges silently dropping aliases with locales
  • [MBS-6178] – URL page headers are completely inconsistent
  • [MBS-6196] – work/edit_form.tt includes artist credit docs
  • [MBS-6249] – Add Event button broken on add release with cdtoc

Improvement

  • [MBS-1346] – New Report: Artists with 0 subscribers
  • [MBS-2229] – Allow multiple release events per release
  • [MBS-3626] – Display license logos in the sidebar
  • [MBS-3669] – Merge dated and undated relationships
  • [MBS-4115] – Cover art archive: Support .png SQL changes
  • [MBS-4294] – Add a “description” field to collections (SQL/UI)
  • [MBS-4756] – Move the wiki transclusion index to the database
  • [MBS-4866] – URL autoselect: ameblo.jp -> has blog at
  • [MBS-4867] – Timeline graph rate-of-change graph should change its vertical scaling depending on where it’s zoomed
  • [MBS-4925] – Add country of birth and country of death to Artist (person)
  • [MBS-5528] – Change short_link_phrase to long_link_phrase
  • [MBS-5772] – Generate relationship documentation (semi-)automatically
  • [MBS-5848] – Instrument credits (SQL)
  • [MBS-6023] – Track MBID UI changes
  • [MBS-6141] – Add discography page URL matching for universal-music.co.jp, lantis.jp, jvcmusic.co.jp, wmg.jp, avexnet.jp and kingrecords.co.jp
  • [MBS-6142] – Prevent Wikipedia links from being added as discography page relationships
  • [MBS-6188] – Remove rating from work merge page
  • [MBS-6189] – Show work languages on ISWC page
  • [MBS-6190] – Artist credit diffs per word, join phrase per char

New Feature

  • [MBS-799] – Location, venue and event support
  • [MBS-1839] – Track MBID SQL changes
  • [MBS-2417] – Support multiple countries/regions on a single release
  • [MBS-3296] – Add dynamic attributes
  • [MBS-3985] – Support multiple artist countries
  • [MBS-5272] – Create daily and weekly “rollup” replication packets
  • [MBS-5302] – Store International Standard Name Identifier (ISNI, ISO 27729) for artists and labels
  • [MBS-5861] – Dynamic work attributes (SQL)

Task

  • [MBS-5314] – Drop the work.artist_credit column
  • [MBS-6133] – Ensure JPEG uploads still work through CAA
  • [MBS-6167] – Add iTunes links to the sidebar
  • [MBS-6170] – Add Bandcamp links to the sidebar

Sub-task

  • [MBS-4217] – Spotify relationship under the External links section
  • [MBS-5809] – SQL changes for MBS-4294: Add a “description” field to collections
  • [MBS-5917] – transclusion to DB: SQL
  • [MBS-5918] – transclusion to DB: UI
  • [MBS-5919] – locations: SQL
  • [MBS-5920] – locations: UI
  • [MBS-5929] – Schema changes to store relationship type guidelines and example usages in the database
  • [MBS-5930] – Generate documentation automatically/allow managing guidelines in DB
  • [MBS-5933] – Schema changes to support multiple (date, country) pairs on releases
  • [MBS-6187] – UI changes to support multiple country/date pairs on releases

Schema change release tomorrow 15 May, 1300UTC

Things have been quiet at MusicBrainz in the past few weeks, but don’t let this fool you! We’ve been working hard on getting the schema change release put together. We’re on schedule for releasing this tomorrow at 13:00 UTC, 15 May, 2013.

We’re going to start the release process at 13:00 UTC, but the site may not go down right at that time. We’ll get started once we have all of our ducks in a row. For updates from us before we start, please follow @musicbrainz on Twitter or join us in IRC at #musicbrainz on irc.freenode.net.

Thanks, and wish us luck for a smooth schema change tomorrow!

Server Update 2013-04-22 and a Notice Regarding Passwords

We’ve just finished deploying a new update to the MusicBrainz website, but before covering release notes we’d like to clarify a few points regarding the recent password leak.

There have been a few reports suggesting that MusicBrainz was attacked or compromised, but we can assure you that this is not the case. As stated in our original blog post, we accidentally included private information in our otherwise routine public backups. As the majority of MusicBrainz data is available under the Creative Commons BY-NC-SA or CC0, these backups are intended to be public – the mistake was the human error in accidentally including password hashes.

A few of you also mentioned that we didn’t alert users via email in a timely fashion. We completely agree with these comments, and apologise for not being able to respond more quickly. Unfortunately, we did not have the infrastructure to do such mass mailing, and as we didn’t want to rush this out to the point of making yet more errors, it took us a little longer than we all would have liked.

With that out of the way, here’s what we’ve done in the last fortnight. Many thanks to Michael Wiencek, Nicolás Tamargo, and the rest of the MusicBrainz team for their work on this release.

Bug

  • [MBS-4277] – Guess Case: Keep uppercase option should be clearer
  • [MBS-4468] – “Enter vote” button incorrectly also adds edit note
  • [MBS-5597] – Funky Caps inConsisTency in Edit search paGes (presets or not)
  • [MBS-5832] – Rating not showing on Merge Recordings page
  • [MBS-5967] – No feedback after editing a user
  • [MBS-5998] – Barcodes incorrectly represented in mo RDF
  • [MBS-6077] – Reorder tracks edit not working
  • [MBS-6081] – Viewing MusicBrainz Events and Cover Art information makes the CAA launch event appear to the left of the actual graph
  • [MBS-6082] – “Remove Label” edit page has broken documentation link
  • [MBS-6100] – Internal server error with webservice requests which require authentication
  • [MBS-6111] – Amazon taking precedence over CAA after merging releases
  • [MBS-6121] – Logging in with non-ascii password throws exception
  • [MBS-6122] – Unable to merge release groups when cover art is set
  • [MBS-6169] – Check for edit user uses the wrong sub

Improvement

  • [MBS-4632] – Link to archive.org item from cover art tab
  • [MBS-4658] – Report: duplicate ISWCs
  • [MBS-5261] – Link to How to Add Disc IDs from the Disc ID tab of releases
  • [MBS-5667] – Rename “Other release groups”
  • [MBS-5800] – Remove restriction on entering standalone recordings
  • [MBS-6001] – Make the autoeditor election overview more useful
  • [MBS-6034] – label search is missing country column
  • [MBS-6086] – Setting cover art type (from nothing to something) should be an autoedit
  • [MBS-6109] – Work search doesn’t show type
  • [MBS-6110] – Relate To URL should auto-identify audiojelly.com as “can be purchased for download” relation.
  • [MBS-6119] – Seeing vote counts on closed elections shouldn’t require log in

New Feature

  • [MBS-1891] – Anchors/Permalinks on Edit Notes
  • [MBS-5737] – Notification when RE has created a release

Task

  • [MBS-6096] – Add soundtrackcollector.com to the Other Databases whitelist

The Git tag for this release is v-2013-04-11.

libdiscid 0.5.0 released

A new libdiscid release was made available today.

Changes:

LIB-29: add read_sparse() for faster reading again
LIB-35: add HAVE_SPARSE_READ and VERSION_* defines
LIB-36: hide internal symbols on Linux/Unix
LIB-34: distmac and distwin32 cmake targets

The important change is the read_sparse() function:

Philipp Wolfer added a read_sparse() function. With this function you can either read only the TOC or specify which features of the disc you want to read.
Starting with 0.3.0, the normal read() also extracts ISRCs.
You might want to change existing applications to use read_sparse() if you care about performance and don’t use ISRCs. The TOC is usually cached, so read_sparse() can be faster (0.5 vs. 3 seconds measured).
The difference is only in where the time is spent. It doesn’t really save overall time, but the TOC is read right when the disc is inserted so no additional disc access is performed when using the TOC in your application.
To make it possible to keep using read() when read_sparse() is not available we provide the HAVE_SPARSE_READ define, which can be used like this:

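/* Fall back to the plain read() when this libdiscid has no read_sparse() */
#ifndef DISCID_HAVE_SPARSE_READ
#define discid_read_sparse(disc, dev, i) discid_read(disc, dev)
#endif

/* 0 requests no extra features, so only the TOC is read */
discid_read_sparse(disc, device, 0);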

We also provide defines for the libdiscid version numbers.
However, you should rather test for features/functions in the build files and create specific defines for your use case.
The above define is only provided as a convenience for read_sparse().

There are more details about the other changes in the full announcement mail.

If you didn’t follow the musicbrainz-devel list:
This year brought several new releases for libdiscid, starting with ISRC and MCN support in libdiscid 0.3.0. Applications using libdiscid 0.2.2 (or even lower) still work with libdiscid 0.5.0.

Information, documentation and other links are at:
http://musicbrainz.org/doc/libdiscid
That includes builds for Windows and Mac OS X.

Server Update, 2013-04-08

We’ve just finished pushing out another fortnight of work to the main MusicBrainz servers. While work continues on the forthcoming schema change release, we have fixed a few bugs and added a few improvements. Thanks to Lukáš Lalinský and the rest of the MusicBrainz team for their work on this release! Here’s what’s changed:

Bug

  • [MBS-5524] – Edit relationship type edits are missing information for identifying the relationship type
  • [MBS-5830] – Attributes are not shown in relationship type edits
  • [MBS-5964] – Artist credit’s sort name tooltip doesn’t update after edit
  • [MBS-5980] – Cover art comments can make the cover art page look very unstructured
  • [MBS-5993] – name variation checks happen after HTML entity conversion
  • [MBS-6019] – Some FreeDB data getting wrongly imported from the dumps
  • [MBS-6021] – Track count search in Add Disc uses full release count for multi-disc releases
  • [MBS-6038] – "See all {num} {entity}" is just wrong i18n-wise
  • [MBS-6042] – Release country field in edit search breaks after doing a search
  • [MBS-6043] – In front page "Recent Additions" the text for the cover art pop-up labels (HTML title attribute) is wronlgy escaped
  • [MBS-6075] – An error occured while loading this edit

Improvement

  • [MBS-5613] – Set cover art page does not show enough information
  • [MBS-6039] – Make it possible to display attributes after the link phrase

Sub-task

  • [MBS-4258] – Make dates translatable

The Git tag for this release is v-2013-04-08.