On Friday 24 May, 2013 at 15:00UTC we’re going to make an urgent schema update to fix a problem that occurred during our schema change last week. Please read this whole blog post carefully!
This update will not make any changes to the schema, but it will fix some data issues that have appeared on slave servers.
We apologize profusely for these problems — we’re working hard to rectify this problem and we’re going to improve our processes going forward to ensure that future releases will not encounter these problems.
What went wrong
Due to a misunderstanding of our database system, the ‘track’ table will be corrupted on the majority of replicated slave databases after the schema 17 migration. Specifically, depending on the internal choices of a given postgresql installation’s query planner and other system details, any particular server can end up with a variety of incompatible permutations of the track table, where ‘id’/’gid’ pairs will generally point to the incorrect track data. Unfortunately, this problem is compounded by replication, which is based on the ‘id’ column. Therefore, replication packets since the schema change are likely to have deleted and modified the incorrect rows of the ‘track’ table on slaves.
How are we fixing it
In order to ensure that no slaves continue to replicate incompatible changes, we are incrementing the schema number again to 18, which will force operators of slave servers to intervene appropriately. To ensure that slaves have a correct version of the ‘track’ table, we are providing an upgrade script that will download an exported snapshot of the production server’s ‘track’ table at a known point and import it, as well as correct some smaller issues. By importing this snapshot, slave servers will be reset to a correct version of this table and replication can continue.
Specific step by step instructions on running this upgrade will be in a separate blog post. Watch this space!
What problems may have arisen
- In the unlikely case an external program directly references track row ID numbers, or if it uses the newly-added track MBID field (the ‘gid’ column), these will not be correct if they were taken from any server but the production server. If an application stores either of these identifiers in any way, that data should be rebuilt.
- Due to the compounding problems from replication, some tracklists will have incorrect information — missing tracks, misnumbered tracks, links to the wrong recordings, wrong durations, and/or wrong track artist credits. Information of this sort that was derived from replicated slaves during the affected period should be regenerated after upgrading.
FAQ about this update
Q: We don’t use the track table, we use recordings. Am I affected?
A: You are not affected if you use recordings directly, i.e., looking up recording information by a stored recording MBID, except if you use track information linked to those recordings (for example, if you create a list of releases a given recording appears on). Since the link between the recording and the release tables is via the ‘track’ table, anything that connects these two entities is likely to be affected.
Q: How can I tell if any of the tracklists I am using are affected?
A: Due to the random permutation issue, it’s not completely possible to be 100% sure. However, it’s possible to know of tracklists that definitely have problems by two means: track counts, and sequence issues. The former can be tested with a fairly simple query: “
SELECT medium.id, medium.track_count, count(track.id) as track_track_count,
medium.track_count count(track.id) AS counts_differ
FROM medium join track on track.medium = medium.id
GROUP BY medium.id, medium.track_count
HAVING count(track.id) medium.track_count;
Any medium that appears in that query has been affected and its tracklist should not be trusted (select ‘medium.release’ to get release IDs, if that’s your jam). Sequence issues are a more complex query:
SELECT distinct m.id FROM
(SELECT DISTINCT medium.* FROM
( SELECT track.medium, min(track.position) AS first_track, max(track.position)
AS last_track, count(track.position) AS track_count, sum(track.position)
AS track_pos_acc
FROM track
GROUP BY track.medium) s
JOIN medium ON medium.id = s.medium
WHERE first_track != 1 OR last_track != s.track_count OR
(s.track_count * (1 + s.track_count)) / 2 track_pos_acc
) m
(note: if you only get 10 rows for this query, you’re fine — they’re these ten, which are known problems)
For more safety, don’t trust anything in the track table that’s been updated since the schema change:
SELECT distinct medium
FROM track
WHERE last_updated > ‘2013-05-15’
If it’s possible in your application, it’s probably best to throw out any updates to tracklists since 2013-05-15.
Again, we’re sorry for the trouble this update may have caused you!