The end of the replication nightmare!

I’m pleased to report that our nightmare of finding/reconstructing the missing replication packets is finally over!

Through many heroic hours of work, Bitmap and Chirlu have reconstructed the missing replication packets. All clients should now be on their way to being up to date. We’ve learned a number of lessons (some good, some bad — that’s life, right?) in this ordeal and we hope to avoid these issues in the future.

A number of people from our community were an integral part of this recovery process: users mbcz, rembo10 and xeam sent us their complete DB dumps! Bitmap used these to sanity-check and diff several other databases and finally extract the missing packets. Thank you for dropping what you were doing and sending us a few GB of data over blazingly fast connections. Without you this would not have been possible, and that is not an exaggeration. Thank you!

After some more rest we’re going to continue to put out smaller fires that remain from the move to NewHost, but for now, the big fires are put out. Just in time for the weekend!

In the 11-year history of the replication stream we’ve had to have users restart their stream about 3-4 times because of problems on our end. Zero would’ve been nicer, but I’m proud that we’ve been able to make this system work for so long. On a daily basis we seem to have about 400 replicated copies of MusicBrainz running all over the world. Clearly this part of our service is well used, and I sleep a little better at night knowing that our most critical data is backed up across the globe.

Just for fun, here is a graph of the replication API usage over the last 6 months:

[Graph: hourly replication API usage over the last 6 months]

Towards the end, the graph shows the week-plus-long break, then a small blip as some of our replicas got unstuck yesterday, and finally the much larger spike of the rest of the replicas getting unstuck. Now, as to what caused the blip in mid-October — I have no idea.

Anyways, please accept my apologies for the replication stream outage and keep replicating!

Thanks!

Replication update: Do you have a DB that could help us?

In my post from yesterday I talked about our continuing struggle to fix our replication stream. Overnight we learned two new things:

  1. The lengthy hard drive recovery process has failed and yielded no useful results. 😦
  2. We have a working DB diff program in place that allows us to create missing replication packets from two DBs at known packet numbers.

#2 is a great step forward in fixing our replication stream, but we’re missing a specific replicated database at a specific point in time. If you have a replicated database, please read on and see if you can help us:

  1. Is your database at replication packet 99847? The way to find out your current replication sequence is to look in slave.log in your server directory or to issue the query
    select current_replication_sequence from replication_control;

    at the SQL prompt.

  2. Do you have a complete replicated MusicBrainz database, including the cover_art_archive schema?
  3. Are you willing to make a dump of the DB and send it to us?
  4. Do you have a fast internet connection to make #3 possible?

If you’ve answered yes to all of the above, please send email to support at metabrainz dot org.

Thanks!


Move to NewHost and Replication Update

It has been a long week since our move to the new hosting provider in Germany. Our move across the Atlantic worked out fairly well in the grand scheme of things. The new servers are performing well, the site is more stable and we have a modern infrastructure for most of our projects.

However, such moves are not without problems. While we didn’t encounter many problems, the most significant one we did encounter was the failure to copy two small replication packets off the old servers. We didn’t notice this problem until after the server in question had been decommissioned. Ooops.

And thus began a recovery effort that is almost worthy of a bad Hollywood B-movie plot. Between myself traveling and the team finishing the most critical migration bits, it took 2 days for us to realize the problem and find a volunteer to fetch the drives from the broken server. Only in a small and wealthy place such as San Luis Obispo could a stack of recycled servers sit in an open container for 2 days and not be touched at all. My friend collected the drives and immediately noticed that they had been damaged in the recycling process, which isn’t surprising. And we can consider ourselves really lucky that this drive didn’t contain private data — those drives have been physically destroyed!

Since then, my friend has been working with Linux disk recovery tools to try and recover the two replication packets off the drive. Given that he is working with a 1TB drive, this recovery process takes a while and must be fully completed before we can attempt to pull data off the drive. For now we wait.

At the same time, we’re actively cobbling together a method to regenerate the lost packets. In theory it is possible, but it involves heroic efforts of stupidity. And we’re expending that effort, but so far, it bears no fruit.

In the meantime, for all of the people who use our replicated (Live Data) feed — you have the following choices:

  1. If you need data updates flowing again as soon as possible, we strongly recommend importing a new data set. We have a new data dump and fresh replication packets being put out, so you can do this at any time you’re ready.
  2. If the need for updates is not urgent yet and you’d rather not reload the data, sit tight. We’re continuing our stupidly heroic efforts to recover the replication packets.
  3. Chocolate: It really makes everything better. It may not help with your data problems, but at least it takes the edge off.

We’re terribly sorry for the hassle in all of this! Our geek pride has been sufficiently dinged that our chocolate coping mechanisms will surely cause us to put on a pound or two.

Stay tuned!

UPDATE 1: The first recovery examination has not located the files, but my friend will do a second pass tomorrow and turn over file fragments to us that might allow us to recover files. But that won’t be for another 8 hours or so.

Important Update on Replication

We’ve been informed that some people have noticed their MusicBrainz slave servers have stopped applying replication packets. We’ve tracked down the problem and now have a fix for it.

How Do I Tell If I’m Affected?

To determine if you are affected, connect to the MusicBrainz PostgreSQL database (we recommend using ./admin/psql from your musicbrainz-server checkout) and run the following:

SELECT now() - last_replication_date FROM replication_control;

If you see an interval larger than a few days, there’s a good chance you are affected by this bug and should upgrade.

How Do I Fix This?

Fixing this is easy: you simply need to update your Git checkout. We suggest the following:

cd musicbrainz-server
git fetch --tags origin
git checkout v-2013-10-14-replication-fix-2

EDIT: formerly used a different (broken) tag and recommended a different method of fetching tags.

When replication next runs, the data that cannot be applied will be rewritten and replication will continue as normal.

Musing: compacting replication packets

I’ve recently been working to write more about MusicBrainz internals and my thoughts about the project. This blog often doesn’t see many posts, and most of them are on official topics like releases, so putting this here is an experiment. I hope you’ll all enjoy hearing about something a bit less concrete (and perhaps a bit more dry and technical) than usual!

How Replication Works

Replication is a pretty important part of MusicBrainz, though perhaps to the average user it’s a bit hidden. For those readers who aren’t familiar with it, replication is the mechanism behind the live data feed: users have a PostgreSQL database and, using tools we provide (or some third-party alternatives), regularly download and apply so-called “replication packets”, which describe the changes to the database over a specific period.

Replication packets are .tar.bz2 archives with a collection of files in them: COPYING with the license info, README with a very sparse description of replication, SCHEMA_SEQUENCE with the version of the database schema the replication packet applies to, REPLICATION_SEQUENCE with a sequence number that the code uses to apply replication packets in the correct order, TIMESTAMP with, well, a timestamp, and finally a folder mbdump, which contains two files: dbmirror_pending and dbmirror_pendingdata. Those of you who use the MusicBrainz data dumps may recognize this format: it’s the same as the data dumps. dbmirror_pending and dbmirror_pendingdata are two database tables that are used by replication to store the data about changes while those changes are being applied to the database.
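
Just to make that layout concrete, here’s a minimal Python sketch (not part of the official tooling) that peeks inside a packet and prints its sequence number; the filename is only a placeholder:

import tarfile

# Rough sketch: list the contents of a replication packet and read its
# sequence number. "replication-12345.tar.bz2" is a placeholder filename.
with tarfile.open("replication-12345.tar.bz2", "r:bz2") as packet:
    for member in packet.getmembers():
        print(member.name)  # COPYING, README, SCHEMA_SEQUENCE, ..., mbdump/dbmirror_pending
        if member.name.endswith("REPLICATION_SEQUENCE"):
            seq = packet.extractfile(member).read().decode().strip()
            print("replication sequence:", seq)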

Let’s look more closely at what those two tables contain. dbmirror_pending has columns for a sequence ID, a fully-qualified table name, an operation, and a transaction ID. dbmirror_pendingdata has columns for a sequence ID, a boolean indicating whether the data specifies keys only, and finally the change data itself. Jointly, these two tables combine, conceptually, into an ordered list of operations to perform on the database. Since I tend to think in JSON, here’s a way you could imagine a single operation looking:

{"table": "musicbrainz.release",
"operation": "update",
"existing_row": "\"id\"='1290306' \"name\"='620308' \"artist_credit\"='234861' \"release_group\"='1269028' \"status\"='1' \"packaging\"='1' \"country\"='150' \"language\"='120' \"script\"=",
"update_row": "\"id\"='1290306' \"gid\"='e37dfeea-0f25-48fa-85c0-b4d174ff172d' \"name\"='620308' \"artist_credit\"='234861' \"release_group\"='1269028' \"status\"='1' \"packaging\"='1' \"country\"='150' \"language\"='120' \"script\"= \"date_year\"='2009' \"date_month\"= \"date_day\"= \"barcode\"='8715777007870' \"comment\"='' \"edits_pending\"='3' \"quality\"='-1' \"last_updated\"='2013-05-15 13:01:05.065623+00'"}

This operation would specify that it should update the table musicbrainz.release by taking the row whose id is 1290306, name is 620308, artist_credit is 234861, etc. (as listed in existing_row) and change it to have id 1290306, gid e37dfeea-0f25-48fa-85c0-b4d174ff172d, name 620308, etc. (as listed in update_row).
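
To give a feel for that keyed row format, here’s a rough Python sketch of how one might parse such a string into a dictionary. It is illustrative only: the real dbmirror format has quoting/escaping rules this regex ignores, and parse_row is my own helper name, not code from the server:

import re

# Illustrative only: parse a dbmirror-style keyed row string such as
#   "id"='1290306' "name"='620308' "script"=
# into a dict. A bare "col"= with no quoted value is read as NULL (None).
# Escaped quotes inside values are not handled here.
PAIR_RE = re.compile(r'"([^"]+)"=(?:\'([^\']*)\')?')

def parse_row(row):
    return {m.group(1): m.group(2) for m in PAIR_RE.finditer(row)}

print(parse_row('"id"=\'1290306\' "name"=\'620308\' "script"='))
# {'id': '1290306', 'name': '620308', 'script': None}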

Compacting replication packets

So there’s a summary of how replication works, in rough, abstract terms. Now on to the real topic of the post: making replication packets more compact. As the current system works, every change is put into the replication packet; that is, if in the course of an hour (one replication packet), a table changes twenty times, then twenty operations will end up in the replication packet. Sometimes this is useful: some data users use database-level triggers to update their own derived information, and sometimes that requires seeing every change, even if it changes again very soon thereafter. However, for most people, replication is just a way to get their database up to date every hour. For these people, having all twenty updates is wasteful — as far as they’re concerned, it could just be a simple update from how the row looked at the start of the hour to how it looked at the end. This becomes especially true for the (currently rather underused) larger replication packets (daily and weekly, specifically), which include more changes (and thus more changes to the same rows).

To formalize this idea a bit more, let’s make one more abstraction: a chain of operations. A chain is, informally, an ordered list of operations on the same data (or the same row). The simplest chain is just one operation, and from there we can work with chains in a variety of ways:

  • Chains can be combined: if the final state of a chain corresponds to the initial state of another chain, and the first chain’s final operation is before the initial operation of the second (without any other chains whose initial state corresponds to the first chain’s final state in-between), then the two chains can be combined into one chain.
  • Chains can be reordered: if there is no way to combine two chains by the above rule (including by way of intermediate operations or chains), then the two chains operate on completely separate data and thus can happen in any order. (For the database-savvy who might have noticed: slave databases, which use replication, don’t have foreign keys or other constraints that might make this untrue.)
  • Chains can be combined even more: If the final operation of a chain is a deletion, and the initial operation of another is an insertion, the two chains can be combined by turning the deletion + insertion into a single update. (As you might imagine, the ability to change the order of chains helps a lot here!)
  • Chains can be collapsed: perhaps most important! Any number of updates in a chain can be turned into a single update, from the initial state of the first update to the final state of the last update. Additionally, an insertion followed by an update can be turned into a single insertion, directly to the final state of the update. Additionally, an update followed by a deletion can be turned into a single deletion, directly from the initial state of the update. Finally, an insertion followed by a deletion can be ignored entirely, since it has no lasting effect on the database. (A small code sketch of these collapse rules follows this list.)
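
Here is the small sketch promised above, showing how two consecutive operations on the same row might be collapsed. The dict shape mirrors the JSON example earlier, but that shape and the collapse_pair name are assumptions made for illustration, not code from musicbrainz-server or mbslave:

# Hypothetical sketch: collapse two consecutive operations on the same row.
# Each operation is a dict shaped like the JSON example above:
#   {"operation": "insert"|"update"|"delete",
#    "existing_row": {...} or None, "update_row": {...} or None}
# Returns the collapsed operation, or None if the pair cancels out entirely.
def collapse_pair(first, second):
    ops = (first["operation"], second["operation"])
    if ops == ("update", "update"):
        # update + update -> one update, from the first's initial state
        # to the second's final state.
        return {"operation": "update",
                "existing_row": first["existing_row"],
                "update_row": second["update_row"]}
    if ops == ("insert", "update"):
        # insert + update -> insert the final state directly.
        return {"operation": "insert",
                "existing_row": None,
                "update_row": second["update_row"]}
    if ops == ("update", "delete"):
        # update + delete -> delete, starting from the update's initial state.
        return {"operation": "delete",
                "existing_row": first["existing_row"],
                "update_row": None}
    if ops == ("insert", "delete"):
        # insert + delete -> no lasting effect; drop both.
        return None
    raise ValueError("not a collapsible pair: %s" % (ops,))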

Thus, by creating, combining, reordering, and collapsing chains, we can make replication packets do many fewer operations (which, in turn, can make the packets smaller and have them apply faster). I’ve still glossed over the details of how this could be implemented, though, so to wrap things up, here’s a basic algorithm for optimizing packets (with a rough code sketch after the list):

  1. Loop through operations in the order ProcessReplicationChanges (the script responsible for applying changes to the database from a replication packet) would: order the transactions in ascending order by the maximum sequence ID within the transaction, and within a transaction by ascending sequence ID.
  2. Take action depending on the type of operation:
    • If an insertion: see if the output already contains a deletion on the same table. If so, take the initial state of the deletion and the final state of the insertion and output an update instead of the deletion. If no such deletion exists, simply copy the insertion to the output.
    • If an update: see if the output already contains an insertion or an update on the same data (that is, find an operation whose final state matches the initial state of the update). If you find one, replace it with an operation of the same type (and initial state, if applicable) but with the final state of the new update instead of what it previously included. If you don’t find one, just copy over the update to the output.
    • If a deletion: see if the output already contains an insertion on the same data. If it does, remove it, add nothing to the output, and move on. If not, look for an update on the same data. If you find one, replace it with a deletion from the initial state of the update. If not, look for an insertion on the same table. If you find one, replace it with an update from the initial state of the deletion to the final state of the insertion. Finally, if you found nothing, copy the deletion to the output.
  3. Dump the output as a replication packet. Transactions shouldn’t matter, so put them all in the same transaction.
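
And here is the rough Python sketch of that loop. It assumes the operations have already been ordered as in step 1 and parsed into dicts shaped like the earlier JSON example (with an insertion carrying its row in update_row); those shapes and the compact/find helpers are assumptions for illustration, not the production implementation:

def compact(operations):
    # Hypothetical single pass over operations already ordered the way
    # ProcessReplicationChanges would apply them (step 1 above).
    output = []

    def find(pred):
        # Index of the first operation already in the output that matches pred.
        for i, existing in enumerate(output):
            if pred(existing):
                return i
        return None

    def final_state(o):
        # The row an insert or update leaves behind; a deletion leaves nothing.
        return o["update_row"] if o["operation"] in ("insert", "update") else None

    for op in operations:
        table, kind = op["table"], op["operation"]

        if kind == "insert":
            # An earlier deletion on the same table + this insertion -> one update.
            i = find(lambda o: o["table"] == table and o["operation"] == "delete")
            if i is not None:
                output[i] = {"table": table, "operation": "update",
                             "existing_row": output[i]["existing_row"],
                             "update_row": op["update_row"]}
            else:
                output.append(op)

        elif kind == "update":
            # An earlier insert/update whose final state matches this update's
            # initial state simply has its final state replaced.
            i = find(lambda o: o["table"] == table
                     and final_state(o) == op["existing_row"])
            if i is not None:
                output[i] = dict(output[i], update_row=op["update_row"])
            else:
                output.append(op)

        else:  # deletion
            i = find(lambda o: o["table"] == table and o["operation"] == "insert"
                     and final_state(o) == op["existing_row"])
            if i is not None:
                del output[i]  # insert + delete cancel out entirely
                continue
            i = find(lambda o: o["table"] == table and o["operation"] == "update"
                     and final_state(o) == op["existing_row"])
            if i is not None:
                # update + delete -> delete from the update's initial state.
                output[i] = {"table": table, "operation": "delete",
                             "existing_row": output[i]["existing_row"],
                             "update_row": None}
                continue
            i = find(lambda o: o["table"] == table and o["operation"] == "insert")
            if i is not None:
                # Earlier insert on the same table + this deletion -> one update.
                output[i] = {"table": table, "operation": "update",
                             "existing_row": op["existing_row"],
                             "update_row": output[i]["update_row"]}
                continue
            output.append(op)

    return output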

Final notes, FAQs

  • Why isn’t this already being done? Daily and weekly packets are relatively new, and since some data users do want to see every single operation, it doesn’t make sense to do this to the hourly packets. It’s also somewhat annoying to get the operations in the right order, because they aren’t stored in that order in the replication packet dump, due to how transactions have to be processed. The musicbrainz-server code gets around this by importing the data into a PostgreSQL table and letting the database do the work of putting things in the right order. mbslave instead loads the entire packet into memory and sorts it there, which obviously is potentially dangerous with larger packets (which, as noted above, are more likely to derive value from this process). Altogether, the safest way to implement this would be to mimic musicbrainz-server’s process, but to do this on the production servers it would need to use different table names so as to not interfere with the normal replication packet creation process. But mostly, because nobody’s written it yet.
  • How much benefit would this really bring? Simple answer: I don’t really know, but I know it’s some. Autoedits (additions, especially) have a tendency to produce multiple operations where one would do fine, because they first increment the ‘edits_pending’ column of whatever they’re editing in one operation, then in the process of applying the edit (automatically, and immediately, in the same transaction) decrement it. Editors also tend to change a bunch of different things about the same entity all at once, but sometimes not in one edit but rather in several. Of course, any deletion is easily counterbalanced by the many insertions that happen all the time in MusicBrainz; perhaps most notably, daily scripts run to clean up unused entities, so any packet that includes those changes will probably be able to collapse some insertion + deletion pairs. So there are several cases where obvious chains would appear already.

Hopefully those of you who’ve made it down this far found this enlightening — or, at least, interesting. At some point in the future this might be something we do with the daily and weekly replication packets we’re already creating (currently just by concatenating together hourly packets), but for now there’s no solid plan to do so. Thus, just musing for now!

Thanks for reading!

Replication issues, and packet 64833 is large

  1. We’ve ushered in the new year by discovering, then solving, some issues with replication; packet number 64831 (from yesterday, 1AM UTC) didn’t build correctly and needed a bit of manual prodding. However, it’s now been pushed out and replication should be back to normal.
  2. As part of our fix process, we turned off production of replication packets for the bulk of today. As a result, packet number 64833 covers what would otherwise have been about 20 packets, and thus is somewhat large. The import process for this packet will accordingly take somewhat longer than usual.

Sorry for any inconvenience, and happy new year!

Replication packet 61163 is large

I’d like to apologize for replication packet 61163 — our release created a very large replication packet (4.9 MB) that is going to take a while for clients to apply. Users of our slave software (musicbrainz-server and mbslave) can expect to see a much longer loading time and much greater use of disk space when this packet is applied.

Sorry for the inconvenience.