Schema change release: What happened?

Now that we’ve finally finished the schema change release, I wanted to give an account of what happened in this arduous process. Before I dive into the details, I want to offer a picture that best sums up our current situation and challenges:

[Image: a shipping container being loaded onto a boat]

The shipping container is MusicBrainz and the boat is our hosting infrastructure. This picture perfectly describes the sort of challenges we’ve faced over the past few days. 🙂

Here is what happened:

Because the site was recently running slowly and our search servers kept crashing, Zas and I were not available to help Bitmap prepare for the schema change release. This long process was left to Bitmap and Gentlecat to take care of on their own. When the due date came we quickly realized that we were not ready for the release, so we delayed it by one week.

Sunday 22 May

Finally we were ready to proceed with the Postgres 9.5 upgrade. Once we started the process, we kept running into small problems that we hadn’t encountered in our test setups. We do not have access to enough infrastructure to keep a complete clone of our production environment, so we can only do so much to prepare for everything that might happen when we run upgrades on our production servers.

While we were attempting to start the upgrade, our backup database server was running much slower than anticipated. In the end we figured out that a step for optimizing the database (analyzing it) hadn’t been carried out. During this time the site was really slow or outright unusable, but by the time the problem became apparent we had started the upgrade and could not turn back.

Once the upgrade was done, optimizing the database took much, much longer than usual: 3 hours! This process wasn’t started until about 1am local time, which made for a very long night waiting for it to finish. And even then we hit snags and had to start over a couple of times. At about 4:30am we had the site running on Postgres 9.5 in read-only mode. The plan was to rest and start the schema change release in the morning.
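
For those curious, the “optimizing” step is a statistics refresh: after a major version upgrade the query planner has no statistics for the new cluster until the tables have been analyzed, so every query runs on default estimates. A minimal sketch of what that step looks like on a stock PostgreSQL setup (the database name is illustrative; our actual scripts and connection settings differ):

    # Rebuild planner statistics for every database in the upgraded cluster.
    # --analyze-in-stages produces rough statistics quickly, then refines them.
    vacuumdb --all --analyze-in-stages

    # Or, for a single database:
    psql -d musicbrainz_db -c "ANALYZE;"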

Monday 23 May

Of course we had spent all of our time working on the Postgres upgrade and site stability, so the document that we use to plan the schema change was not in place. We spent the day preparing it and other bits for the release. To get an appreciation for what this document looks like, have a look! Note that some steps are instant while others might take hours to carry out, and some involve a sub-step (or 20) not included in the document.

In the evening we were ready to make the change. By this point our backup DB was performing much better, so the read-only site worked acceptably, and we started the release. Overall the actual release process was reasonably smooth: we hit a few snags and had to do a lot of waiting on our slow servers. At about 1am things were finally complete. We ran our sanity checks to make sure things had gone smoothly, and all of them passed.

We proceeded to put the site into read-write mode and immediately saw portions of Postgres crashing, which is really bad. With community feedback we quickly deduced that some write operations were causing Postgres back-end processes to crash. We put the site back into read-only mode, things stabilized, and we finally went to bed at 3am.

Tuesday 24 May

In the morning we quickly found the source of the database trouble with help from the Postgres people on IRC. Thanks for the swift help, Johto! We found that the steps for installing the updated third-party extensions into Postgres had not completed correctly. Repeating the steps by hand fixed this problem.
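
For anyone wondering what “repeating the steps by hand” looks like: after a cluster upgrade, the extensions the schema relies on have to exist in the new cluster at the expected versions. A rough sketch of the sort of commands involved (the database name is illustrative, and cube/earthdistance are simply the extensions mentioned in the release notes below; the exact list on our servers differs):

    # See which extensions the new cluster has, and at which versions.
    psql -d musicbrainz_db -c "SELECT extname, extversion FROM pg_extension;"

    # (Re)create or upgrade the ones the schema expects.
    psql -d musicbrainz_db -c "CREATE EXTENSION IF NOT EXISTS cube;"
    psql -d musicbrainz_db -c "CREATE EXTENSION IF NOT EXISTS earthdistance;"
    psql -d musicbrainz_db -c "ALTER EXTENSION cube UPDATE;"
    psql -d musicbrainz_db -c "ALTER EXTENSION earthdistance UPDATE;"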

Sadly, yesterday morning we got an email informing us that our Live Data Feed replication stream had become corrupted. 🙁 This was heartbreaking news to us, since it meant a great inconvenience for all of our Live Data Feed users. We immediately split into two teams: Zas, chirlu and myself to fix the root cause of the issue, and Bitmap to investigate fixing the stream.

I proceeded to set up a test environment and was able to quickly reproduce the problem. Zas and chirlu were an amazing support team, Googling issues as I came across them. Within a fairly short time we fixed the problem and deployed the fix to our database server. The problem was caused by a bug in a piece of code that we’ve been using for 13 years! A change in Postgres caused this bug to actually become a problem and corrupt our replication feed. 🙁

Once the problems were fixed we needed to initiate a new data dump and check that the replication stream was working correctly. Of course we found a problem, fixed it, and restarted the data dump process. Loads of hurry-up-and-wait situations to try our patience!

When we were satisfied that things were working correctly, we re-enabled the site as read-write at about 1am and allowed people to continue editing. Exhausted, we stumbled into bed while the data dumps synced out to the FTP site.

Wednesday 25 May

Today Bitmap was flying home, and as soon as WiFi became available on his flight he started working and helping to put the schema change to bed. We’ve verified that everything is working as expected. At last this saga comes to an end and we can all take a break and catch up on sleep!

Thank you for your patience through all of this.

Schema change release, 2016-05-23 (with upgrade instructions)

Starting with this release, PostgreSQL 9.5 is now our minimum supported version. In order to import any future data sets, you will need to upgrade your installation to version 9.5.

Due to unforeseen problems with the Live Data Feed (AKA replication), users with slave databases will be required to first import a fresh data dump into their new 9.5 installation. We apologize that this is the case, but even if the stream had not broken, a clean import is faster and easier than running the migration. For what happened during this rather lengthy schema change release, stay tuned for a post-mortem blog post that covers the details.

If you have a non-replicated standalone database, you can use pg_upgrade and run ./upgrade.sh directly, but for simplicity we strongly recommend importing the latest data dump. Thus, we will only provide instructions for a clean import (a consolidated shell sketch follows the numbered steps):

  1. Make sure you have PostgreSQL 9.5 installed, and that your database settings in lib/DBDefs.pm are updated to point to the 9.5 installation if you currently have an older version of PostgreSQL running. If you already have PostgreSQL 9.5 and want to replace the existing database there, you’ll need to drop it first (using dropdb or from within psql). Be careful that you’re not dropping any important data if this is a standalone database that you’ve made changes to.
  2. Take down the web server running MusicBrainz, if you’re running a web server.
  3. Turn off cron jobs if you are automatically updating the database via cron jobs.
  4. Switch to the new code with git fetch origin followed by git checkout v-2016-05-23-schema-change-v2
  5. Run cpanm --installdeps --notest . to ensure your perl-based dependencies are up to date. Note the dot at the end.
  6. Set DB_SCHEMA_SEQUENCE to 23 in lib/DBDefs.pm
  7. Download the latest data dumps. If you don’t need historical edit data, excluding the edit dump will speed up your import significantly.
  8. Initialize a new database from the data dumps downloaded in step 7. Detailed instructions for doing this are located in INSTALL.md in the musicbrainz-server repository; if your data dumps are in /tmp, the command should simply be something like ./admin/InitDb.pl --createdb --import /tmp/mbdump*.tar.bz2.
  9. After the import has finished, turn cron jobs back on, if applicable.
  10. Restart the MusicBrainz web server, as well as memcached, if applicable.
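
Rolled together, the command-line portion of the steps above looks roughly like the following (service stop/start commands and the dump download are setup-specific and shown only as comments; the edits to lib/DBDefs.pm in steps 1 and 6 are manual):

    cd musicbrainz-server

    # Steps 2-3: stop the MusicBrainz web server and any database cron jobs.

    # Step 4: check out the release tag.
    git fetch origin
    git checkout v-2016-05-23-schema-change-v2

    # Step 5: update Perl dependencies (note the trailing dot).
    cpanm --installdeps --notest .

    # Step 6: set DB_SCHEMA_SEQUENCE to 23 in lib/DBDefs.pm (manual edit).

    # Step 7: download the latest data dumps into /tmp; skip the edit dump
    # if you do not need historical edit data.

    # Step 8: create and import the new database.
    ./admin/InitDb.pl --createdb --import /tmp/mbdump*.tar.bz2

    # Steps 9-10: re-enable cron jobs, then restart the web server and memcached.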

We would like to thank bitmap, Gentlecat, zas, chirlu, reosarevok and gcilou for contributing directly to the release, and we’d also like to thank all of the people who helped test, debug or otherwise offer support during this quite difficult release. Thank you!

And finally, here’s the list of changes you can expect in the upgrade:

Bug

  • [MBS-6406] – Admins can’t change email addresses
  • [MBS-8288] – Missing indexes for inverse lookup on *_gid_redirect tables
  • [MBS-8669] – Primary key for place table missing on old slaves
  • [MBS-8906] – Release pages ISE if CB doesn’t return JSON from its API for whatever reason
  • [MBS-8928] – If you submit the release editor without being logged in, it displays “[object Object]” as an error message
  • [MBS-8943] – Some pages do not respect DB_READ_ONLY setting

Improvement

  • [MBS-1873] – Fix vote tallies for edits
  • [MBS-3887] – Duplicate artist and label names not being checked against alias
  • [MBS-8287] – Log deleted entities that were in a subscribed collection
  • [MBS-8433] – Work attributes don’t have a uuid
  • [MBS-8716] – Store the edit data in a JSONB column
  • [MBS-8717] – Move the edit data to a separate table
  • [MBS-8838] – Add gids to all *_type* tables
  • [MBS-8873] – Convert and unify artist credit editors to React
  • [MBS-8909] – Add logos to IMDb and VGMdb links in the sidebar
  • [MBS-8939] – Update the Instagram logo used in the sidebar
  • [MBS-8940] – Let banner message editors dismiss the banner only temporarily

Task

  • [MBS-8656] – Bring edit table indexes back into sync
  • [MBS-8719] – Stop materializing of edit and vote counts
  • [MBS-8720] – Add a materialized view of edit note recipients
  • [MBS-8727] – Prevent duplicate votes
  • [MBS-8800] – Create the earthdistance extension and add a geodetic index for place coordinates
  • [MBS-8804] – Add BRIN indexes for timestamp columns
  • [MBS-8897] – add new entity icons
  • [MBS-8938] – Schema changes to support alternative tracklists

State of the Onion: MetaBrainz

In the past few weeks we’ve been hit with several traffic increases to MusicBrainz, which are putting considerably more strain on our aging infrastructure than we’re happy with. If it seems that we’re not doing anything about it, that is because we’ve been busy behind the scenes trying to keep things moving forward. This sometimes doesn’t leave us a lot of time to keep the public informed about our work. Hopefully this blog post will fix that in the short term:

In 2011 we started to make plans to move MusicBrainz hosting into the cloud, but then out of the blue we were donated a pile of machines. There were so many machines that I postponed the cloud plans and prepared the donated machines for service. That has carried us for 4+ years with almost no hardware cost, which was really great. The plan was to move to the cloud sometime around 2015, but then I spent most of 2014/2015 dealing with conflicts in the team, putting us seriously behind schedule while our hardware decayed.

On top of that, we’ve recently had some “bad luck”. Some disrespectful commercial customers hit us really hard and we had to find and block them. We have had unexpected traffic spikes, and while we were trying to address them, two more machines failed on us. These were the donated machines that we kept in reserve for just this moment. The loss of two machines left us short on capacity to handle the increased demands on our servers.

So, now we face a tough question: do we buy expensive hardware that we might use for only 6 months (~$5,000), or do we try to save the money and tough it out? I’d rather not spend so much money on such short-term use if we can avoid it. We’re going to try to move to a new hosting facility somewhere in the EU, since that is where most of our users are.

Moving to a new hosting facility has an incredible number of dependencies that Christina (our Biz Dev manager), Zas and I have been working through. It may not seem like we have a plan, but we do, and we’re incredibly busy trying to make the plan happen. To give you a taste of what we’re up against:

  1. We want to move our hosting to Europe and have a business presence in Europe in order to reduce the costs and inefficiencies of being a solely US based business. A lot of our traffic, customers and contractors are in the EU and it simply makes sense to have a presence here.
  2. To establish a presence in the EU I needed local help with business matters as well as with researching and establishing an EU organization. So I needed to find a Biz Dev manager, and that person is Christina.
  3. Once Christina was on board she researched our options for what was suited to us. Getting that process moving involved getting certified documents from California, board approval for spending funds to establish the organization, EU labor law research (and we needed to swap a board member, too!), hiring help to establish the org., and generally navigating the Spanish bureaucracy. (See this only slightly exaggerated short film for some clues about our ordeal.)
  4. Once the org. had been established we needed to convince the bank to open a bank account for us. The draconian US banking laws extend worldwide and the local bank had to ensure that they were not opening themselves up to thousands of $$$ in accounting hassles just to allow a tiny non-profit to open a bank account. We finally have a bank account and have started paying our contractors with it!
  5. At the same time we’re also working to set up an office for the growing team here in Barcelona. Signing the lease was barely the start of a byzantine process: getting power, internet and water set up has taken a frustratingly long time. Had I known how long, I would have stayed at my co-working space a while longer while addressing the hosting issues.
  6. While Christina has been focused on the hardcore paperwork, Zas is keeping the site running, which itself requires many heroics. Zas and I have started planning the move to the EU hosting provider. We’ve got a 5-page document that collects some of the open questions and requirements around this process: https://docs.google.com/document/d/16KNm4KksNwz29Opk1aILOMtCmPIeXFuxxUjMoPT3th0/edit#heading=h.dpfvoz1idcro. Right now Zas and Bitmap are here in Barcelona and we’re going to work on establishing a formal plan for moving to the new hosting company. We’re currently comparing hosting company offerings – see what we’ve collected so far if you care to follow along. The amount of work required to make this happen is making my head hurt. (A special shoutout to KodeStar, lead developer of FanArt.tv, for providing a lot of useful feedback about our various options.)
  7. While Christina, Zas and I have our hands full, Bitmap and Gentlecat continue to release new features and work on the schema change. Not to mention all the contributions from Freso and Reosarevok to keep the community happy and polite while we deal with less-than-optimal site conditions. All in all, I am really happy with and proud of my team for keeping things running under these conditions.

This is just a snapshot of everything that is happening behind the scenes, all of which will culminate in the goal of moving to a new hosting company and being set up in the EU. And mind you, we’re doing this on a minuscule budget while trying to be careful about how we spend our money.

Deprecating MBIDs

This post is an April Fools joke. Rest assured, we have no intention of changing the MBID system that MusicBrainz currently uses.

But, like all good parody news items, there is an element of truth behind this post. The announcement of the Echo Nest API shutdown is real, and with this change you will no longer be able to use Echo Nest IDs to look up information. This particularly hurts users of the Million Song Dataset, which maps each track to an Echo Nest ID. The new Spotify API isn’t even providing a compatibility API or ID mapping, leaving users to look up 1 million Spotify IDs in the remaining months that the old Echo Nest API will remain available.

At MusicBrainz, we understand the importance of a stable identifier system. That’s why, 16 years ago, we picked these unwieldy-looking UUID identifiers, which have since stood the test of time, with room to continue growing. You can look up an MBID minted 16 years ago today, and it will still work another 16 years in the future.
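
To make that concrete, an MBID lookup is a single call to the ws/2 web service (a sketch; substitute any real MBID for the placeholder):

    # Fetch an artist by MBID as JSON from the MusicBrainz web service.
    curl "https://musicbrainz.org/ws/2/artist/<mbid>?fmt=json"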

Hello all,

Following Echo Nest’s bold announcement that Echo Nest IDs are being replaced by Spotify IDs, we figured it was time to make our own ID change public as well. MBIDs were a fantastic idea 16 years ago, but let’s face it, they’re not the most beautiful thing around, so our MBIDs will now also be replaced by Spotify IDs to help with a proper mapping across tools. Anything without a Spotify mapping will simply get purged. This should greatly simplify the data we have and remove any doubt about whether some releases exist or not – if they’re on Spotify, they clearly exist!

We would like to commend Echo Nest on their brave leadership in this, giving us the courage to move on from our ancient heritage and try new things. With the speed technologies evolve in this digital age, it can be hard to keep up with things and keep things fresh, but Echo Nest is showing the way forward, and we’re delighted to be able to follow so quickly in their path.

I hope you all will welcome this bold move by our team. We hope to have it ready by next schema change. We know we’re excited! 😀

PS. No, we will not provide a mapping between MBIDs and the new Spotify IDs. We trust our data users to be capable of setting things up on their own. Happy hacking! 🙂

Upgrading Postgres for MusicBrainz Live Data Feed users

We’re slowly approaching that time of year: schema change release time. After skipping our fall update to focus on some internal tasks, we’re ready to have another schema change release in the spring: May 16, 2016.

We have started collecting the features we wish to include in this schema change release and we’ll be publishing that list in the coming weeks. However, we’re contemplating the impact of one more change we’d like to make: upgrading to a more recent version of Postgres.

Internally we are going to upgrade to Postgres 9.5, which was recently released, so we expect that the Postgres team will have worked out the most significant kinks before we’re ready to move to it. However, even though we are moving to 9.5, we are considering the impact on our downstream users/customers who need to make the same or a similar change.

While we are moving to version 9.5 of Postgres, we have the option of only adopting features from Postgres 9.4, which means that our downstream users may continue to use Postgres 9.4. However, Postgres 9.5 has some nice features we’d like to use (e.g. UPSERT), so we’re pondering whether it is possible for us to require Postgres 9.5 of all of our Live Data Feed users starting on May 16, 2016.
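
For reference, the UPSERT feature in question is the INSERT ... ON CONFLICT syntax added in Postgres 9.5, which has no direct equivalent in 9.4. A generic sketch (the table and columns are made up for illustration and are not part of the MusicBrainz schema):

    # Insert a row, or bump its counter if a row with that name already exists.
    # Requires a unique constraint on example_counter.name.
    psql -d musicbrainz_db -c "
        INSERT INTO example_counter (name, hits)
             VALUES ('foo', 1)
        ON CONFLICT (name)
          DO UPDATE SET hits = example_counter.hits + 1;"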

We have already informally queried a few of our users and so far it seems that requiring Postgres 9.5 is feasible. If you are a Live Data Feed user and feel that requiring Postgres 9.5 by May 16, 2016 is too much for you and your organization, please leave a comment on this blog post!

Schema change release, 2015-05-18 (including upgrade instructions)

Our previously mentioned schema change release is finished! Upgrade instructions are below, including configuration updates for replication access tokens.

This release does not include UI for several of the schema change patches, which will (hopefully) happen for next release on June 1. The incomplete patches are MBS-7489 (credits for artists in relationships), MBS-4145 (tag upvote/downvote), and MBS-8004 (collections for additional entity types). These patches have had their schema change components finished, but the UI was incomplete or needed more work.

Schema Change Upgrade Instructions

These are largely the same as previous upgrade instructions, using the tag v-2015-05-18-schema-change. The primary difference is the addition of configuring an access token for replication; a consolidated shell sketch follows the list.

  1. Make sure your REPLICATION_TYPE setting is RT_SLAVE and your DB_SCHEMA_SEQUENCE is set to 21 in lib/DBDefs.pm. If you’re running a standalone server, you can run the upgrade, but it may be easier to just import a new data dump!
  2. Ensure you’ve replicated up to the most recent replication packet available with the old schema. (if you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you; if you’re ready to update, it should say “Mismatched schema sequence, 21 (database) vs 22 (replication packet)”).
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you are automatically updating the database via cron jobs.
  5. Switch to the new code with git fetch origin followed by git checkout v-2015-05-18-schema-change
  6. Run ./upgrade.sh (or carton exec -Ilib -- ./upgrade.sh if you’re using carton, with very old setups).
  7. Run cpanm --installdeps --notest . to ensure your perl-based dependencies are up to date. This release adds a dependency on LWP::Protocol::https, for fetching replication packets from the new server; many systems may already have this installed, but it should be verified.
  8. Set DB_SCHEMA_SEQUENCE to 22 in lib/DBDefs.pm as instructed by the output of ./upgrade.sh
  9. Assuming you have been updating your server with replication, it will now be necessary to configure an access token:
    1. Go to https://metabrainz.org/supporters/account-type and choose your account type as applicable. If you’re an individual, non-commercial user of the data, choose “non-commercial”; if not, choose an applicable tier in the “commercial” section. If you’re not sure of the appropriate tier, make your best guess; it can be adjusted if necessary.
    2. Then, from https://metabrainz.org/profile, create an access token, which should be a 40-character random alphanumeric string provided by the site.
    3. Finally, add this token to lib/DBDefs.pm under the REPLICATION_ACCESS_TOKEN configuration option. The final configuration section should look something like sub REPLICATION_ACCESS_TOKEN { "ck3UpgwgOXhWC6SpFcd99rZOTjzfrei3gQlgZZ9z" }.
    4. Don’t reveal your access token! If you do inadvertently reveal it, you can use the MetaBrainz site to generate a new token, invalidating the old one. (The one in the example above is one I created for myself and then invalidated — don’t get any ideas, it won’t work!)
  10. Turn cron jobs back on, if applicable.
  11. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart memcached.
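
For slave operators who want the whole thing in one place, the command-line portion of the steps above looks roughly like this (the edits to lib/DBDefs.pm in steps 1, 8 and 9 are manual, and the web server/cron/memcached commands depend on your setup):

    cd musicbrainz-server

    # Step 2: confirm you are caught up to the last schema-21 replication packet.
    ./admin/replication/LoadReplicationChanges

    # Steps 3-4: stop the MusicBrainz web server and any database cron jobs.

    # Step 5: check out the release tag.
    git fetch origin
    git checkout v-2015-05-18-schema-change

    # Step 6: run the schema migration.
    ./upgrade.sh

    # Step 7: update Perl dependencies (this release adds LWP::Protocol::https).
    cpanm --installdeps --notest .

    # Steps 8-9: edit lib/DBDefs.pm by hand -- set DB_SCHEMA_SEQUENCE to 22 and
    # add the REPLICATION_ACCESS_TOKEN you created on metabrainz.org.

    # Steps 10-11: re-enable cron jobs, then restart the web server and memcached.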

Finally, the list of bugs closed this release:

Bug

  • [MBS-4436] – Medium titles cannot be longer than 255 characters

Improvement

  • [MBS-1347] – Implement aliases for release groups, releases and recordings
  • [MBS-7906] – maybe don’t show “”≠null diff. in edit pages
  • [MBS-8279] – Remove empty_artists etc. database functions

New Feature

  • [MBS-8302] – Add Live Data Feed access token support

Task

  • [MBS-8266] – Make medium titles VARCHAR NOT NULL
  • [MBS-8278] – Update DB_SCHEMA_SEQUENCE in DbDefs.pm.sample
  • [MBS-8283] – Remove DB constraint that disallows empty event names

Not included in this list but also relevant is MBS-8349, which, while fixed for a previous release, is in this release also applied to old slave servers; this may help performance for some queries.

Picard 1.3.2 released

Picard 1.3.2 is now available on the Picard download page. This again is only a small maintenance release which mainly updates the OS X build to fix crashes some users experienced with Picard 1.3.1. Here is the complete list of fixes:

Bug

  • [PICARD-681] Fixed tags from filename dialog not opening on new installations
  • [PICARD-682] Picard 1.3.1 crashes on OSX on start

For Ubuntu users we have also finally updated our stable PPA.

Picard 1.3.1 released

This is a maintenance release, mostly bug fixes. All users are encouraged to upgrade.

Binary packages are available at http://picard.musicbrainz.org/downloads/

Bug

  • [PICARD-273] – Picard should use the correct Accept header when talking to web services.
  • [PICARD-589] – Picard refuses to load files if any path component happens to be hidden
  • [PICARD-642] – ConfigUpgradeError: Error during config upgrade from version 0.0.0dev0 to 1.0.0final0
  • [PICARD-649] – Windows installer sets working directory to %PROGRAMFILES%\MusicBrainz Picard\locale
  • [PICARD-655] – Last.fm plus tooltip help elements are all messed up
  • [PICARD-661] – Regression: Tagger script for cover art filename does not work anymore
  • [PICARD-662] – Retrieving collections causes AttributeError: release_list
  • [PICARD-663] – Artist name makes it impossible to save

Improvement

  • [PICARD-658] – Support the new pregap and data tracks
  • [PICARD-659] – Set the originalyear tag when loading a release
  • [PICARD-665] – Web service calls to ports 80 and 443 do not need explicit port specification. 443 should be automatically made https.

Schema change upgrade instructions, schema 21

This upgrade shouldn’t be substantially different from past upgrades, now that we’ve fixed a few bugs with the process. To upgrade (a consolidated shell sketch follows the steps):

  1. Make sure your REPLICATION_TYPE setting is RT_SLAVE and your DB_SCHEMA_SEQUENCE is set to 20 in lib/DBDefs.pm.
  2. Ensure you’ve replicated up to the most recent replication packet available with the old schema. (if you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you).
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you are automatically updating the database via cron jobs.
  5. Switch to the new code with git fetch origin followed by git checkout schema-change-20-to-21
  6. Run ./upgrade.sh (or carton exec -Ilib -- ./upgrade.sh if you’re using carton, with very old setups).
  7. Set DB_SCHEMA_SEQUENCE to 21 in lib/DBDefs.pm
  8. Turn cron jobs back on, if needed.
  9. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart memcached.
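
As a single consolidated shell session (the DB_SCHEMA_SEQUENCE edit in step 7 is manual, and the web server/cron/memcached commands depend on your setup):

    cd musicbrainz-server

    # Step 2: make sure replication is caught up under the old schema.
    ./admin/replication/LoadReplicationChanges

    # Steps 3-4: stop the MusicBrainz web server and any database cron jobs.

    # Step 5: check out the fix-up tag (not the plain release tag).
    git fetch origin
    git checkout schema-change-20-to-21

    # Step 6: run the schema migration.
    ./upgrade.sh

    # Step 7: set DB_SCHEMA_SEQUENCE to 21 in lib/DBDefs.pm (manual edit).

    # Steps 8-9: turn cron jobs back on, then restart the web server and memcached.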

That’s it! The only real difference from the past is the specific tag to be used: schema-change-20-to-21, which is a couple of fix-up commits past the regular release tag.