Upgrading Postgres for MusicBrainz Live Data Feed users

We’re slowly approaching that time of year: Schema change release time. After skipping our fall update to focus on some internal tasks, we’re ready to have another schema change release in the spring: May 16, 2016

We have started the process to collect features we wish to release for this schema change release and we’ll be publishing that list in the coming weeks. However, we’re contemplating the impact of one more change we’d like to make: Upgrading to a more recent version of Postgres.

Internally we are going upgrade to Postgres 9.5, which was recently released, so we expect that the Postgres team will have worked out the most significant kinks before we’re ready to move to it. However, even though we are moving to 9.5, we are considering the impact on our downstream users/customers who need to make the same or similar change.

While we are moving to version 9.5 of Postgres, we have the option of only adopting features from Postgres 9.4, which means that our downstream users may continue to use Postgres 9.4. However, Postgres 9.5 has some nice features we’d like to use (e.g. UPSERT), so we’re pondering if it is possible for us to require Postgres 9.5 from all of ours Live Data Feed users starting on May 16, 2016. 

We have already informally queried a few of ours users and so far it seems that requiring Postgres 9.5 is feasible. If you are a Live Data Feed user and feel that this requirement of Postgres 9.5 is too much for your and your organization by May 16, 2016, please leave a comment to this blog post!

Schema change release, 2015-05-18 (including upgrade instructions)

Our previously mentioned schema change release is finished! Below will be upgrade instructions, including configuration updates for replication access tokens.

This release does not include UI for several of the schema change patches, which will (hopefully) happen for next release on June 1. The incomplete patches are MBS-7489 (credits for artists in relationships), MBS-4145 (tag upvote/downvote), and MBS-8004 (collections for additional entity types). These patches have had their schema change components finished, but the UI was incomplete or needed more work.

Schema Change Upgrade Instructions

These are largely as previous upgrade instructions, using the tag v-2015-05-18-schema-change. The primary difference is the inclusion of configuring an access token for replication.

  1. Make sure your REPLICATION_TYPE setting is RT_SLAVE and your DB_SCHEMA_SEQUENCE is set to 21 in lib/DBDefs.pm. If you’re running a standalone server, you can run the upgrade, but it may be easier to just import a new data dump!
  2. Ensure you’ve replicated up to the most recent replication packet available with the old schema. (if you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you; if you’re ready to update, it should say “Mismatched schema sequence, 21 (database) vs 22 (replication packet)”).
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you are automatically updating the database via cron jobs.
  5. Switch to the new code with git fetch origin followed by git checkout v-2015-05-18-schema-change
  6. Run ./upgrade.sh (or carton exec -Ilib -- ./upgrade.sh if you’re using carton, with very old setups).
  7. Run cpanm --installdeps --notest . to ensure your perl-based dependencies are up to date. This release adds a dependency on LWP::Protocol::https, for fetching replication packets from the new server; many systems may already have this installed, but it should be verified.
  8. Set DB_SCHEMA_SEQUENCE to 22 in lib/DBDefs.pm as instructed by the output of ./upgrade.sh
  9. Assuming you have been updating your server with replication, it will now be necessary to configure an access token:
    1. Go to https://metabrainz.org/supporters/account-type and choose your account type as applicable. If you’re an individual, non-commercial user of the data, choose “non-commercial”; if not, choose an applicable tier in the “commercial” section. If you’re not sure of the appropriate tier, make your best guess; it can be adjusted if necessary.
    2. Then, from https://metabrainz.org/profile, create an access token, which should be a 40-character random alphanumeric string provided by the site.
    3. Finally, add this token to lib/DBDefs.pm under the REPLICATION_ACCESS_TOKEN configuration option. The final configuration section should look something like sub REPLICATION_ACCESS_TOKEN { "ck3UpgwgOXhWC6SpFcd99rZOTjzfrei3gQlgZZ9z" }.
    4. Don’t reveal your access token! If you do, inadvertently, you can use the MetaBrainz site to generate a new token, invalidating the old one. (The one in the example above is one I created for myself and then invalidated — don’t get any ideas, it won’t work!)
  10. Turn cron jobs back on, if applicable.
  11. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart memcached.

Finally, the list of bugs closed this release:

Bug

  • [MBS-4436] – Medium titles cannot be longer than 255 charaters

Improvement

  • [MBS-1347] – Implement aliases for release groups, releases and recordings
  • [MBS-7906] – maybe don’t show “”≠null diff. in edit pages
  • [MBS-8279] – Remove empty_artists etc. database functions

New Feature

  • [MBS-8302] – Add Live Data Feed access token support

Task

  • [MBS-8266] – Make medium titles VARCHAR NOT NULL
  • [MBS-8278] – Update DB_SCHEMA_SEQUENCE in DbDefs.pm.sample
  • [MBS-8283] – Remove DB constraint that disallows empty event names

Not included in this list but also relevant is MBS-8349, which while fixed for a previous release, in this release is also applied to old slave servers, which may help performance for some queries.

Schema change upgrade instructions, schema 21

This upgrade shouldn’t be substantially different than past upgrades, now that we’ve fixed a few bugs with the process. To upgrade:

  1. Make sure your REPLICATION_TYPE setting is RT_SLAVE and your DB_SCHEMA_SEQUENCE is set to 20 in lib/DBDefs.pm.
  2. Ensure you’ve replicated up to the most recent replication packet available with the old schema. (if you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you).
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you are automatically updating the database via cron jobs.
  5. Switch to the new code with git fetch origin followed by git checkout schema-change-20-to-21
  6. Run ./upgrade.sh (or carton exec -Ilib -- ./upgrade.sh if you’re using carton, with very old setups).
  7. Set DB_SCHEMA_SEQUENCE to 21 in lib/DBDefs.pm
  8. Turn cron jobs back on, if needed.
  9. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart memcached.

That’s it! The only real difference from the past is the specific tag to be used: schema-change-20-to-21, which is a couple of fix-up commits past the regular release tag.

Schema change release, 2011-07-11

Today we released our first schema change update since NGS. This change is quite a radical one, as it merges both of our databases (“READWRITE” and “RAWDATA”) into a single database. For most users of the database, this probably won’t affect you, but you’re encouraged to run the upgrade process anyway. Here’s what you need to do:

  1. Take down the web server running MusicBrainz, if you’re running a web server.
  2. Turn off cron jobs if you are automatically updating the database via cron jobs.
  3. Set DB_SCHEMA_SEQUENCE to 13 in lib/DBDefs.pm
  4. Make sure your REPLICATION_TYPE setting is RT_SLAVE
  5. Switch to the new code with git fetch origin followed by git checkout v-20110711-schema-change
  6. Run ./upgrade.sh from the top of the source directory.
  7. Install the perl modules Algorithm::Merge and Algorithm::Diff
  8. Turn cron jobs back on, if needed.
  9. Restart the MusicBrainz web server, if needed.

This process may take a while, as it has to dump one database into another, and download a few extra changes to ensure slaves aren’t missing any data. The RAWDATA database should no longer be in use and you should be able to drop it, but waiting to see that everything is working well might be a good idea.

This schema change does not introduce any new data. For everyone else, here’s a list of what got fixed since the last release!

Bug

  • [MBS-1977] – ModBot is unable to close some edits
  • [MBS-1979] – Unable to edit a “later translated versions” relationship with change direction
  • [MBS-2026] – Subscribed artists open edits won’t load for editors with large amounts of subscriptions
  • [MBS-2442] – MB postgres unaccent extension overwrites unaccent.so library shipped with postgres-contrib
  • [MBS-2689] – Timeline isn’t working in Opera 10
  • [MBS-2698] – Some tables are not replicated through Live data feed
  • [MBS-2812] – Adding track times has caused a failed dependency for an edit which changed the track titles.
  • [MBS-2826] – Web service returns malformed XML (not escaped properly)
  • [MBS-2831] – “Edit medium” edit display: Artist credits changes are incorrect
  • [MBS-2974] – I don’t receive e-mail when someone votes no to my edit any more
  • [MBS-2981] – Changing the case of recording comments should be auto-edits
  • [MBS-2986] – tracklist_index was not populated during NGS
  • [MBS-2995] – Caught exception in MusicBrainz::Server::Controller::WS::2::ReleaseGroup->release_group_browse “Can’t call method “format”

Improvement

  • [MBS-1500] – Remove tracklist_index.tracks
  • [MBS-1707] – Advanced tracklist in RE: “Title” to “Disc title”
  • [MBS-2242] – Disable editing of Medium title when there’s only one medium
  • [MBS-2434] – Combine READWRITE and RAWDATA
  • [MBS-2462] – Other edit types that don’t highlight what has changed between old and new values
  • [MBS-2583] – “Edit Medium” should show you which medium is being edited with respect to the overall release
  • [MBS-2767] – Release-group XML result should include first release date

Next Generation Schema Release Candidate 1: Monday January 17th

As of today we can finally see the light at the end of the tunnel! We feel confident that we can finish all new features (and most improvements as listed in jira) by January 17th. Once we reach RC1 we’re going to freeze the features for NGS and only work to fix bugs in our codebase.

That said, we hope to release NGS onto the main servers sometime in February.

Its been a very long road to NGS, but it is finally tangibly close. I’m getting excited!