Schema change release: May 13, 2019

It’s been a while since our last schema change release in May 2017. To move forward on some features we’d like to add, we plan to have a Spring 2019 schema change release set for May 13th, 2019. This release should not be disruptive to downstream users, as we plan only to augment the schema with some new tables and columns, and not break any of the existing schema.

Here’s our list of tickets for the Spring 2019 schema change, with descriptions of what’s being changed:

MBS-1658: Add a free text field to collection items. This change will allow users of collections to annotate each item with free text, to hold personalized info about the item (for example, that a release is of a specific pressing, was purchased in 1999 at the Village Discount, or was a gift from your mom). This ticket will add a new TEXT column to editor_collection_release and all other entity-specific editor_collection_* tables. It will not add any new tables or modify any existing columns.

MBS-5387: Mark artist credits as having pending edits when they’re being edited. On an artist’s aliases tab, we provide the ability to directly edit artist credits associated with that artist. However, doing so doesn’t indicate anywhere that the artist credits are being changed (we typically highlight entities with pending edits). To resolve this, we’ll be adding an INTEGER edits_pending column to the artist_credit table. This ticket will not change any existing columns of the artist_credit or artist_credit_name tables.

MBS-5818: Make it possible to have ordered collections. Editors’ collections of entities are automatically ordered by a fixed field; for example, releases in a collection are ordered by ascending release date. Sometimes one might want to order the items in a collection by other criteria, for example, by most wanted releases. This change will enable editors to order collection items by hand if they want to. To do so, an INTEGER position column will be added to the editor_collection_* tables.
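As a rough illustration of how a position column could drive manual ordering, here is a minimal sketch using Python's built-in sqlite3 module. The table below is a simplified stand-in for editor_collection_release; the collection and release column names, the default value for position, and the tie-breaking on release are assumptions for the example, not the final schema.

```python
import sqlite3

# Simplified, hypothetical stand-in for editor_collection_release.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE editor_collection_release ("
    " collection INTEGER NOT NULL,"
    " release INTEGER NOT NULL,"
    " position INTEGER NOT NULL DEFAULT 0,"  # assumed default
    " PRIMARY KEY (collection, release))"
)
conn.executemany(
    "INSERT INTO editor_collection_release VALUES (?, ?, ?)",
    [(1, 101, 3), (1, 102, 1), (1, 103, 2)],
)


def items_in_manual_order(conn, collection_id):
    """Return release ids sorted by the hand-assigned position."""
    rows = conn.execute(
        "SELECT release FROM editor_collection_release"
        " WHERE collection = ? ORDER BY position, release",
        (collection_id,),
    )
    return [row[0] for row in rows]
```

Calling items_in_manual_order(conn, 1) returns the releases in the order the editor assigned rather than by release date.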

MBS-7480: Store cover art image file sizes. Knowing file sizes for cover images and their thumbnails would allow us to better detect when a thumbnail isn’t available, and also allow us to display file sizes to the user before they download an image. To do this, we’ll add four INTEGER columns to the cover_art_archive.cover_art table to store the file sizes in bytes: filesize, thumb_250_filesize, thumb_500_filesize, thumb_1200_filesize.
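Once sizes are stored in bytes, showing them to users is mostly a formatting question. The helper below is a hypothetical sketch, not part of the server code; the choice of binary units (KiB, MiB) and the "unknown size" label for a missing value are assumptions made for the example.

```python
def format_filesize(num_bytes):
    """Format a byte count for display; None means no size is stored."""
    if num_bytes is None:
        return "unknown size"
    size = float(num_bytes)
    # Step up through the units until the value fits below 1024.
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            break
        size /= 1024
    if unit == "B":
        return f"{int(size)} B"
    return f"{size:.1f} {unit}"
```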

MBS-9428: Allow multiple users to share one collection. In some cases (like a collection of cleaned up entities several people want to keep an eye on, or a radio station’s collection of vinyl the station owns), it would be quite useful if multiple editors could add (and remove) entries from a collection. This would make cooperation easier and hopefully make community projects (such as the Composer Diversity Database project) easier to start and work on. We’ll add an editor_collection_collaborator table linking collections to the editors allowed to make changes in them (only the collection owner will be able to make changes to the list of allowed collaborators).
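The permission rule described above (collaborators may change items, only the owner may change the collaborator list) could be sketched as follows with Python's sqlite3 module. Both tables are simplified stand-ins, and the column names (id, editor, collection) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE editor_collection (
    id INTEGER PRIMARY KEY,
    editor INTEGER NOT NULL          -- the collection owner (assumed name)
);
CREATE TABLE editor_collection_collaborator (
    collection INTEGER NOT NULL REFERENCES editor_collection (id),
    editor INTEGER NOT NULL,
    PRIMARY KEY (collection, editor)
);
INSERT INTO editor_collection VALUES (1, 10);
INSERT INTO editor_collection_collaborator VALUES (1, 20);
""")


def is_owner(conn, collection_id, editor_id):
    row = conn.execute(
        "SELECT 1 FROM editor_collection WHERE id = ? AND editor = ?",
        (collection_id, editor_id),
    ).fetchone()
    return row is not None


def can_change_items(conn, collection_id, editor_id):
    """Owners and listed collaborators may add or remove entries."""
    if is_owner(conn, collection_id, editor_id):
        return True
    row = conn.execute(
        "SELECT 1 FROM editor_collection_collaborator"
        " WHERE collection = ? AND editor = ?",
        (collection_id, editor_id),
    ).fetchone()
    return row is not None


def can_change_collaborators(conn, collection_id, editor_id):
    """Only the owner may edit the list of collaborators."""
    return is_owner(conn, collection_id, editor_id)
```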

MBS-9491: Move hard-coded genres to a database table. We recently added genres to MusicBrainz, but they’re currently stored as an object in the server code. This change will move them to a new table genre (id, gid, name, comment).

MBS-10062: Add aliases for genres. Connected to the previous issue, we need a way to specify that “hiphop”, “hip hop” and “hip-hop” are all the same thing, and eventually to store translated versions of genres. This will add a table genre_alias (id, genre, name, locale, edits_pending, last_updated, primary_for_locale).
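To show how an alias table lets several spellings resolve to one genre, here is a small sketch using Python's sqlite3 module. It uses only a subset of the columns listed above, and the case-insensitive matching and the resolve_genre helper are assumptions for the example rather than how the server will necessarily do lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE genre_alias (
    id INTEGER PRIMARY KEY,
    genre INTEGER NOT NULL REFERENCES genre (id),
    name TEXT NOT NULL,
    locale TEXT
);
INSERT INTO genre (id, name) VALUES (1, 'hip hop');
INSERT INTO genre_alias (genre, name) VALUES (1, 'hiphop');
INSERT INTO genre_alias (genre, name) VALUES (1, 'hip-hop');
""")


def resolve_genre(conn, text):
    """Map a typed genre name or alias to the canonical genre name."""
    row = conn.execute(
        "SELECT g.name FROM genre g WHERE lower(g.name) = lower(?)"
        " UNION"
        " SELECT g.name FROM genre_alias a"
        " JOIN genre g ON g.id = a.genre"
        " WHERE lower(a.name) = lower(?)",
        (text, text),
    ).fetchone()
    return row[0] if row else None
```

With this data, "hiphop", "Hip-Hop" and "hip hop" all resolve to the single genre row, while an unknown name yields None.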

MBS-9973: Add a date added column for collection items. Editors’ collections have no editing history, so there is currently no way to sort items by the date they were added to a collection. To allow this, the editor_collection_* tables will get a TIMESTAMP column named added.

MBS-10052: Add new tables for the event poster archive. This will move us another step toward CAA-84, giving us a place to store event posters, logos, and other related images. The schema change here will be to add a new event_art_archive schema, detailed in the comments in MBS-10052. There will be no change to the existing cover_art_archive schema.

The following tickets will also be included, but only involve adding some missing foreign keys, triggers, and constraints to standalone mirrors; these will not be created on replicated mirrors and have no effect there.

MBS-9365: Adds a missing foreign key between the event_meta and event tables.

MBS-9462: Adds some missing l_event_url triggers to delete unused URLs.

MBS-9664: Adds constraints to prevent an entity from linking to itself in a relationship.
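For readers unfamiliar with this kind of constraint, the self-link rule from MBS-9664 can be sketched with a CHECK constraint. The example below uses Python's sqlite3 module and a simplified, hypothetical relationship table; the table layout and the entity0/entity1 column names are assumptions here, and the real constraints are defined in the ticket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified relationship table between two artists.
conn.execute(
    "CREATE TABLE l_artist_artist ("
    " id INTEGER PRIMARY KEY,"
    " entity0 INTEGER NOT NULL,"
    " entity1 INTEGER NOT NULL,"
    " CHECK (entity0 != entity1))"  # an entity may not link to itself
)


def try_link(conn, entity0, entity1):
    """Insert a relationship; return False if the constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO l_artist_artist (entity0, entity1) VALUES (?, ?)",
            (entity0, entity1),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```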

If you have any questions, please do leave a comment below or on the linked JIRA tickets!

No Spring 2018 schema change

We recently decided not to have a spring 2018 schema change release. As usual, we still have some bits left over to finish up from the last spring schema change. More importantly, we’re making a concerted effort to improve the user experience (UX) of the MusicBrainz site — more on that in a blog post later.

We may decide to do an autumn 2018 schema change, but this depends on how well our UX efforts progress over the course of winter and spring.

Schema change release, 2017-05-15 (including upgrade instructions)

We’re happy to announce the release of our May 2017 schema change today! Thanks to all who were patient during today’s downtime as we released everything to our production servers.

This is a fairly minor release as far as schema changes go, but please do report any issues that you come across.

Currently, the only visible change for editors is the ability to add multiple lyrics languages to works. We’ve also modified the schema to support dynamic attributes for entities other than works, but the UI for that won’t be complete for another release or two.

Now, on to the instructions.

Schema Change Upgrade Instructions

Note: Importing the latest data dump is always a valid alternative to running ./upgrade.sh on an existing database, if you’d prefer to also get new data in one go. Just follow the relevant instructions in INSTALL.md. The rest of the instructions here assume an in-place upgrade.

  1. Make sure DB_SCHEMA_SEQUENCE is set to 23 in lib/DBDefs.pm.
  2. If you’re using the live data feed (your REPLICATION_TYPE is set to RT_SLAVE), ensure you’ve replicated up to the most recent replication packet available with the old schema. If you’re not sure, run ./admin/replication/LoadReplicationChanges and see what it tells you; if you’re ready to upgrade, it should say “This replication packet matches schema sequence #24, but the database is currently at #23.”
  3. Take down the web server running MusicBrainz, if you’re running a web server.
  4. Turn off cron jobs if you’re automatically updating the database via cron jobs.
  5. Switch to the new code with git fetch origin followed by git checkout v-2017-05-15-schema-change.
  6. Run cpanm --installdeps --notest . (note the dot at the end) to ensure your perl-based dependencies are up to date.
  7. Downgrade DBD::Pg by running cpanm TURNSTEP/DBD-Pg-3.5.3.tar.gz (version 3.6.0 breaks things currently).
  8. Run ./upgrade.sh (it may take a while to vacuum at the end).
  9. Set DB_SCHEMA_SEQUENCE to 24 in lib/DBDefs.pm as instructed by the output of ./upgrade.sh.
  10. Turn cron jobs back on, if applicable.
  11. Restart the MusicBrainz web server, if applicable. It’s also recommended you restart redis. If you’re accessing your MusicBrainz server in a web browser, run npm install followed by ./script/compile_resources.sh.
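If you script your upgrades, the readiness check from step 2 can be automated. The sketch below is a hypothetical helper, not part of the MusicBrainz tooling: it only looks for the sentence quoted in step 2 in text you have captured from ./admin/replication/LoadReplicationChanges, and it assumes that sentence appears verbatim in the output.

```python
import re

# Matches the message quoted in step 2 of the instructions.
SCHEMA_MESSAGE = re.compile(
    r"This replication packet matches schema sequence #(\d+), "
    r"but the database is currently at #(\d+)"
)


def ready_for_upgrade(output, old_schema=23, new_schema=24):
    """Return True if the captured output shows the next packet needs
    the new schema while the database is still on the old one."""
    match = SCHEMA_MESSAGE.search(output)
    if match is None:
        return False
    packet_schema = int(match.group(1))
    db_schema = int(match.group(2))
    return packet_schema == new_schema and db_schema == old_schema
```

For example, passing the sentence from step 2 returns True, while any output without that sentence returns False, meaning there are still old-schema packets to apply (or something else went wrong).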

For those curious, here’s the list of resolved tickets (excluding MBS-8393):

Bug

New Feature

  • [MBS-9271] – Prevent usernames from being reused

Task

  • [MBS-9273] – Fix the a_ins_edit_note function in older setups to not populate edit_note_recipient for own notes
  • [MBS-9274] – Fix the edit_note_idx_post_time_edit index in older setups to handle NULL post_time

Improvement

  • [MBS-5452] – Support multiple lyric language values for works

Schema change release: Today at 17h UTC

We’re going to start our schema change release process today at 17h UTC.

We anticipate having a short downtime of a few minutes as we’ll need to restart our database server. As usual, we’re not certain when we will start the downtime, but we’ll keep people posted about our progress in IRC and on Twitter.

Once we’re done with the release we will post instructions on this blog on how to upgrade any replicated instances of MusicBrainz you might be running.

Stay tuned!

May 2017 Schema Change Release: May 15, 2017

We have picked our set of tickets and the date for our May 2017 schema change release: May 15th, 2017. This will be a fairly standard and minor schema change release; we’re only tackling 3 tickets that affect downstream users and no other infrastructure changes.

Take a look at our list of tickets for this schema change release. There really are only two tickets that will affect most of our downstream users:

  • MBS-8393: “Extend dynamic attributes to all entities” Currently our works have the concept of additional attributes, which allows the community to decide which sorts of new attributes to apply to a work (e.g. catalog numbers, rhythmic structures, etc.). This ticket will extend these attributes to all of our entities. It will not change any of the existing database tables; it will only add new tables.
  • MBS-5452: “Support multiple lyric language values for works” Currently only one language or the special case “multiple languages” may be used to identify the language used in lyrics. This ticket allows more than one language to be specified for lyrics of a work.

The following tickets are special cases: they will not really affect our downstream users who do not have edit data loaded into their system. We are only including these changes at the schema change release time in order to bring some older replicated systems up to date. If you do not use the edit data, then please ignore these tickets.

  • MBS-9271: “Prevent usernames from being reused” This ticket does not change the schema, but for sake of minimizing downstream disruption, we’re going to carry out this ticket during the schema change.
  • MBS-9274: “Fix the edit_note_idx_post_time_edit index in older setups to handle NULL post_time” This ticket fixes an SQL index on an edit related table.
  • MBS-9273: “Fix the a_ins_edit_note function in older setups to not populate edit_note_recipient for own notes” This ticket fixes an SQL function on an edit-related table.

This is it: really minor this time around. If you have any questions, feel free to post them in the comments or on the tickets themselves.

Schema change release: What happened?

Now that we’ve finally finished the schema change release, I wanted to give an account of what happened in this arduous process. Before I dive into the details, I want to offer a picture that best sums up our current situation and challenges:

[Image: personal-container-mngmnt3]

The shipping container is MusicBrainz and the boat is our hosting infrastructure. This picture perfectly describes the sort of challenges we’ve faced over the past few days. 🙂

Here is what happened:

Because the site was recently running slow and our search servers kept crashing, Zas and I were not available to help Bitmap prepare for the schema change release. This long process was left to Bitmap and Gentlecat to take care of on their own. We quickly realized that we were not ready for the release when the due date came and thus we delayed one week.

Sunday 22 May

Finally we were ready to proceed with the Postgres 9.5 upgrade. Once we started the process, we kept running into small problems that we didn’t get in our test setups. We do not have access to enough infrastructure to have a complete clone of our production environment, so we can only do so much to prepare for all the things that might happen when we run upgrades on our production servers.

While we were attempting to start the upgrade, our backup database server was running much slower than anticipated. In the end we figured out that a step for optimizing the database (analyzing it) hadn’t been carried out. During this time the site was really slow or unusable, but by the time the problem became apparent we had started the upgrade and could not turn back.

Once the upgrade was done, optimizing the database took much, much longer than usual: 3 hours! This process wasn’t started until about 1am local time, which made for a very long night before that process finished. And even then we hit snags and had to start over a couple of times. At about 4:30am we had the site running on Postgres 9.5 in read-only mode. The plan was to rest and start the schema change release in the morning.

Monday 23 May

Of course we had spent all of our time working on the Postgres upgrade and site stability, so our document that we use to plan the schema change was not in place. We spent the day preparing this and other bits for the release. To get an appreciation for what this document looks like, have a look! Note that some steps could be instant, others might take hours to carry out. Others might involve a sub-step or 20 not included in the document.

In the evening we were ready to make the change. By this point our backup DB was performing much better, so the read-only site worked acceptably. Thus, we started the release. Overall, the actual release process was reasonably smooth – we hit a few snags and had to do a lot of waiting for our slow servers. At about 1am things were finally complete. We proceeded with our sanity checks to make sure things had gone smoothly, and all of them passed.

We proceeded to put the site into read-write mode and immediately saw portions of Postgres crashing, which is really bad. With community feedback we quickly deduced that some write operations were causing Postgres back-end processes to crash. We went back to read-only mode on the site and things stabilized and we finally went to bed at 3am.

Tuesday 24 May

In the morning we quickly found the source of the database trouble with help from the Postgres people on IRC. Thanks for the swift help, Johto! We found that the steps for installing the updated third-party extensions into Postgres had not completed correctly. Repeating the steps by hand fixed this problem.

Sadly yesterday morning we got an email informing us that our Live Data Feed replication stream had become corrupted. 😦 This was heartbreaking news to us, since it means a great inconvenience to all of our Live Data Feed users. We immediately split into two teams: Zas, chirlu and myself to fix the root cause of the issue and Bitmap to investigate fixing the stream.

I proceeded to set up a test environment and was able to quickly reproduce the problem. Zas and chirlu were an amazing support team, Googling issues as I came across them. Within a fairly short time we fixed the problem and deployed the fix to our database server. The problem was caused by a bug in a piece of code that we’ve been using for 13 years! A change in Postgres caused this bug to actually become a problem and corrupt our replication feed. 😦

Once the problems were fixed we needed to initiate a new data dump and check that the replication stream was working correctly. Of course we found another problem, which we fixed before restarting the data dump process. Loads of hurry-up-and-wait situations to try our patience!

When we were satisfied that things were working correctly we re-enabled the site as read-write at about 1am and allowed people to continue editing. Exhausted, we stumbled into bed, waiting for data dumps to sync out to the FTP site.

Wednesday 25 May

Today Bitmap was flying home, and as soon as WiFi became available on his flight he started working and helping with putting the schema change to bed. We’ve verified that everything is working as expected. At last this saga comes to an end and we can all take a break and catch up on sleep!

Thank you for your patience through all of this.