Migration to TimescaleDB complete!

Yesterday I posted about why we decided to make the switch to TimescaleDB and then later in the day we actually made the switch!

We are now running a copy of InfluxDB and a copy of TimescaleDB at the same time — in case we find problems with the new TimescaleDB database, we can revert to the InfluxDB database.

In the process of migrating we got rid of a pile of nasty duplicates that used to be created by importing from last.fm. We also got rid of some bad data (timestamp 0 listens) that were pretty much useless and were cluttering the data. If you find that you are missing some data besides some duplicates, please open a ticket.
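
To make the cleanup rule concrete, here is a minimal sketch of the idea. It is not the actual migration code. It assumes a hypothetical export with one listen per line in the form user, Unix timestamp, track (tab-separated); the function name and the sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the tab-separated input format (user, timestamp, track)
# and the clean_listens name are assumptions for illustration.

# Drop listens with timestamp 0, then keep only the first occurrence of
# each (user, timestamp, track) combination.
clean_listens() {
    awk -F '\t' '$2 != 0 && !seen[$1 FS $2 FS $3]++'
}

# Demo input: one duplicated listen and one timestamp 0 listen.
printf 'alice\t1593446400\tSong A\nalice\t1593446400\tSong A\nbob\t0\tSong B\nbob\t1593450000\tSong B\n' |
    clean_listens
```

Of the four demo listens, only the first alice listen and the second bob listen survive.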

The move to TimescaleDB allows us to create new features such as deleting a listen (which should be released later this summer) and various others, because the underlying DB is much more flexible than InfluxDB. However, right this second there are no real new features for end users. More new features are coming soon, we promise!

Thank you to shivam-kapila, iliekcomputers and ishaanshah for helping with this rather large, long-running project!

ListenBrainz moves to TimescaleDB

The ListenBrainz team has been working hard on moving our primary listen store from InfluxDB to TimescaleDB, and today at UTC 16:00 we’re going to make the switch.

We were asked on Twitter why we’re making the switch, and in the interest of giving a real-world use case for switching, I’m writing this post. The reasons are numerous:

Openness: InfluxDB seems to be on a path that will make it less open over time. TimescaleDB and its dependence on Postgres make us feel much safer in this regard.

Existing use: We’ve been using Postgres for about 18 years now and it has been a reliable workhorse for us. Our team thinks in terms of Postgres and InfluxDB always felt like a round peg in a square hole for us.

Data structure: InfluxDB was clearly designed to store server event info. We’re storing listen information, which has a slightly different usage pattern, but this slight difference is enough for us to hit a brick wall with far fewer users in our DB than we ever anticipated. InfluxDB is simply not flexible enough for our needs.

Query syntax and measurement names: The syntax to query InfluxDB is weird and obfuscated. We made the mistake of trying to have a measurement map to a user, but escaping measurement names correctly nearly drove one of our team members to the loony bin.

Existing data: If you ever write bad data to a measurement in InfluxDB, there is no way to change it. I realize that this is a common Big Data usage pattern, but for us it represented significant challenges and serious restrictions to put simple features for our users into place. With TimescaleDB we can make the very occasional UPDATE or DELETE and move on.

Scalability: Even though we read as much as we could in order to design a scalable schema, we still got it wrong. (I don’t think the docs for calculating scalability even existed when we first started using InfluxDB.) Unless you are using InfluxDB in exactly the way it was meant to be used, there is a chance you’ll hit this problem as well. For us, one day insert speed dropped to a ridiculously low number per second, backing up our systems. Digging into the problem, we realized that our schema design had a fatal flaw and that we would have to drastically change the schema to something even less intuitive in order to fix it. This was the straw that broke the camel’s back, and I started searching for alternatives.

In moving to TimescaleDB we were able to delete a ton of complicated code and embrace a DB that we know and love. We know how Postgres scales, we know how to put it into production and we know its caveats. TimescaleDB allows us to be flexible with the data, and the amazing queries that can be performed on the data are pure Postgres love. While TimescaleDB still requires some careful thinking beyond what plain Postgres needs, it is far less than what is required when using InfluxDB. TimescaleDB also gives us a clear scaling path forward, even while TimescaleDB is still working on its own scaling roadmap. If TimescaleDB evolves anything like Postgres has, I can’t wait to see this evolution.
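
As an illustration of the "very occasional UPDATE or DELETE" mentioned above, here is a minimal sketch of what deleting a single listen could look like with plain psql. The table and column names (listen, user_name, listened_at) are hypothetical and only stand in for whatever the real schema uses; the function name is made up as well.

```shell
#!/bin/sh
# Sketch only: the "listen" table and its user_name / listened_at columns
# are hypothetical names, not taken from the real ListenBrainz schema.

# Print a DELETE statement that uses psql variables. :'name' is psql's
# syntax for interpolating a variable as a safely quoted SQL literal.
delete_listen_sql() {
    cat <<'SQL'
DELETE FROM listen
 WHERE user_name = :'user_name'
   AND listened_at = :'listened_at'::bigint;
SQL
}

# Show the statement; nothing is sent to a database here.
delete_listen_sql
```

To actually run it, the statement would be piped into psql with the variables set on the command line, for example: delete_listen_sql | psql -d your_database -v user_name=alice -v listened_at=1593446400. The statement is piped rather than passed with -c because psql does not interpolate variables in -c commands.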

Big big thanks to the Postgres and TimescaleDB teams!

Picard 2.4 Beta 2

Following the first Picard 2.4 beta we have released Picard 2.4 beta 2 to address a couple of reported issues. Thanks a lot to everyone who tested the last beta and reported issues. The following bugs have been fixed in beta 2:

  • [PICARD-1864] – Adding single files does ignore existing MBIDs
  • [PICARD-1866] – Coverart pane does not update during / after saving files
  • [PICARD-1867] – Guess format fallback is broken
  • [PICARD-1868] – CAA type selection dialog does not translate “Unknown”

See our previous blog post about Picard 2.4 beta 1 for the changes since the last stable release.

Picard 2.4 beta 2 is available for download from the download page.

Please report bugs on the Picard issue tracker and provide feedback in the community forums.

Please also help translate Picard. There have been many changes to the user interface and existing translations need to be updated for the final 2.4 release. Translating is easy and can be done online: Head over to MusicBrainz’s translation page on Transifex and click on “Help Translate MusicBrainz”.
Once you have registered an account on Transifex you can start translating. For Picard the primary resource to translate is “picard”, but there is also the “picard_appstream” resource which is used for providing descriptions for various Linux software-center applications.

Picard 2.4 Beta 1

Picard 2.4 Beta 1 is now available. There have been some important changes and we would like to gather feedback with this beta release before releasing the final Picard 2.4.

This release contains code changes by Gabriel Ferreira, Laurent Monin, Bob Swift, Philipp Wolfer, RaysDev, Wieland Hoffmann and new contributors Adam James and jcaesar.

Thanks a lot to everybody who contributed to this release with code, translations, bug reports and general feedback.

What’s new?

The most notable change in this release is the significant performance improvement when handling large numbers of files, thanks to the excellent work of Gabriel Ferreira.

We would also like to get some feedback on the new scripting auto completion feature and the scripting documentation provided directly inside Picard. Windows 10 users can also try Picard’s new support for Windows 10 dark mode.

Here is the full list of changes:

Bugfixes

  • [PICARD-1753] – Fix font size of script editor and log view on Windows
  • [PICARD-1807] – Wrong error handling when using python-libdiscid
  • [PICARD-1813] – $title function throws error on empty value
  • [PICARD-1820] – PLUGIN_VERSION no longer displayed correctly in plugins dialog
  • [PICARD-1823] – Genre tag ordering is non-deterministic
  • [PICARD-1826] – “no appropriate stream found” when saving .ogg (OPUS) file
  • [PICARD-1838] – Files with a .dff file extension are interpreted as DSF files and fail to load
  • [PICARD-1853] – Crash if tags contain null character
  • [PICARD-1855] – Relationships not tagged for non-album track
  • [PICARD-1859] – “ValueError: Invalid literal” followed by crash when opening certain files

New Features

  • [PICARD-1704] – Support Windows 10 dark mode
  • [PICARD-1797] – Autocompletion for script functions and variables
  • [PICARD-1798] – Add support for inline translatable script documentation

Improvements

  • [PICARD-824] – Expand all option submenus by default
  • [PICARD-920] – Remember selected options page
  • [PICARD-1117] – Instrumental recordings of a work should set language to “No lyrics”
  • [PICARD-1796] – Consider release date when matching files to releases
  • [PICARD-1805] – Make it easier to add the first script
  • [PICARD-1818] – Make PyQt5.QtDBus optional
  • [PICARD-1829] – Add support for disc numbers in cluster Info dialog tracklists
  • [PICARD-1831] – Mitigate performance impacts of file selection and UI updates during processing
  • [PICARD-1840] – Instrumental recordings of a work should drop the lyricist credit
  • [PICARD-1842] – AIFF and DSF: Add support for albumsort, artistsort, titlesort and discsubtitle
  • [PICARD-1843] – Improve load and clustering performance
  • [PICARD-1844] – Further improve loading and clustering performance
  • [PICARD-1845] – Add “lookup in browser” for musicbrainz_discid tag in metadata view
  • [PICARD-1846] – Metadata.unset should not raise KeyError
  • [PICARD-1847] – Restructure tag compatibility options
  • [PICARD-1852] – Make about a separate dialog
  • [PICARD-1854] – Improve sorting performance in main window
  • [PICARD-1856] – Use pgettext function in Python 3.8

Download

Picard 2.4 beta 1 is available for download from the download page.

Helping out

The easiest way to help us deliver a great Picard 2.4 release is to use and test this beta. Please report bugs on the Picard issue tracker and provide feedback in the community forums.

Please also help translate Picard. There have been many changes to the user interface and existing translations need to be updated for the final 2.4 release. Translating is easy and can be done online: Head over to MusicBrainz’s translation page on Transifex and click on “Help Translate MusicBrainz”.
Once you have registered an account on Transifex you can start translating. For Picard the primary resource to translate is “picard”, but there is also the “picard_appstream” resource which is used for providing descriptions for various Linux software-center applications.

If you are a software developer, you are very welcome to provide fixes and features. Picard is free software and the source code is available on GitHub. See Developing on the Picard website to get started.

MusicBrainz Server update, 2020-06-29

The React conversion is back with this release. We’ve also fixed a regression that listed unrelated “recording of” relationship edits in history of artists, recordings, and releases, and made a lot of people quite frustrated during the last two weeks (sorry about that!). New edits won’t be wrongly listed anymore, and the existing wrongly-listed edits will be progressively unlisted during the following days. Finally, two new data reports have been created, and small display improvements have been made.

A new release of MusicBrainz Docker is also available that matches this update of MusicBrainz Server. See the release notes for update instructions.

Thanks to loujin for contributing code. Thanks to DjSlash, hibiscuskazeneko, insolite, jesus2099, and mavit for having reported bugs and suggested improvements. Thanks to Atsushi Nakamura, kellnerd and salorock for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-06-29.

Bug

  • [MBS-10825] – Normalization of some Muziekweb links make them invalid
  • [MBS-10908] – Unrelated “recording of” relationship edits for works show up in history of artists, recordings, and releases

New Feature

  • [MBS-10770] – Report on relations with dates in the future
  • [MBS-10895] – Report for mediums with conflicting discID

Improvement

  • [MBS-10736] – Add autoselect + sidebar for Napster URLs
  • [MBS-10802] – Right-align columns of Reorder Relationship table
  • [MBS-10894] – Show label code after selecting label in the release editor
  • [MBS-10903] – Update Operabase cleanup and validation
  • [MBS-10907] – Define background color for header and footer too

React Conversion Task

  • [MBS-10777] – Convert Add/Remove Relationship edits to React
  • [MBS-10801] – Convert Reorder Relationships edit to React

Introducing the BookBrainz merging tool

Today we come with a big BookBrainz website update that allows you to merge duplicate entities!

Being able to clean up the database is an essential step towards importing public bibliographic records and catalogs from partner websites. As with MusicBrainz, you can visit an entity page on BookBrainz and click on a button to add an entity to a merge queue. You can merge multiple entities in one go easily.

BookBrainz merge queue

After clicking the merge button you will be presented with a page that lets you review and select the correct information in case of conflicting data. The revision history of merged entities is preserved, and in the near future you’ll be able to undo merges.

BookBrainz merge page

Your feedback is very welcome! We also have a short tutorial on how to use the new merge tool for the curious.

This latest website update also adds annotations for any information that does not fit into the existing format, along with some small design improvements and bug fixes.

We’ve also added the ability to search for users on the search page. This last feature will come in handy soon as we introduce collaborative User Collections; stay tuned!

MusicBrainz Server update, 2020-06-15

Today’s release focuses on stability, with almost half of the changes being bugfixes. It also provides a fair number of small improvements and new features. On a side note, the affiliated third-party Magic Tagger is now known as AudioRanger.

A new release of MusicBrainz Docker is also available; it matches this update of MusicBrainz Server and fixes a few issues in both the standard setup and the development setup. See the release notes for update instructions.

Thanks to Cyna, Freso, and loujin for contributing code. Thanks to chaban, chiark, jesus2099, kellnerd, nikki, otringal, navap, wmorg, and yeeeargh for having reported bugs and suggested improvements. Thanks to kellnerd and salorock for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-06-15.

Bug

  • [MBS-6532] – Work edits not in release edit history
  • [MBS-10186] – “Set track lengths” edit is stuck
  • [MBS-10381] – Duplicate work entity lyrics languages during edit
  • [MBS-10821] – Edit changing medium tracklist and format is stuck
  • [MBS-10836] – My Collections edit link doesn’t show edit page
  • [MBS-10856] – Relationship credit with whitespace causes error
  • [MBS-10867] – uncaught exception: entity not in the entities array
  • [MBS-10873] – Artist not always shown for annotation edits
  • [MBS-10882] – ISE when filtering RGs
  • [MBS-10885] – BadAmazonURLs: TypeError: Cannot read property ‘href_url’ of null
  • [MBS-10887] – CSS misalignment of country/date text with other columns
  • [MBS-10890] – Work language displayed multiple times in edit listings
  • [MBS-10892] – Collaborators can’t add releases to private collection

New Feature

  • [MBS-1736] – Block setting format on too early releases
  • [MBS-10862] – Report for releases with catalog numbers that look like label codes

Improvement

  • [MBS-4644] – Indicate which releases have CAA art in listings
  • [MBS-9340] – Don’t allow more languages if [No lyrics] is selected
  • [MBS-9931] – Fail gracefully when trying to remove a relationship which is in use as an example
  • [MBS-10469] – Show releases more likely to have a processed cover art on front page
  • [MBS-10893] – Add help text to select the appropriate artist from CD lookup
  • [MBS-10897] – Block always more smart links

React Conversion Task

  • [MBS-10393] – Convert Add Standalone Recording edit to React

Other Task

  • [MBS-6864] – Remove PUID edits
  • [MBS-10771] – Block tagging for unverified users
  • [MBS-10891] – Replace Magic Tagger link with AudioRanger

MusicBrainz Server update, 2020-06-02

Now that PostgreSQL has been upgraded to version 12 (see earlier instructions), regular improvements, bugfixes, and React/JSX template refactoring are back on the menu. The most noticeable improvement is probably that we are now able to display more specific error messages in the URL relationship editor when a link is not allowed, instead of always giving a generic “not good”-style message. There are even new features, for admins only, that will allow them to spot and delete sock-puppets and to temporarily disable edit notes from editors who continue to be disrespectful to others without having to delete their accounts outright.

In news not directly connected to the MusicBrainz website, the public search endpoint has been moved from the old search server to the Solr-based search server. This can be used by slave servers if they do not need, or cannot afford, to host their own search indexes.

And while talking about slave servers, a new release of MusicBrainz Docker is also available. It follows this update of MusicBrainz Server and fixes a regression that affects servers with live indexing. See the release notes for update instructions.

Thanks to Cyna for converting more edits’ display to React, to KamranMackey and navap for updating external links’ icons. Thanks to alastairp, BestSteve, chaban, chirlu, cyberskull, Freso, hibiscuskazeneko, jesus2099, and Kid Devine for having reported bugs and suggested improvements. Thanks to eduardomariohs, kellnerd, mfmeulenbelt, salorock, and stich94 for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-06-02.

Bug

  • [MBS-9010] – Alerts about new edit notes are separate for production and beta
  • [MBS-10613] – Version info footer is shown on some pages in production
  • [MBS-10732] – Internal Server Error: Adding a collaborator to a collection without selecting from autocomplete dropdown
  • [MBS-10839] – “Add selected recordings for merging” missing in standalone-only overview
  • [MBS-10849] – Add release group preview on RE shows “This entity has been removed”
  • [MBS-10850] – Bad Amazon URLs report doesn’t ignore “streaming page” relationship
  • [MBS-10855] – Historic track edit involving since-removed entity doesn’t show recording
  • [MBS-10863] – “Edit barcode” edit doesn’t show barcode

New Feature

  • [MBS-10834] – New account flag for disabling ability to write edit notes
  • [MBS-10845] – Tool to allow account admins to look up accounts by e-mail

Improvement

  • [MBS-1516] – Update Facebook, Google Play, and Spotify icons
  • [MBS-7822] – Update OverClocked ReMix favicon
  • [MBS-8412] – Align number of edits per page
  • [MBS-9516] – Display specific error messages depending on URL validation rules
  • [MBS-9963] – Update Genius link format and logo
  • [MBS-10412] – Update URL cleanup for Niconico URLs, specifically channel links for artists
  • [MBS-10727] – Deny kasi-time.com URLs for lyrics
  • [MBS-10789] – Add validation for Genius links
  • [MBS-10813] – Update the Bandcamp logo used in the sidebar
  • [MBS-10831] – Allow niconico channel links for other entities than artist
  • [MBS-10840] – Capitalise “in key” info correctly in English guess case
  • [MBS-10841] – Add “Guess case” per-medium
  • [MBS-10842] – Remove report user link from deleted editor profiles
  • [MBS-10853] – Link to overview page for edits by subscribed editors in subscription email

React Conversion Task

  • [MBS-10397] – Convert Edit Event edit to React
  • [MBS-10399] – Convert Edit Recording edit to React
  • [MBS-10793] – Convert historic Move Release edit to React
  • [MBS-10799] – Convert historic Move Release to RG edit to React
  • [MBS-10817] – Convert Edit Label edit to React

Other Task

  • [MBS-7781] – Merge duplicate artist credits
  • [MBS-10785] – Remove link to FreeDB Gateway documentation
  • [MBS-10822] – Change tableColumns tables to use named parameters
  • [MBS-10860] – Merge the production and beta Redis stores
  • [MBS-10878] – Convey search queries from slave servers to Solr

PostgreSQL 12 Upgrade Instructions for MusicBrainz Server

Thanks to everyone for your patience during our downtime today. As promised, here are steps to follow to upgrade your own PG instance to v12. (Confused? See the previous blog post on this subject.)

If you’re already running v12, there are still some instructions you must follow!

For MusicBrainz Docker

If you’re running the new MusicBrainz Docker setup, an upgrade script exists for you to use. See the release notes for specific – hopefully brief – instructions.

For a Manual Setup (INSTALL.md Based)

If you aren’t using Docker but rather set up musicbrainz-server by hand following INSTALL.md, see the steps below.

Know that as an alternative, you can always import new data dumps from scratch (again following the steps in INSTALL.md) into a new PG 12 cluster. Just make sure you’re on the v-2020-05-18-postgres12 tag of musicbrainz-server while doing so.

If on the other hand you don’t mind getting your hands a bit dirty, you can use the quicker method below. Like INSTALL.md, this assumes you’re using Ubuntu/Debian and their postgresql-common cluster management tools.

If you’re already running v12, you should still follow these steps; however, you can skip the ones involving apt-get, pg_dropcluster, and pg_upgradecluster. The main steps you need to follow in this case are running the 20200518-pg12-before-upgrade.sql and 20200518-pg12-after-upgrade.sql scripts in that order.

On distros other than Debian/Ubuntu where the postgresql-common tools aren’t available, you’ll have to manage with initdb and pg_upgrade on your own.

  1. First take down the web server running MusicBrainz (stop plackup) to prevent database access.
  2. Turn off any cron jobs updating or accessing the database (e.g. for the live data feed/replication packets).
  3. Switch to the latest musicbrainz-server code with:
    git fetch origin && \
    git checkout v-2020-05-18-postgres12
  4. With PG 9.5 (or whatever version you’re using) still running, run the following “pre-upgrade” script:
    psql -U postgres -d musicbrainz_db \
    -f admin/sql/updates/20200518-pg12-before-upgrade.sql

    This assumes that “postgres” is the name of your PG superuser, and “musicbrainz_db” is the name of your database. If you see a few messages about things not existing, that’s normal.

  5. Install packages for PostgreSQL 12. On Ubuntu/Debian you can obtain them from the PGDG apt repo.
    apt-get update && \
    apt-get install postgresql-12 postgresql-server-dev-12

    If you’re installing postgresql-12 for the first time, this will automatically create a new cluster at /var/lib/postgresql/12/main. Remove that empty cluster. Don’t run this if you already had v12 installed and have data there!

    pg_dropcluster --stop 12 main

    If you did already have v12 installed with musicbrainz_db running there, leave the cluster alone and skip the next step involving pg_upgradecluster.

    In the unlikely event that you already have a v12 cluster, but also have musicbrainz_db running in a separate, older cluster, these instructions won’t work for you. We recommend importing fresh data dumps into the v12 cluster and dropping the old one.

  6. Upgrade the old cluster. This assumes it’s version 9.5; if you’re using version 10 or 11, make sure to replace 9.5 below with 10 or 11. If you have other databases in your old cluster besides musicbrainz_db, be aware that this will upgrade all of them to PG 12.
     pg_upgradecluster -v 12 9.5 main
  7. If all goes well, the new cluster should be up and running. (You can drop the old one if you like; the output of the pg_upgradecluster command will tell you how.) Now run the following “post-upgrade” script on the database:
    psql -U postgres -d musicbrainz_db -f \
    admin/sql/updates/20200518-pg12-after-upgrade.sql
    This may take a bit, as it has to recreate some indexes.
  8. The upgrade is complete. You can turn cron jobs back on, if applicable.
  9. Restart the MusicBrainz web server / plackup, if applicable. If you’re accessing the server in a web browser, the usual release upgrade steps apply, like running ./script/compile_resources.sh again.
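
The command steps above (3 to 7) can be sketched as a single dry-run script. This is only a sketch under stated assumptions, not an official upgrade script: the DRY_RUN and OLD_VERSION variables and the run and upgrade_steps names are invented here, and it covers only the case where postgresql-12 is being installed for the first time and the old cluster is upgraded in place. Stopping the web server and cron jobs (steps 1, 2, 8 and 9) is left to you, since that depends on your setup.

```shell
#!/bin/sh
# Dry-run sketch of steps 3-7. By default it only prints the commands;
# set DRY_RUN=0 to execute them. DRY_RUN, OLD_VERSION, run and
# upgrade_steps are names made up for this sketch.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1 is the version of the old cluster, e.g. 9.5, 10 or 11.
upgrade_steps() {
    old_version="$1"

    # Step 3: switch to the PG 12 tag of musicbrainz-server.
    run git fetch origin || return 1
    run git checkout v-2020-05-18-postgres12 || return 1

    # Step 4: "pre-upgrade" script, run while the old cluster is still up.
    run psql -U postgres -d musicbrainz_db \
        -f admin/sql/updates/20200518-pg12-before-upgrade.sql || return 1

    # Step 5: install PG 12, then remove the empty cluster it created.
    # Do NOT do this if you already have data in a v12 cluster.
    run apt-get update || return 1
    run apt-get install postgresql-12 postgresql-server-dev-12 || return 1
    run pg_dropcluster --stop 12 main || return 1

    # Step 6: upgrade the old cluster to version 12.
    run pg_upgradecluster -v 12 "$old_version" main || return 1

    # Step 7: "post-upgrade" script, run against the new cluster.
    run psql -U postgres -d musicbrainz_db \
        -f admin/sql/updates/20200518-pg12-after-upgrade.sql || return 1
}

upgrade_steps "${OLD_VERSION:-9.5}"
```

Running it as-is prints the eight commands for a 9.5 cluster, which makes it easy to review them before setting DRY_RUN=0. For a version 10 or 11 cluster, set OLD_VERSION accordingly.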

If you run into any trouble following the above, please let us know and we’ll try to help resolve your issue as soon as possible!

MusicBrainz Docker composes with Solr 7

The MusicBrainz virtual machine is dead, long live the MusicBrainz Docker Compose project. In fact, the virtual machine has been running that project internally for years. The virtual machine no longer seems worth maintaining, mostly because the data it shipped with became obsolete too quickly. Plus, new search indexes are much larger than before, and using Docker Compose directly is much more versatile.

The MusicBrainz Docker Compose project has been extensively revamped over the past two years and now ships the new search server based on Solr 7. It can be used for mirroring the MusicBrainz website and database, testing your own app with a local MusicBrainz web service, or developing the MusicBrainz Server itself. Check out the release notes!

Thanks to everyone who reported issues and contributed patches over the past two years!