July 2020 – MetaBrainz Blog

Migration to TimescaleDB complete!

Yesterday I posted about why we decided to make the switch to TimescaleDB and then later in the day we actually made the switch!

We are now running a copy of InfluxDB and a copy of TimescaleDB at the same time — in case we find problems with the new TimescaleDB database, we can revert to the InfluxDB database.

In the process of migrating we got rid of a pile of nasty duplicates that used to be created by importing from last.fm. We also got rid of some bad data (timestamp 0 listens) that was pretty much useless and was cluttering the data set. If you find that you are missing data other than duplicates, please open a ticket.
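
The cleanup described above can be sketched roughly as follows. This is only an illustration, not the actual migration code: the `Listen` fields and the choice of (user, timestamp, track name) as the duplicate key are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Listen:
    # Hypothetical, simplified listen record; the real schema has more fields.
    user_name: str
    listened_at: int  # Unix timestamp in seconds
    track_name: str


def clean_listens(listens: Iterable[Listen]) -> List[Listen]:
    """Drop timestamp-0 listens and exact duplicates, keeping first occurrences."""
    seen: Set[Tuple[str, int, str]] = set()
    cleaned: List[Listen] = []
    for listen in listens:
        if listen.listened_at == 0:
            continue  # bad data: no usable timestamp
        key = (listen.user_name, listen.listened_at, listen.track_name)
        if key in seen:
            continue  # duplicate, e.g. from a repeated import
        seen.add(key)
        cleaned.append(listen)
    return cleaned
```

Keeping the first occurrence and preserving input order is a design choice for the sketch; a real migration might instead prefer the record with the most complete metadata.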

The move to TimescaleDB allows us to create new features such as deleting a listen (which should be released later this summer) and various other features, because the underlying DB is much more flexible than InfluxDB. However, right this second there are no real new features for end users. More new features are coming soon, we promise!

Thank you to shivam-kapila, iliekcomputers and ishaanshah for helping with this rather large, long-running project!

ListenBrainz moves to TimescaleDB

The ListenBrainz team has been working hard on moving our primary listen store from InfluxDB to TimescaleDB, and today at UTC 16:00 we’re going to make the switch.

We were asked on Twitter why we’re making the switch, and in the interest of giving a real-world use case for switching, I’m writing this post. The reasons are numerous:

Openness: InfluxDB seems to be on a path that will make it less open over time. TimescaleDB’s dependence on Postgres makes us feel much safer in this regard.

Existing use: We’ve been using Postgres for about 18 years now and it has been a reliable workhorse for us. Our team thinks in terms of Postgres and InfluxDB always felt like a round peg in a square hole for us.

Data structure: InfluxDB was clearly designed to store server event info. We’re storing listen information, which has a slightly different usage pattern, but this slight difference is enough for us to hit a brick wall with far fewer users in our DB than we ever anticipated. InfluxDB is simply not flexible enough for our needs.

Query syntax and measurement names: The syntax to query InfluxDB is weird and obfuscated. We made the mistake of trying to have a measurement map to a user, but escaping measurement names correctly nearly drove one of our team members to the loony bin.

Existing data: If you ever write bad data to a measurement in InfluxDB, there is no way to change it. I realize that this is a common Big Data usage pattern, but for us it posed significant challenges and seriously restricted our ability to put simple features for our users into place. With TimescaleDB we can make the very occasional UPDATE or DELETE and move on.

Scalability: Even though we read as much as we could in order to design a scalable schema, we still got it wrong. (I don’t think the docs for calculating scalability even existed when we first started using InfluxDB.) Unless you are using InfluxDB in exactly the way it was meant to be used, there is a chance you’ll hit this problem as well. For us, one day insert speed dropped to a ridiculously low number per second, backing up our systems. Digging into the problem, we realized that our schema design had a fatal flaw and that we would have to drastically change the schema to something even less intuitive in order to fix it. This was the straw that broke the camel’s back, and I started searching for alternatives.
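
As a small illustration of the “Existing data” point, the sketch below removes a single listen with an ordinary SQL DELETE. It uses Python’s built-in `sqlite3` module purely as a stand-in so that the example runs anywhere. The `listen` table, its columns, and the helper names are hypothetical and not the real ListenBrainz schema. Against TimescaleDB an equivalent statement would go through a Postgres driver instead.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical listens."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listen (user_name TEXT, listened_at INTEGER, track_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO listen VALUES (?, ?, ?)",
        [
            ("alice", 1595000000, "Song A"),
            ("alice", 1595000300, "Song B"),
            ("bob", 1595000000, "Song A"),
        ],
    )
    conn.commit()
    return conn


def delete_listen(
    conn: sqlite3.Connection, user_name: str, listened_at: int, track_name: str
) -> int:
    """Delete one listen and return the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM listen "
        "WHERE user_name = ? AND listened_at = ? AND track_name = ?",
        (user_name, listened_at, track_name),
    )
    conn.commit()
    return cur.rowcount
```

Calling `delete_listen(conn, "alice", 1595000000, "Song A")` on the demo database removes exactly that row and leaves the other listens untouched, which is the kind of targeted correction the paragraph above describes.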

In moving to TimescaleDB we were able to delete a ton of complicated code and embrace a DB that we know and love. We know how Postgres scales, we know how to put it into production and we know its caveats. TimescaleDB allows us to be flexible with the data, and the amazing queries that can be performed on the data are pure Postgres love. TimescaleDB still requires some careful thinking beyond plain Postgres, but it is far less than what is required when using InfluxDB. TimescaleDB also gives us a clear scaling path forward, even while TimescaleDB is still working on its own scaling roadmap. If TimescaleDB evolves anything like Postgres has, I can’t wait to see this evolution.

Big big thanks to the Postgres and TimescaleDB teams!

Picard 2.4 Beta 2

Following the first Picard 2.4 beta we have released Picard 2.4 beta 2 to address a couple of reported issues. Thanks a lot to everyone who tested the last beta and reported issues. The following bugs have been fixed in beta 2:

[PICARD-1864] – Adding single files does ignore existing MBIDs
[PICARD-1866] – Coverart pane does not update during / after saving files
[PICARD-1867] – Guess format fallback is broken
[PICARD-1868] – CAA type selection dialog does not translate “Unknown”

See our previous blog post about Picard 2.4 beta 1 for the changes since the last stable release.

Picard 2.4 beta 2 is available for download from the download page.

Please report bugs on the Picard issue tracker and provide feedback in the community forums.

Please also help translate Picard. There have been many changes to the user interface and existing translations need to be updated for the final 2.4 release. Translating is easy and can be done online: Head over to MusicBrainz’s translation page on Transifex and click on “Help Translate MusicBrainz”.
Once you have registered an account on Transifex you can start translating. For Picard the primary resource to translate is “picard”, but there is also the “picard_appstream” resource which is used for providing descriptions for various Linux software-center applications.

Picard 2.4 Beta 1

Picard 2.4 Beta 1 is now available. There have been some important changes and we would like to gather feedback with this beta release before releasing the final Picard 2.4.

This release contains code changes by Gabriel Ferreira, Laurent Monin, Bob Swift, Philipp Wolfer, RaysDev, Wieland Hoffmann and new contributors Adam James and jcaesar.

Thanks a lot to everybody who contributed to this release with code, translations, bug reports and general feedback.

What’s new?

The most notable change in this release is the significant performance improvement when handling large numbers of files, thanks to the excellent work of Gabriel Ferreira.

We would also like to get some feedback on the new scripting auto completion feature and the scripting documentation provided directly inside Picard. Windows 10 users can also try Picard’s new support for Windows 10 dark mode.

Here is the full list of changes:

Bugfixes

[PICARD-1753] – Fix font size of script editor and log view on Windows
[PICARD-1807] – Wrong error handling when using python-libdiscid
[PICARD-1813] – $title function throws error on empty value
[PICARD-1820] – PLUGIN_VERSION no longer displayed correctly in plugins dialog
[PICARD-1823] – Genre tag ordering is non-deterministic
[PICARD-1826] – “no appropriate stream found” when saving .ogg (OPUS) file
[PICARD-1838] – Files with a .dff file extension are interpreted as DSF files and fail to load
[PICARD-1853] – Crash if tags contain null character
[PICARD-1855] – Relationships not tagged for non-album track
[PICARD-1859] – “ValueError: Invalid literal” followed by crash when opening certain files

New Features

[PICARD-1704] – Support Windows 10 dark mode
[PICARD-1797] – Autocompletion for script functions and variables
[PICARD-1798] – Add support for inline translatable script documentation

Improvements

[PICARD-824] – Expand all option submenus by default
[PICARD-920] – Remember selected options page
[PICARD-1117] – Instrumental recordings of a work should set language to “No lyrics”
[PICARD-1796] – Consider release date when matching files to releases
[PICARD-1805] – Make it easier to add the first script
[PICARD-1818] – Make PyQt5.QtDBus optional
[PICARD-1829] – Add support for disc numbers in cluster Info dialog tracklists
[PICARD-1831] – Mitigate performance impacts of file selection and UI updates during processing
[PICARD-1840] – Instrumental recordings of a work should drop the lyricist credit
[PICARD-1842] – AIFF and DSF: Add support for albumsort, artistsort, titlesort and discsubtitle
[PICARD-1843] – Improve load and clustering performance
[PICARD-1844] – Further improve loading and clustering performance
[PICARD-1845] – Add “lookup in browser” for musicbrainz_discid tag in metadata view
[PICARD-1846] – Metadata.unset should not raise KeyError
[PICARD-1847] – Restructure tag compatibility options
[PICARD-1852] – Make about a separate dialog
[PICARD-1854] – Improve sorting performance in main window
[PICARD-1856] – Use pgettext function in Python 3.8

Download

Picard 2.4 beta 1 is available for download from the download page.

Helping out

The easiest way to help us get a great Picard 2.4 release is to use and test this beta release. Please report bugs on the Picard issue tracker and provide feedback in the community forums.

If you are a software developer you are very welcome to contribute fixes and features. Picard is free software and the source code is available on GitHub. See Developing on the Picard website to get started.