Migration to TimescaleDB complete!

Yesterday I posted about why we decided to make the switch to TimescaleDB and then later in the day we actually made the switch!

We are now running a copy of InfluxDB and a copy of TimescaleDB at the same time — in case we find problems with the new TimescaleDB database, we can revert to the InfluxDB database.

In the process of migrating we got rid of a pile of nasty duplicates that used to be created by importing from last.fm. We also got rid of some bad data (timestamp 0 listens) that was pretty much useless and was cluttering the data. If you find that you are missing some data besides some duplicates, please open a ticket.
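
As a rough illustration of that kind of cleanup, here is a minimal Python sketch that drops timestamp 0 listens and removes duplicates. The field names (`user_name`, `listened_at`, `track_name`) and the choice of duplicate key are assumptions made for this example; this is not the actual migration code.

```python
from typing import Iterable


def clean_listens(listens: Iterable[dict]) -> list[dict]:
    """Drop timestamp-0 listens and duplicates, keeping the first copy seen."""
    seen = set()
    cleaned = []
    for listen in listens:
        # A listen with timestamp 0 cannot be placed in a listening history.
        if listen["listened_at"] == 0:
            continue
        # Assumed duplicate key: same user, same time, same track.
        key = (listen["user_name"], listen["listened_at"], listen["track_name"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(listen)
    return cleaned


# Hypothetical sample data: one duplicate and one timestamp-0 listen.
listens = [
    {"user_name": "alice", "listened_at": 1594000000, "track_name": "Song A"},
    {"user_name": "alice", "listened_at": 1594000000, "track_name": "Song A"},
    {"user_name": "alice", "listened_at": 0, "track_name": "Song B"},
    {"user_name": "bob", "listened_at": 1594000000, "track_name": "Song A"},
]
cleaned = clean_listens(listens)
print(len(cleaned))
```

Of the four sample listens, only the first copy of alice's listen and bob's listen survive.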

The move to TimescaleDB allows us to create new features such as deleting a listen (which should be released later this summer) and various others, because the underlying DB is much more flexible than InfluxDB. However, right this second there are no real new features for end users. More new features are coming soon, we promise!

Thank you to shivam-kapila, iliekcomputers and ishaanshah for helping with this rather large, long-running project!

ListenBrainz moves to TimescaleDB

The ListenBrainz team has been working hard on moving our primary listen store from InfluxDB to TimescaleDB, and today at 16:00 UTC we’re going to make the switch.

We were asked on Twitter why we’re making the switch, and in the interest of giving a real-world use case for switching, I’m writing this post. The reasons are numerous:

Openness: InfluxDB seems to be on a path that will make it less open over time. TimescaleDB’s dependence on Postgres makes us feel much safer in this regard.

Existing use: We’ve been using Postgres for about 18 years now and it has been a reliable workhorse for us. Our team thinks in terms of Postgres and InfluxDB always felt like a square peg in a round hole for us.

Data structure: InfluxDB was clearly designed to store server event info. We’re storing listen information, which has a slightly different usage pattern, but this slight difference is enough for us to hit a brick wall with far fewer users in our DB than we ever anticipated. InfluxDB is simply not flexible enough for our needs.

Query syntax and measurement names: The syntax to query InfluxDB is weird and obfuscated. We made the mistake of trying to have a measurement map to a user, but escaping measurement names correctly nearly drove one of our team members to the loony bin.

Existing data: If you ever write bad data to a measurement in InfluxDB, there is no way to change it. I realize that this is a common Big Data usage pattern, but for us it posed significant challenges and seriously restricted our ability to put simple features for our users into place. With TimescaleDB we can make the very occasional UPDATE or DELETE and move on.
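
To make the "occasional UPDATE or DELETE" concrete, here is a minimal Python sketch. It uses the standard library's `sqlite3` module purely as a stand-in so that it runs anywhere; it is not TimescaleDB, and the `listen` table and its columns are hypothetical rather than the real ListenBrainz schema. The point is only that fixing or removing a row is ordinary SQL.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a Postgres/TimescaleDB table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listen (listened_at INTEGER, user_name TEXT, track_name TEXT)"
)
conn.executemany(
    "INSERT INTO listen VALUES (?, ?, ?)",
    [
        (1594000000, "alice", "Sogn A"),  # bad data: misspelled track name
        (1594000100, "alice", "Song B"),
        (1594000200, "alice", "Song C"),  # a listen the user wants removed
    ],
)

# Correct the bad row in place.
conn.execute(
    "UPDATE listen SET track_name = ? WHERE user_name = ? AND listened_at = ?",
    ("Song A", "alice", 1594000000),
)

# Delete a single listen.
conn.execute(
    "DELETE FROM listen WHERE user_name = ? AND listened_at = ?",
    ("alice", 1594000200),
)
conn.commit()

rows = conn.execute(
    "SELECT listened_at, track_name FROM listen ORDER BY listened_at"
).fetchall()
print(rows)
```

After both statements, the table holds the corrected first listen and the untouched second one.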

Scalability: Even though we attempted to read as much as possible in order to design a scalable schema, we still got it wrong. (I don’t think the docs for calculating scalability even existed when we first started using InfluxDB.) Unless you are using InfluxDB in exactly the way it was meant to be used, there is a chance you’ll hit this problem as well. For us, one day insert speed dropped to a ridiculously low number per second, backing up our systems. Digging into the problem, we realized that our schema design had a fatal flaw and that we would have to drastically change the schema to something even less intuitive in order to fix it. This was the straw that broke the camel’s back and I started searching for alternatives.

In moving to TimescaleDB we were able to delete a ton of complicated code and embrace a DB that we know and love. We know how Postgres scales, we know how to put it into production and we know its caveats. TimescaleDB allows us to be flexible with the data, and the amazing queries that can be performed on the data are pure Postgres love. While TimescaleDB still requires some careful thinking beyond plain Postgres, it is far less than what is required when using InfluxDB. TimescaleDB also gives us a clear scaling path forward, even while TimescaleDB is still working on their own scaling roadmap. If TimescaleDB evolves anything like Postgres has, I can’t wait to see this evolution.

Big big thanks to the Postgres and TimescaleDB teams!

The ODI publishes two reports on Sustainable Data Institutions

The Open Data Institute has just published two reports: Designing Sustainable Data Institutions and Designing Trustworthy Data Institutions which include insights provided by us regarding our MusicBrainz project.

When I was starting out with MusicBrainz and trying to work out how to make the project sustainable, I would’ve given just about anything to have access to these reports. I am proud that, nearly 20 years later, I was able to contribute to these reports so that others may benefit from our hard work.

I find the section Suggestions for those scoping, designing and running data institutions on page 40 of the PDF version of Designing Sustainable Data Institutions quite enlightening:

  1. Ensure your revenue model aligns with your organisational goals
  2. Understand how your revenue sources will change during your institution’s lifecycle
  3. Consider both financial and non-financial aspects of sustainability
  4. Identify and mitigate future risks
  5. Learn from others

Each of these points represents a whole collection of small lessons that I’ve learned through (often painful) experience over the past years. Also, I feel that these points are not strictly limited to Data Institutions; many also apply to making open source projects sustainable. If you’re in the business of running a data or open source organization, I would strongly encourage you to read this paper!

Also very interesting is the second report about Designing Trustworthy Data Institutions:

For example, the representative from MusicBrainz said, “[A culture of honesty] builds trust, and this trust builds sustainability”

Compared to sustainability, the concepts of trust were much more clear to me from the beginning. However, that doesn’t make this report any less relevant — especially in current times, I welcome an emphasis on trust!

Thank you to the ODI for including MusicBrainz and doing all of the hard work on these reports!

 

ListenBrainz release 2020-03-28

We’ve just finished pushing a new release to the production server for ListenBrainz. We’ve spent quite a long time working on this because we needed to completely revamp how we were generating user statistics and that process is now finally complete and live. The other good news on user statistics is that we now have a generalized framework for creating them and that should make it much easier to create more user statistics going forward. We’ve triggered the stats engine to produce updated top artist statistics for everyone and those should update for users automatically sometime later today.

This release also includes an improved importer from last.fm, moving it to React and making it more friendly on a mobile device. This particular feature hasn’t been super well tested, so if you find a problem, please submit a bug report.

Next, if your listening history is screwed up for some reason, you can now delete all listens and start over, perhaps with a clean import from last.fm.

Finally, this release includes a pile of security updates to make the overall system more secure, but users shouldn’t notice anything different.

Thank you to iliekcomputers, Mr_Monkey, ishaanshah[m], shivam-kapila, pristine__ and everyone else who was involved in creating this update!

Welcoming Paula LeDieu to our board of directors!

Late in 2019, we finally filled our one vacant spot on our board of directors — we had been holding out until we found the right person and we finally have! And then towards the end of the year we all got distracted by holidays and world events and never got around to formally announcing that we have a new addition to our board.

With great pleasure I would like to announce that Paula LeDieu, an amazingly connected person who seems to know everyone, has joined the MetaBrainz Foundation Board of Directors! I first met Paula when I was bootstrapping MusicBrainz and pondering how to set up a foundation for the project; we would continually bump into each other at various conferences around the world. Paula’s professional history includes a lot of organizations that shaped the internet early on, including the BBC, iCommons and Mozilla. Paula’s professional experience and connections will be a great asset to our organization.

Paula lives in Sydney, Australia, stretching our board of directors across 3 continents and far too many time-zones. Thank you for agreeing to join our board of directors and welcome to the team, Paula!

Upgrading Postgres instead of schema change: 18 May, 2020

Hello!

We’ve long procrastinated upgrading our production Postgres installation and we’ve decided to forgo a schema change upgrade and instead upgrade Postgres to version 12.x. (We will migrate to whatever the latest stable version in the 12.x series is at the time.)

This means that on 18 May we will not make any changes to the MusicBrainz schema, but we will have some amount of down-time and/or read-only time while we upgrade Postgres on our production servers. We haven’t sorted out all of the exact details of how we will carry out this database upgrade, but the date is now confirmed.

If you operate a replicated instance of the MusicBrainz database we STRONGLY urge you to upgrade your installation shortly after we upgrade the production servers. After this release our team may start using Postgres features not available in Postgres 9.5.x, which is our current production version.

As usual for our releases that impact our downstream users, we will post many more details closer to the date and once the migration is complete, we will post detailed instructions on how you can upgrade your own installation.

Please post any questions you may have!

Thanks!

Thank you for your continued support, Google!

We’ve recently received our annual $30,000 support from Google. This brings the total amount donated to us by Google’s Open Source Programs Office to over $470,000. Hopefully next year we’ll cross the half million dollar threshold!

I can’t quite express my gratitude for this level of support! Without Google’s help, especially early on, MetaBrainz may never have made it to sustainability. Google has helped us in a number of ways, including Google Code-In and Summer of Code — all of these forms of support have shaped our organization quite heavily over the past 15 or so years.

Thank you to Google and everyone at the Google Open Source Programs Office — we truly appreciate your support over the years!

Thank you Microsoft!

Microsoft reached out to us back in early 2018 in order to use our data in Bing — we followed the normal sort of on-boarding procedure that we use for our supporters. During one of these on-boarding calls we were asked if there was more that Microsoft could do to help us and support our mission. Soon thereafter I provided them with a list of things that would be useful to us. Sadly, the request to buy a major record label and then to give it to us to manage was turned down for being too expensive. 😦

However, Microsoft did like two items on our list and agreed to support us — they were:

1) Azure hosting credits: we’re always looking for more hosting capacity and these credits will allow us to provide virtual machines to our team and to close collaborators who are doing good work, but might be lacking the computing power to push their projects forward. This contribution is of direct benefit to our community, since our projects often contain quite a lot of data and thus have some heavy processing requirements. We’re currently using our hosting credits to do some large data set crunching and some testing for the Virtual Machine that we provide to users who wish to get up and running with MusicBrainz data quickly.

2) Sponsoring our summit: our annual team meeting and foundation summit happens at the end of each September, normally in Barcelona where we have our main office. Microsoft’s sponsorship allows us to invite more people to the event, since we have the means to cover their expenses. Our summits have traditionally been our annual forum for meeting the other team members and volunteers and to take a breather from the normal course of business. At the event we see a more human side of each other and we’re more easily able to discuss our challenges and the vision for the future.

We really appreciate our supporters who go above and beyond the normal levels of support for us — these contributions really sweeten the deal of hacking on open source software!

Thank you so much to Microsoft and everyone at Microsoft who helped move this contribution forward!

Please nominate us for the Open Publishing Awards!

We’ve recently found out about the Open Publishing Awards:

The goal of the inaugural Open Publishing Awards is to promote and celebrate a wide variety of open projects in Publishing.

All content types emanating from the Publishing sector are eligible including Open Access articles, open monographs, Open Educational Resource Materials, open data, open textbooks etc.

Open data? That’s us! We’ve got a pile of it and if you like the work we do, why not nominate us for an award?

Thanks!

We were sued by a copyright troll and we prevailed!

On August 9th, 2018 we were served with a United States federal copyright infringement lawsuit over a handful of images displayed on our musicbrainz.org artist pages. These images were made available by Larry Philpot, a photographer, on Wikimedia Commons and we “deep linked” to the images (that note the license details and attribute the images to their creator) from our artist pages, in accord with the license terms.

The MetaBrainz Foundation prides itself on treading carefully in legal matters and so we were surprised to receive a lawsuit of this nature. All allegations in the suit were deemed false by our legal team. If you wish to find out more about this lawsuit, we encourage you to read the documents that were served to us.

Upon being served with the lawsuit, MetaBrainz contacted our legal guardian angel: Ed Cavazos of Pillsbury Winthrop Shaw Pittman LLP, who has been watching over the foundation since its inception. Ed proposed our case to the pro-bono committee at Pillsbury and to our great pleasure the case was accepted! Pillsbury officially became our legal representatives in defending us in this lawsuit.

Ed assembled a team (Brian Nash, Ben Bernell, Sarah Goetz) who fired off an immediate response to the lawsuit. The team filed a timely response with the court and then began a lengthy journey of educating themselves on how MetaBrainz conducts business, how it hosts its websites, and how these websites came into existence. Over the course of many emails and calls, MetaBrainz produced volumes of conversations, bug reports, Git commits and various other forms of substantiating information that the legal team used to form a strategy.

Our legal team operated on the basis that “the best defense is a good offense”. The team’s filing showed that the accusations were unfounded and went on to question the motives and methods of the plaintiff, who has a history of taking legal action against Creative Commons users. In these legal actions he claims that the users have violated Creative Commons licenses, according to narrow, non-customary interpretations of the obligations and limitations set out in CC licenses. It didn’t take long for the plaintiff to feel our pressure and decide to cut their losses. On February 28, 2019 the lawsuit was dismissed with prejudice!

With the lawsuit behind us, the MetaBrainz Foundation had to figure out what to do about showing Wikimedia Commons images on our websites. We talked with both the Wikimedia Foundation and Creative Commons to discuss what had happened. We learned that both Wikimedia and Creative Commons had started their own processes to examine and address the issues that led to the lawsuit being filed against the MetaBrainz Foundation.

We’re looking forward to seeing firm and decisive action from our friends at Creative Commons and Wikimedia, before other people and nonprofits are put in harm’s way by what in our opinion constitutes unacceptable, predatory misuse of CC licenses and Wikimedia Commons. MetaBrainz has made sure that CC and Wikimedia know about our experience and now we’re returning our focus to our core mission.

While we wait for Wikimedia Commons and Creative Commons to take action on this, we will not reinstate artist images or include any images that link to Wikimedia Commons. We prevailed in this lawsuit and thanks to our pro-bono legal team we suffered no harm. Being dragged through lengthy court proceedings by trolls hoping to make an example of us could exhaust our reserves and leave us broke — but that won’t stop us from vigorously defending ourselves. We are not going to let a bully push us around.

That’s about all we can say about this. The court filings speak volumes about the merits of the case and the problems of predatory abuse of CC licenses. It sucks to be the target of a pointless, predatory lawsuit. We’ve always been very careful about staying on the right side of the law, and we’re prepared to go to court to prove it, even if we can’t get pro-bono counsel.

The MetaBrainz Foundation owes a debt of gratitude to Ed Cavazos, Brian Nash, Ben Bernell, Sarah Goetz and Pillsbury Winthrop Shaw Pittman LLP. We cannot overstate how fortunate we are that the team came to our rescue at a very critical juncture. Thank you to the whole team and Pillsbury Winthrop Shaw Pittman LLP. Thank you!

We would also like to thank Cory Doctorow (one of our directors) for initiating and participating in many conversations. Not only was Cory’s advice critical in dealing with the lawsuit, but it was Cory and the EFF who connected us to Ed Cavazos in the first place, 15 years ago. Thank you!

I personally would like to thank Nicolás Tamargo and Michael Wiencek for their support in digging for documentation to support our side of the case. Thank you for your tireless efforts!

Finally I would like to thank our board of directors for their support in this process. Thank you Cory, Matthew, Rassami, Paul and Nick!

UPDATE: A few people have asked us to publish our response to the lawsuit. On 28 September 2018, we filed this response with the court. That is the only public filing we made; the lawsuit was dropped on February 28, 2019 as a direct result of private conversations with the plaintiff.