api – MetaBrainz Blog

We can’t have nice things… because of AI scrapers

In the past few months the MetaBrainz team has been fighting a battle against unscrupulous AI companies ignoring common courtesies (such as robots.txt) and scraping the Internet in order to build up their AI models. Rather than downloading our dataset in one complete download, they insist on loading all of MusicBrainz one page at a time. This of course would take hundreds of years to complete and is utterly pointless. In doing so, they are overloading our servers and preventing legitimate users from accessing our site.

Now the AI scrapers have found ListenBrainz and are hitting a number of our API endpoints for their nefarious data gathering purposes. In order to protect our services from becoming overloaded, we’ve made the following changes:

The /metadata/lookup API endpoints (GET and POST versions) now require the caller to send an Authorization token in order for this endpoint to work.
The ListenBrainz Labs API endpoints for mbid-mapping, mbid-mapping-release and mbid-mapping-explain have been removed. Those were always intended for debugging purposes and will also soon be replaced with a new endpoints for our upcoming improved mapper.
LB Radio will now require users to be logged in to use it (and API endpoint users will need to send the Authorization header). The error message for logged in users is a bit clunky at the moment; we’ll fix this once we’ve finished the work for this year’s Year in Music.

Sorry for these hassles and no-notice changes, but they were required in order to keep our services functioning at an acceptable level.

Pissed off by Spotify Enshittifying more API endpoints? We can help!

Today Spotify announced that a number of APIs will no longer be available for new users.

While Spotify won’t immediately take away these endpoints for existing users, it certainly does not inspire confidence for their longevity. Spotify cites “security reasons” as an explanation of why they are closing off these APIs, but we are unclear as to how that will improve security, so we need to assume that Spotify has some other motivations behind this move. More likely than not, they are hatching a strategy to protect their algorithmic assets from data crawlers used by third-party AI companies.

Needless to say, the Spotify services continue to get enshittified, taking away very useful features that developers have come to rely on. ListenBrainz has very different goals, being entirely open-source and part of a non-profit foundation, and we won’t pull the rug out from under our users for monetary or “security” reasons.

On the contrary, our very small team works in direct collaboration with users and developers interested in developing new discovery tools in the music space, and we embrace the variety of ways passionate music lovers want to interact with music collections and recommendations.

Our own frustrations with Spotify’s ever-worsening recommendations was the spark that lit up our interest in recommendations, but again our approach is one of fairness (we don’t tip the scales) focused on the user’s experience rather than the deep pockets of multinational labels.

For developers frustrated that their app stopped working, the good news is that the ListenBrainz team has been working on building some new datasets and API endpoints that offer replacements for what Spotify is taking away. While not everything that Spotify is enshittifying has a direct replacement with ListenBrainz, we can at least offer a path forward for developers.

These features/datasets include:

Artist similarity*: You can use our ListenBrainz Labs API to explore similar artists.The similarity datasets are still somewhat limited, since we’ve not been able to run all available data through that algorithm, but we plan to do that in the very near future.
Recommendations for a user: Recommendations are available for ListenBrainz users who send us listen information.
Custom playlist generation: https://listenbrainz.org/explore/lb-radio/ allows you to customize generated playlists as in-depth as you want (see the documentation for more details)
Popularity data: Find out the popularity of artists on ListenBrainz, as well as popular tracks for an artist
Fresh releases: Keep up with new albums coming out, generally or specifically for your taste (as seen on https://listenbrainz.org/explore/fresh-releases/)

Future new datasets include:

Track similarity
Album similarity
Your dream feature here

All of this data is Creative Commons CC0 licensed (read Public Domain) and available on our API endpoints, for free, forever. MetaBrainz is a California 501(c)3 non-profit organization dedicated to creating, maintaining and ensuring that these datasets are available for public use.

And on top of that, the person who coined the term “Enshittification”, Cory Doctorow, has been on our board of directors for 20 years, further ensuring that we’re enshittification proof.

Come play with our data – we’d love your feedback! We’re working hard to make this data better and if it doesn’t yet meet your needs, we hope to meet them soon!

* for the similar artist search, use this value for “algorithm”: session_based_days_7500_session_300_contribution_5_threshold_10_limit_100_filter_True_skip_30

State of the Brainz: 2019 MetaBrainz Summit highlights

The 2019 MetaBrainz Summit took place on 27th–29th of September 2019 in Barcelona, Spain at the MetaBrainz HQ. The Summit is a chance for MetaBrainz staff and the community to gather and plan ahead for the next year. This report is a recap of what was discussed and what lies ahead for the community.

The end of the replication nightmare!

I’m pleased to report that our nightmare of finding/reconstructing the missing replication packets is finally over!

Through many heroic hours of work, Bitmap and Chirlu have reconstructed the missing replication packets. All clients should now be on their way to being up to date. We’ve learned a number of lessons (some good, some bad — that’s life, right?) in this ordeal and we hope to avoid these issues in the future.

An integral part of this recovery process were a number of people from our community who helped us: Users mbcz, rembo10 and xeam sent us their complete DB dumps! Bitmap used these to sanity check and diff several other database to finally extract the missing packets. Thank you for dropping what you were doing and sending us a few GB of data over blazingly fast connections. Without you this would not have been possible; and this is not an exaggeration. Thank you!

After some more rest we’re going to continue to put out smaller fires that remain from the move to NewHost, but for now, the big fires are put out. Just in time for the weekend!

In the 11 year history of the replication stream we’ve had to have users restart their stream about 3-4 times because of problems on our end. Zero would’ve been nicer, but I’m proud that we’ve been able to make this system work for so long. On a daily basis we seem to have about 400 replicated copies of MusicBrainz running all over the world. Clearly this part of our service is well used and I sleep a little better at night knowing that our most critical data is backed up across the globe.

Just for fun, here is a graph of the replication API usage over the last 6 months:

Towards the end the graph shows the week plus long break, then a small blip as some of our replicas got unstuck yesterday and the much larger spike shows the rest of the replicas getting unstuck. Now, as to what caused the blip in mid-October — I have no idea.

Anyways, please accept my apologies for the replication stream outage and keep replicating!

Thanks!