We created AcousticBrainz 7 years ago and started to collect data with the goal of using that data down the road once we had collected enough. We finally got around to doing this recently, and realised that the data simply isn’t of high enough quality to be useful for much at all.
We spent quite a bit of time trying to brainstorm on how to remedy this, but all of the solutions we found require a significant amount of money for both new developers and new hardware. We lack the resources to commit to properly rebooting AcousticBrainz, so we’ve taken the hard decision to end the project.
Read on for an explanation of why we decided to do this, how we will do it, and what we’re planning to do in the future.
Why we’re doing this
When we launched AcousticBrainz, we had a few goals that we wanted to achieve with the project and the data we collected:
- Generate a list of musical characteristics of audio recordings, such as musical key and tempo (BPM).
- Use the extracted data to automatically predict other musical characteristics such as instrumentation, genre, or mood of the music based on the current state of the art algorithms and models for music classification.
- Provide a source of mathematical features extracted from audio which other people could use to build their own models to predict other musical characteristics.
Unfortunately, a number of things happened with the data that we collected which led us to conclude that it isn’t as useful as we had hoped:
- The musical key data that we were generating was accurate on some styles of music, but not on the full range of music that we collected in AcousticBrainz. The BPM tools work well on a wide range of music, but there are many recordings for which the predicted value is incorrect. The data generated by these algorithms does not include a confidence level for the predicted value, and so we are unable to determine which data we can trust.
- Early on in the release of the AcousticBrainz data we determined that the existing models that we had for categories such as genre didn’t work very well. We ran further experiments to build new models, but these showed that it was difficult to get good results that covered the full range of content in the database.
- Right about the time that we released the AcousticBrainz data extractor, deep learning techniques for performing this kind of prediction started to become more prevalent. Unfortunately, the resolution of the data that we collect in AcousticBrainz is not high enough to be used in this type of machine learning, and so we were unable to try these new techniques using the data that we had available in the database. The type of data that we made available meant that researchers and others who were working on this kind of task were not as interested in the data as we had hoped.
- We spent some time introducing content-based similarity to AcousticBrainz, but when we used this data ourselves for generating similar / recommended recordings, it didn’t give good results.
Unfortunately, within the MetaBrainz team we don’t have the resources and developer availability to perform this kind of research ourselves. We therefore rely on the assistance of other researchers and volunteers to help us integrate new tools into AcousticBrainz, and that is a relationship that we haven’t managed to build over the last few years.
What we’re going to do next
Based on the current state of the data in AcousticBrainz, we don’t want to keep promoting it as an accurate representation of the music that has been analysed, so we have decided to stop collecting data.
In the next month or so we will stop accepting new data submissions to AcousticBrainz. We’ll remove downloads for the submission tools, and modify the AcousticBrainz API to stop accepting new submissions. The rest of the API and other tools in the site will continue to work as before.
We’ll make a full dump of all data available in AcousticBrainz, so that if anyone wants to download and use it themselves, they will be able to do so. In early 2023 we will shut down the AcousticBrainz site.
What we’re planning to do in the future
Part of the initial goal of AcousticBrainz was to provide a way to characterise and organise the recordings that are in the MusicBrainz database. We are still interested in collecting this kind of data, and we have a few ideas about how to integrate it into other MetaBrainz projects:
- Focus on user-provided tagging for music characteristics such as genre and mood/emotion. We have a good base for storing this in MusicBrainz, and plan to integrate new functionality into ListenBrainz to encourage the MetaBrainz community to help add more data. This data will be used in the new recommendation systems that we are starting to build into ListenBrainz.
- Use some improved tools to compute specific musical characteristics. We have been reviewing some of the recent work in tempo estimation and are looking to see how we can integrate it with tools such as Picard so that we can allow people to compute these features if they need them, and help us confirm that the computed data is correct.
Importantly, this doesn’t mean that we are not interested in generating tools for music recommendation. On the contrary, our recent work has shown us that the data that we already have in ListenBrainz (user listening history) and the data in MusicBrainz (metadata, relationships, links, and tags) give great results for the recommendations that we have started to build, and so we want to focus on improving and using this data going forward. Also, focusing on one project rather than two will allow us to reach these goals sooner.
Please leave a comment if you have any questions!