Summit – MetaBrainz Blog

You are invited to MetaBrainz Summit 25

MetaBrainz Summit 25 is upon us! September 15-19 in Barcelona, Spain.

We would love for you to join us remotely. If you are reading this, you are qualified to attend. Congratulations! Read on for more information.

MetaBrainz Summit 2024

*MetaBrainz nerds at the Jantar Mantar observatory.* *Left to right: jasje, reosarevok, atj, zas, KasukabeDefenceForce, monkey, yvanzo, lucifer, mayhem, ansh, theflash_, kellnerd, bitmap, akshaaatt, ApeKattQuest, outsidecontext, aerozol*

This year it was New Delhi, India, that was invaded by data nerds from across the globe!

The MetaBrainz team was treated to the glorious chaos, hospitality, sights, noise, sweets, monkeys, traffic, heat, and delicious food of India. We reflected on the last year in MetaBrainz, planned and collaborated for the future, and got a little work done – when we could fit it in between mouthfuls of Indian sweets.

Read on for a comprehensive summit recap, including the annual recap for each MetaBrainz project, as well as breakout session notes, photos, and links to the slides and video recordings.

You are invited to MetaBrainz Summit 24

MetaBrainz Summit 24 is upon us! September 23-27 in New Delhi, India.

We would love for you to join us remotely. If you are reading this, you are qualified to attend. Congratulations! Read on for more information.

MetaBrainz Summit 2023

As always, the silliest photo is the best photo. Left to right: aerozol, zas, outsidecontext, mayhem, yvanzo, bitmap, monkey, kellnerd, akshaaatt, reosarevok, laptop: atj, lucifer

A year has flown by and once again the MetaBrainz team found itself in the MetaBrainz HQ in Barcelona, Spain, for #summit23. And once again we were munching on a mountain of international chocolates, hiking Mt Montserrat, bird-watching, groaning at terrible puns, testing out mayhem’s Bartendro cocktail robot (some of the team committing themselves too thoroughly to this testing), and of course discussing everything and anything MetaBrainz related. This year we had a longer summit, taking place over the week instead of the usual weekend, broken up into three days of presentations, followed by two days of hands-on ‘hacking’.

This means it’s time to strap in for a long post!

You are invited to MetaBrainz Summit 23

Has it been a year already? MetaBrainz Summit 23 is one week away! Mark it in your calendar – October 2-6 at MetaBrainz HQ in Barcelona, Spain.

We would love for you to join us remotely. All summit information and links are in the Summit 2023 wiki page. You are invited, regardless of how long you have been in the MeB community, or your MeB interests.

MetaBrainz Summit 2022

The silliest, and thus best, group photo from the summit. Left to right: Aerozol, Monkey, Mayhem, Atj, lucifer (laptop), yvanzo, alastairp, Bitmap, Zas, akshaaatt

After a two-year break, in-person summits made their grand return in 2022! Contributors from all corners of the globe visited the Barcelona HQ to eat delicious local food, sample Monkey and alastairp’s beer, marvel at the architecture, try Mayhem’s cocktail robot, savour New Zealand and Irish chocolates, munch on delicious Indian snacks, and learn about the excellent Spanish culture of sleeping in. As well as, believe it or not, getting “work” done – recapping the last year, and planning, discussing, and getting excited about the future of MetaBrainz and its projects.

We also had some of the team join us via stream: Freso (who also coordinated all the streaming and recording), reosarevok, lucifer, rdswift, and many others who popped in. Thank you for patiently waiting while we ranted and when we didn’t notice you had your hand up. lucifer – who wasn’t able to come in person because of bullshit visa rejections – we will definitely see you next year!

A summary of the topics covered follows. The more intrepid historians among you can see full event details on the wiki page, read the minutes, look at the photo gallery, and watch the summit recordings on YouTube: Day 1, Day 2, Day 3

Mobile Apps: Let’s welcome the ListenBrainz App!

Greetings, Everyone!

During the recent summit, we discussed the future of our mobile apps. We believe that the MusicBrainz app serves a particular user base which is highly interested in scrolling through their collections, using the barcode scanner, searching for entities and viewing this data with a native mobile experience. The tagger in the Android app is not accurate and doesn’t meet the expectations people bring from using Picard on the desktop. Hence, we have decided to retire the tagger from the MusicBrainz app.

Recently, we added BrainzPlayer, Spotify support, and the ability to review and submit listens to ListenBrainz. While these features are really good, they don’t align with the MusicBrainz app, and they blur the line between two separate user bases: that of MusicBrainz and that of ListenBrainz.

Given that we have limited contributors working on our mobile apps, we have decided to split them into two apps, each with its own features. The MusicBrainz app will be stripped of these extra features and of the tagger, and will continue to be available on the Play store as a minimalistic app.

Our major focus will move to the ListenBrainz app, which will be available on the Play store as a separate app and will continue to receive regular updates and new features.

We are excited and happy with this announcement. Hope you agree with our decision. Thank you!

Thank you Microsoft!

Microsoft reached out to us back in early 2018 in order to use our data in Bing — we followed the normal sort of on-boarding procedure that we use for our supporters. During one of these on-boarding calls we were asked if there was more that Microsoft could do to help us and support our mission. Soon thereafter I provided them with a list of things that would be useful to us. Sadly, the request to buy a major record label and then to give it to us to manage was turned down for being too expensive. 🙁

However, Microsoft did like two items on our list and agreed to support us — they were:

1) Azure hosting credits – we’re always looking for more hosting capacity and these credits will allow us to provide virtual machines to our team and to close collaborators who are doing good work, but might be lacking the computing power to push their projects forward. This contribution is of direct benefit to our community – oftentimes our projects contain quite a lot of data and thus have some heavy processing requirements. We’re currently using our hosting credits to do some large data set crunching and some testing for the Virtual Machine that we provide to users who wish to get up and running with MusicBrainz data quickly.

2) Sponsoring our summit — our annual team meeting and foundation summit happens at the end of each September, normally in Barcelona where we have our main office. Microsoft’s sponsorship allows us to invite more people to the event, since we have the means to cover their expenses. Our summits have traditionally been our annual forum for meeting the other team members and volunteers and to take a breather from the normal course of business. At the event we see a more human side of each other and we’re more easily able to discuss our challenges and the vision for the future.

We really appreciate our supporters who go above and beyond the normal levels of support for us — these contributions really sweeten the deal of hacking on open source software!

Thank you so much to Microsoft and everyone at Microsoft who helped move this contribution forward!

State of the Brainz: 2019 MetaBrainz Summit highlights

The 2019 MetaBrainz Summit took place on 27th–29th of September 2019 in Barcelona, Spain at the MetaBrainz HQ. The Summit is a chance for MetaBrainz staff and the community to gather and plan ahead for the next year. This report is a recap of what was discussed and what lies ahead for the community.

AcousticBrainz at the 2018 MetaBrainz Summit

We had an in-person meeting at the MTG during the MetaBrainz summit to discuss the status and future of AcousticBrainz. We came up with a rough outline of things that we want to work on over the next year or so. This is a small list of tasks that we think will have a good impact on the image of AcousticBrainz and encourage people to use our data more.

State of AcousticBrainz

AcousticBrainz has a huge database of submissions (over 10 million now, thanks everyone!), but we are currently not using the wealth of data to our advantage. For the last year we’ve not had a core developer from MetaBrainz or MTG working on existing or new features in AcousticBrainz. However, we now have:

Param, who is including AcousticBrainz in his role with MetaBrainz
Rashi, who worked on AcousticBrainz for GSoC and is going to continue working with us
Philip, who is starting a PhD at MTG, focused on some of the algorithms/data going into AcousticBrainz
Alastair, who now has more time to put towards management of the project

Because of this, we’re glad to present an outline of our next tasks for AcousticBrainz:

Short-term

Some small tasks that are quick to finish and that we can use to show off uses of the data in AcousticBrainz.

Merge Philip’s similarity, including an API endpoint

Philip’s master’s thesis project from last year uses PostgreSQL search to find acoustically similar recordings to a target recording. This uses the features in AcousticBrainz. We need to ensure that PostgreSQL can handle the scale of data that we have.

An extension of this work is to use the similarity to allow us to remove bad duplicate submissions: we can take all recordings with the same MBID and see if they are similar to each other. If one is not similar to the rest, we can assume that it’s not actually the same recording as the other duplicates and mark it as bad. We want to make these results available via an API too, so that others can check this information as well.
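
As a rough illustration of the duplicate check described above, the sketch below flags a submission when its typical distance to the other submissions for the same MBID is much larger than everyone else's. The feature vectors, the Euclidean distance, the function names, and the threshold factor are all assumptions made for this example; this is not the similarity metric from Philip's work.

```python
import math
from statistics import median


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_suspect_submissions(vectors, factor=3.0):
    """Return indices of submissions that look unlike the others.

    vectors: hypothetical feature vectors, one per submission for a single MBID.
    factor: assumed cut-off; a submission is suspect when its median distance
            to the others exceeds `factor` times the group's median.
    """
    if len(vectors) < 3:
        return []  # with one or two submissions we cannot tell which is wrong

    typical_distance = []
    for i, v in enumerate(vectors):
        others = [euclidean(v, w) for j, w in enumerate(vectors) if j != i]
        # The median is used so that a single outlier does not inflate
        # the distances of the good submissions.
        typical_distance.append(median(others))

    baseline = median(typical_distance)
    return [i for i, d in enumerate(typical_distance) if d > factor * baseline]
```

Calling `find_suspect_submissions` with three near-identical vectors and one distant vector returns the index of the distant one. A real implementation would need a distance that matches the similarity search, and a threshold tuned on known-bad submissions.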

Merge Existing PRs

We have many great PRs from various people which Alastair didn’t merge over the last year. We’re going to spend some time getting these patches merged to show that we’re open to contributions!

Publish our Existing models

In research at MTG we’ve come up with a few more detailed genre models based on tag/genre data that we’ve collected from a number of sources. We believe that these models can be more useful than the current genre models that we have. The AcousticBrainz infrastructure supports adding new models easily, so we should spend some time integrating these. There are a few tasks that need to be done to make sure that these work:

Ensure that high-level dumps will dump this new data (If we have an existing high-level dump we need to make a new one including the new data)
Ensure that we compute high-level data for all old submissions (we currently don’t have a system to go back and compute high-level data for old submissions with a new model; the high-level extractor has to be improved to support this)

Update/fix some pages

We have a number of issues reported about unclear text on some pages and grammar that we can improve. Especially important are:

API description (we should remove the documentation from the main website and just have a link to the ReadTheDocs page)
Front page (Show off what we have in the project in more detail, instead of just a wall of text)
Data page (instead of just showing tables of data, try and work out a better way of presenting the information that we have)

Fix Picard plugin

When AB was down during our migration we were serving HTML from our API pages, which caused Picard to crash if the AB plugin was enabled while trying to get AB data. This should be an easy fix in the Picard plugin.
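
One defensive pattern for this kind of fix is to check a response before trying to parse it. The sketch below is not the Picard plugin's code: the function name and its arguments are hypothetical, and it relies only on the standard library `json` module.

```python
import json


def parse_api_response(body, content_type):
    """Return the decoded JSON document, or None if the response is unusable.

    body: response body as a string (hypothetical input).
    content_type: value of the Content-Type header, or None if absent.
    """
    if "json" not in (content_type or "").lower():
        return None  # e.g. an HTML maintenance page served during downtime
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
```

The caller can then treat `None` as "no AcousticBrainz data available right now" and carry on, instead of crashing on an unexpected HTML page.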

High Impact

These are tasks that we want to complete first, that we know will have a high impact on the quality of the data that we produce.

Frame-level data

We want to extract and store more detailed information about our recordings. This relies on work being done at MTG to develop a new extractor that gives us more detailed information. It will also improve data that we already have in AB and that we know is bad. This data is much bigger than our current data when stored in JSON (hundreds of times larger), so we need to develop a more efficient way of storing submissions. This could involve storing the data in a well-known binary data exchange format. A bunch of subtasks for this project:

Finish the essentia extractor software
Decide on how to store items on the server (file format, store on disk instead of database)
Work out a way to deal with features from two versions of the extractor (do we keep accepting old data? What happens if someone requests data for a recording for which we have the old extractor data but not the new one?)
Upgrade clients to support this (Change to HTTPS, change to the new API URL structure, ensure that clients check before submission if they’re the latest version, work out how to compress data or perform a duplicate check before submission)
Deduplication (If we have much larger data files, don’t bother storing 200 copies for a single Beatles song if we find that we already have 5-10 submissions that are all the same)
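
For the duplicate check and deduplication items above, one simple starting point is to compare a digest of each submission. The sketch below only catches byte-for-byte identical documents after canonical JSON serialization; the function names and the in-memory set of digests are assumptions for illustration, and near-identical submissions would need a similarity measure instead.

```python
import hashlib
import json


def submission_digest(document):
    """Return a SHA-256 hex digest of a submission, independent of key order."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_new_submission(document, seen_digests):
    """Record the submission and report whether it was seen before.

    seen_digests: a set of digests already stored (hypothetical; a real
                  server would keep these in the database).
    """
    digest = submission_digest(document)
    if digest in seen_digests:
        return False
    seen_digests.add(digest)
    return True
```

A client could compute the same digest locally and ask the server about it before uploading a large file, which would cover the "duplicate check before submission" idea without sending the data.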

MusicBrainz Metadata

Rashi’s GSoC project in 2018 helped us to replicate parts of the MusicBrainz database into AcousticBrainz. This allows us to do amazing things like keep up-to-date information about MBID redirects, and do search/browse/filtering of data based on relationships such as Artists just by making a simple database query. We want to merge this work and start using it.

Dumps

When we changed the database architecture of AcousticBrainz in 2015 we stopped making data dumps, making people rely on using the API to retrieve data. This is not scalable, and many people have asked for this data. We want to fix all of the outstanding issues that we’ve found in the current dumps system and start producing periodic dumps for people to download.

Build more models

In addition to the existing models that we’ve already built (see above, “Publish our Existing models”), we have been collecting a lot of metadata that we could use to make even more high-level models which we think will be of value to the community. We want to build these models using our current machine learning framework and release them publicly.

Wishlist

These are tasks that we want to complete that will show off the data that we have in AcousticBrainz and allow us to do more things with the data, but should come after the high-impact tasks.

Expose AB data on MusicBrainz

As part of the process to cross-pollinate the Brainz projects, we want to be able to show a small subset of AB data that we trust on the MB website. This could include information such as BPM, Key, and results from some of our high-level models.

Improve music playback

On the detail page for recordings we currently have a simple YouTube player which tries to find a recording by doing text search. We want to improve the reliability and functionality of this player to include other playback services and take advantage of metadata that we already have in the MusicBrainz database.

Scikit-learn models

The future of machine learning is moving towards deep learning, and our current high-level infrastructure, written in the custom Gaia project by MTG, is preventing us from applying improved machine learning algorithms to the data that we have. We would like to rewrite the training/evaluation process using scikit-learn, which is a well-known Python library for general machine learning tasks. This will make it easier for us to take advantage of improvements in machine learning, and also make our environment more approachable to people outside the MusicBrainz community.

Dataset editor improvements

Part of the high-level/machine learning process involves making datasets that can be used to train models. We have a basic tool for building datasets; however, it is difficult to use for making large datasets. We should look into ways of making this tool more useful for people who want to contribute datasets to AcousticBrainz.

Search

With the integration of the MusicBrainz database into AcousticBrainz, we will be able to let people search for metadata related to items which we know only exist in AcousticBrainz. We think that this is a good way for people to explore the data, and also for people to make new datasets (see above). We also want to provide a way that lets people search for feature data in the database (e.g. “all recordings in the key of Am, between 100 and 110BPM”).
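
To make the feature-search example concrete, here is a minimal sketch of the filter it describes, run over an in-memory list. The flat record layout and the field names (`mbid`, `key`, `scale`, `bpm`) are hypothetical and chosen for readability; a real implementation would query the database rather than loop in Python.

```python
def search_recordings(recordings, key, scale, bpm_min, bpm_max):
    """Return MBIDs of recordings in the given key whose BPM is in range.

    recordings: list of dicts with hypothetical fields
                "mbid", "key", "scale" and "bpm".
    bpm_min, bpm_max: inclusive bounds on the tempo.
    """
    return [
        r["mbid"]
        for r in recordings
        if r["key"] == key
        and r["scale"] == scale
        and bpm_min <= r["bpm"] <= bpm_max
    ]
```

The query “all recordings in the key of Am, between 100 and 110BPM” would then be `search_recordings(recordings, "A", "minor", 100, 110)`.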

API updates

As part of the 2018 MetaBrainz summit we decided to unify the structure of the APIs, including root path and versioning. We should make AcousticBrainz follow this common plan, while also supporting clients who still access the current API.

We should become more in line with the MetaBrainz policy of API access, including user-agent reporting, rate limiting, and API key use.

Request specific data

Many services that use the API only need a very small bit of information from a specific recording, and so it’s often not efficient to return the entire low-level or high-level JSON document. It would be nice for clients to be able to request one or more specific fields for a recording. This ties in with the “Expose AB data on MusicBrainz” task above.
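
One way such a request could work is for the client to name the fields it wants as dotted paths, and for the server to return only those. The sketch below shows the selection step on a nested document; the function name, the dotted-path convention, and the example field names are assumptions, not an existing AcousticBrainz API.

```python
_MISSING = object()  # sentinel so that stored None values are still returned


def pick_fields(document, paths):
    """Return a dict mapping each requested dotted path to its value.

    document: a nested dict, e.g. a decoded low-level JSON document.
    paths: dotted paths such as "rhythm.bpm" (hypothetical convention).
    Paths that do not exist in the document are silently skipped.
    """
    result = {}
    for path in paths:
        node = document
        for part in path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                node = _MISSING
                break
        if node is not _MISSING:
            result[path] = node
    return result
```

A request for just the tempo of a recording would then return a few bytes instead of the whole document.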

Everything else

Fix all our bugs and make AcousticBrainz an amazing open tool for MIR research.

Thanks for reading! If you have any ideas or requests for us to work on next please leave a comment here or on the forums.