AcousticBrainz at the 2018 MetaBrainz Summit

We had an in-person meeting at the MTG during the MetaBrainz summit to discuss the status and future of AcousticBrainz. We came up with a rough outline of things that we want to work on over the next year or so. This is a small list of tasks that we think will have a good impact on the image of AcousticBrainz and encourage people to use our data more.

State of AcousticBrainz

AcousticBrainz has a huge database of submissions (over 10 million now, thanks everyone!), but we are currently not using the wealth of data to our advantage. For the last year we’ve not had a core developer from MetaBrainz or MTG working on existing or new features in AcousticBrainz. However, we now have:

  • Param, who is including AcousticBrainz in his role with MetaBrainz
  • Rashi, who worked on AcousticBrainz for GSoC and is going to continue working with us
  • Philip, who is starting a PhD at MTG, focused on some of the algorithms/data going into AcousticBrainz
  • Alastair, who now has more time to put towards management of the project

Because of this, we’re glad to present an outline of our next tasks for AcousticBrainz:

Short-term

Some small tasks that are quick to finish and that we can use to show off uses of the data in AcousticBrainz.

Merge Philip’s similarity work, including an API endpoint

Philip’s master’s thesis project from last year uses PostgreSQL search to find recordings that are acoustically similar to a target recording. It uses the features in AcousticBrainz. We need to ensure that PostgreSQL can handle the scale of data that we have.
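
To make “acoustically similar” concrete, here is a minimal in-memory sketch of a nearest-neighbour lookup. It assumes each recording has already been reduced to a numeric feature vector on comparable scales and uses plain Euclidean distance; the function name `most_similar` and the vector layout are hypothetical, and the project itself runs this search inside PostgreSQL rather than in Python.

```python
import heapq
import math
from typing import Dict, List, Sequence


def most_similar(target: Sequence[float],
                 candidates: Dict[str, Sequence[float]],
                 k: int = 3) -> List[str]:
    """Return the MBIDs of the k candidates closest to `target`.

    Hypothetical helper: `candidates` maps an MBID to a feature vector with the
    same length as `target`. Distance is plain Euclidean distance.
    """
    # Pair each candidate with its distance so ties are broken by MBID.
    scored = ((math.dist(target, vector), mbid) for mbid, vector in candidates.items())
    # nsmallest keeps a heap of size k instead of sorting every candidate.
    return [mbid for _, mbid in heapq.nsmallest(k, scored)]
```

A brute-force scan like this is linear in the number of candidates, which is exactly why the question of whether PostgreSQL can handle our scale of data matters.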

An extension of this work is to use similarity to remove bad duplicate submissions. We can take all submissions with the same MBID and check whether they are similar to each other; if one is not similar, we can assume that it’s not actually the same recording as the other duplicates and mark it as bad. We want to make these results available via an API too, so that others can check this information as well.
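
The duplicate check described above can be sketched as an outlier test. This is a minimal illustration, not Philip’s method: it assumes every submission for one MBID is a feature vector normalised to comparable scales, compares each one to the component-wise median of the group, and flags anything farther away than a hypothetical `threshold`. The median is used instead of the mean so that a single bad submission can’t drag the reference point towards itself.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def find_bad_duplicates(submissions: Dict[str, Sequence[float]],
                        threshold: float) -> List[str]:
    """Flag submissions for one MBID that are far from the group's typical features.

    Hypothetical helper: `submissions` maps a submission ID to a feature vector,
    and all vectors are assumed to be normalised to comparable scales.
    """
    if len(submissions) < 3:
        # With one or two submissions there is no majority to compare against.
        return []
    # Component-wise median: one outlier cannot pull it far, unlike the mean.
    center = [median(values) for values in zip(*submissions.values())]
    return sorted(sid for sid, vector in submissions.items()
                  if math.dist(vector, center) > threshold)
```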

Merge Existing PRs

We have many great PRs from various people which Alastair didn’t merge over the last year. We’re going to spend some time getting these patches merged to show that we’re open to contributions!

Publish our Existing models

In research at MTG we’ve come up with a few more detailed genre models based on tag/genre data that we’ve collected from a number of sources. We believe that these models can be more useful than the current genre models that we have. The AcousticBrainz infrastructure supports adding new models easily, so we should spend some time integrating these. There are a few tasks that need to be done to make sure that these work:

  • Ensure that high-level dumps will dump this new data (If we have an existing high-level dump we need to make a new one including the new data)
  • Ensure that we compute high-level data for all old submissions (we currently don’t have a system to go back and compute high-level data for old submissions with a new model; the high-level extractor has to be improved to support this)

Update/fix some pages

We have had a number of issues reported about unclear text and grammar on some pages. Especially important are:

  • API description (we should remove the documentation from the main website and just have a link to the ReadTheDocs page)
  • Front page (Show off what we have in the project in more detail, instead of just a wall of text)
  • Data page (instead of just showing tables of data, try and work out a better way of presenting the information that we have)

Fix Picard plugin

When AB was down during our migration, our API pages were serving HTML, which caused Picard to crash when the AB plugin was enabled and tried to fetch AB data. This should be an easy fix in the Picard plugin.
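
The plugin’s code isn’t reproduced here, but as a hedged sketch of the kind of fix meant, the idea is to treat any response that doesn’t decode as a JSON object as “no data” instead of letting the exception propagate. `parse_ab_response` is a hypothetical name, and the sketch uses only the Python standard library, not Picard’s plugin API.

```python
import json
from typing import Optional


def parse_ab_response(body: bytes) -> Optional[dict]:
    """Decode an AcousticBrainz API response, or return None if it isn't usable JSON.

    Hypothetical helper: an HTML maintenance page is treated like "no data"
    instead of raising and crashing the host application.
    """
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Valid JSON that isn't an object (e.g. a bare string) is also unusable here.
    return data if isinstance(data, dict) else None
```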

High Impact

These are tasks that we want to complete first, because we know they will have a high impact on the quality of the data that we produce.

Frame-level data

We want to extract and store more detailed information about our recordings. This relies on work being done at MTG to develop a new extractor that can compute this more detailed information. It will also give us improvements to some of the data in AB that we know is bad. When stored as JSON, this data is much bigger than our current data (hundreds of times larger), so we need to develop a more efficient way of storing submissions. This could involve storing the data in a well-known binary data exchange format. Some subtasks for this project:

  • Finish the Essentia extractor software
  • Decide on how to store items on the server (file format, store on disk instead of database)
  • Work out a way to deal with features from two versions of the extractor (do we keep accepting old data? What happens if someone requests data for a recording for which we have the old extractor data but not the new one?)
  • Upgrade clients to support this (Change to HTTPS, change to the new API URL structure, ensure that clients check before submission if they’re the latest version, work out how to compress data or perform a duplicate check before submission)
  • Deduplication (If we have much larger data files, don’t bother storing 200 copies for a single Beatles song if we find that we already have 5-10 submissions that are all the same)
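
One simple way to approach both deduplication and the pre-submission duplicate check is to hash a canonical serialisation of each submission and cap how many identical copies are kept per MBID. This sketch assumes “all the same” means identical features after canonical JSON serialisation; near-identical submissions would need a similarity test instead. `DedupStore`, `submission_digest` and the default cap of 5 are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def submission_digest(features: dict) -> str:
    """SHA-256 of a canonical JSON form, so key order doesn't change the hash."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupStore:
    """Hypothetical store that keeps at most `max_identical` identical copies per MBID."""

    def __init__(self, max_identical: int = 5):
        self.max_identical = max_identical
        # (mbid, digest) -> number of identical copies already stored
        self.counts = defaultdict(int)

    def should_store(self, mbid: str, features: dict) -> bool:
        key = (mbid, submission_digest(features))
        if self.counts[key] >= self.max_identical:
            return False  # we already have enough copies of exactly this data
        self.counts[key] += 1
        return True
```

Because a client can compute the same digest locally, a function like this could also back a duplicate check before submission: send the digest first, and only upload the full document if it’s needed.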

MusicBrainz Metadata

Rashi’s GSoC project in 2018 helped us to replicate parts of the MusicBrainz database into AcousticBrainz. This allows us to do amazing things like keep up-to-date information about MBID redirects, and do search/browse/filtering of data based on relationships such as Artists just by making a simple database query. We want to merge this work and start using it.

Dumps

When we changed the database architecture of AcousticBrainz in 2015 we stopped making data dumps, making people rely on using the API to retrieve data. This is not scalable, and many people have asked for this data. We want to fix all of the outstanding issues that we’ve found in the current dumps system and start producing periodic dumps for people to download.

Build more models

In addition to the existing models that we’ve already built (see above, “Publish our Existing models”), we have been collecting a lot of metadata that we could use to make even more high-level models, which we think will be valuable to the community. We want to build these models and release them publicly, using our current machine learning framework.

Wishlist

These are tasks that we want to complete that will show off the data that we have in AcousticBrainz and allow us to do more things with the data, but should come after the high-impact tasks.

Expose AB data on MusicBrainz

As part of the process to cross-pollinate the *Brainz projects, we want to be able to show a small subset of AB data that we trust on the MB website. This could include information such as BPM, key, and results from some of our high-level models.

Improve music playback

On the detail page for recordings we currently have a simple YouTube player, which tries to find a recording by doing a text search. We want to improve the reliability and functionality of this player to include other playback services and take advantage of metadata that we already have in the MusicBrainz database.

Scikit-learn models

Machine learning is moving towards deep learning, and our current high-level infrastructure, written in MTG’s custom Gaia project, is preventing us from applying improved machine learning algorithms to the data that we have. We would like to rewrite the training/evaluation process using scikit-learn, a well-known Python library for general machine learning tasks. This will make it easier for us to take advantage of improvements in machine learning, and also make our environment more approachable to people outside the MusicBrainz community.

Dataset editor improvements

Part of the high-level/machine learning process involves making datasets that can be used to train models. We have a basic tool for building datasets, but it is difficult to use for making large datasets. We should look into ways of making this tool more useful for people who want to contribute datasets to AcousticBrainz.

Search

With the integration of the MusicBrainz database into AcousticBrainz, we will be able to let people search by metadata for recordings that we know exist in AcousticBrainz. We think that this is a good way for people to explore the data, and also for people to make new datasets (see above). We also want to provide a way for people to search the feature data in the database (e.g. “all recordings in the key of Am, between 100 and 110 BPM”).
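
The example query above boils down to three conditions on each recording. The flat `Recording` record below is a hypothetical simplification of the stored feature data; it assumes the key is stored as a tonic plus a scale, so “Am” becomes `key="A", scale="minor"`, and that the BPM range is inclusive at both ends.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Recording:
    """Hypothetical flattened view of a recording's features."""
    mbid: str
    key: str    # tonic, e.g. "A"
    scale: str  # "major" or "minor"
    bpm: float


def search_features(recordings: Iterable[Recording], key: str, scale: str,
                    min_bpm: float, max_bpm: float) -> List[str]:
    """Return MBIDs of recordings in the given key whose BPM is in [min_bpm, max_bpm]."""
    return [r.mbid for r in recordings
            if r.key == key and r.scale == scale and min_bpm <= r.bpm <= max_bpm]
```

In the database the same predicate would become a WHERE clause with these three conditions; the in-memory filter just shows what is being asked.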

API updates

As part of the 2018 MetaBrainz summit we decided to unify the structure of the APIs, including root path and versioning. We should make AcousticBrainz follow this common plan, while also supporting clients who still access the current API.

We should also bring AcousticBrainz more in line with the MetaBrainz policy on API access, including user-agent reporting, rate limiting, and API key use.

Request specific data

Many services that use the API only need a small piece of information about a specific recording, so it’s often inefficient to return the entire low-level or high-level JSON document. It would be nice for clients to be able to request one or more specific fields for a recording. This ties in with the “Expose AB data on MusicBrainz” task above.
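
As a minimal sketch of what field selection might look like on the server side, the helper below takes a full document and a list of dotted paths (such as `rhythm.bpm`) and returns only those values, silently skipping paths that don’t exist. The dotted-path syntax and the example field names are assumptions for illustration, not the actual AcousticBrainz API.

```python
from typing import Any, Dict, Iterable

_MISSING = object()  # sentinel so a stored null value isn't confused with "absent"


def lookup(document: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path like "rhythm.bpm" through nested dicts."""
    node: Any = document
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def select_fields(document: Dict[str, Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Return only the requested fields, keyed by their dotted path."""
    selected = {}
    for path in paths:
        value = lookup(document, path)
        if value is not _MISSING:
            selected[path] = value
    return selected
```

A request parameter could then carry a comma-separated list of such paths, split with `str.split(",")` before calling `select_fields`.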

Everything else

Fix all our bugs and make AcousticBrainz an amazing open tool for MIR research.


Thanks for reading! If you have any ideas or requests for us to work on next please leave a comment here or on the forums.

Delhi Mini-Summit 2018

Rob, Suyash, Param and I met in the bustling city of Delhi, where “horns are applied very liberally” (it is a very noisy city!), for a mini summit. Some might even call it an elaborate set of break-out sessions on ListenBrainz and CritiqueBrainz. We had discussions over a span of two days: over laptops and notebooks, riding on bumpy roads in tuk-tuks and over spicy chicken biryanis. Here is a summary of all that we discussed:

ListenBrainz
Data Visualizations
We started Day 1 with graphs for ListenBrainz. After a long marathon of heavy development weightlifting by Param and Rob (how do we work with BigQuery correctly?), we are finally at a stage where we can produce some really cool visualizations from our dataset. What will they be? Where will they be? How will we implement them? Can our community pitch in with requests and maybe even play around with code?

After scrounging through a lot of other websites that do music-y data visualizations, and the few responses to our user survey, we started listing ideas and went through the ones on our community forum. We ended up dividing the data visualizations (from now on, called graphs) into two categories:

User specific graphs: showcasing a user’s listening history and taste
Site-wide graphs: showcasing the overall listening patterns on ListenBrainz

We had to make some tricky calls based on technical constraints, but overall, for starters, we decided on some cool graphs. We detailed six of them over the summit:

  1. Listening history of a user: how much you have listened, what you have listened to, listen counts, etc.
  2. Your top artists
  3. Your tracklist (listen history)
  4. How much music did you explore
  5. Which artists are trending in what parts of the world
  6. Listener count across the world

All these graphs will be available over different time durations (last week, month, year) and will also have handles to manipulate them. They will also have tools to easily share them on social media networks. We think our community will really enjoy tracking their listening history with these. We also discussed a few ideas of how we can create a sandbox so our community can pitch in with ideas, vote on ideas and send pull requests for new graphs. More on that later, as we get there!

Rating System
If you are listening to a tracklist while working on something, how likely is it that you will rate a track saying “This is 3.5? This is 4.2? That is 5 stars!” So you see, ratings on ListenBrainz are tricky. Listening is very dynamic and happens in real time, unlike in our other dear *Brainz projects, so we think that Last.fm-like ratings, i.e. like and dislike, make sense for ListenBrainz. There was also some discussion about where the ratings should reside: is CritiqueBrainz the correct place?

Home Page
We worked on redesigning the “My Listens” page as well as the home page. We now plan to include, apart from the graphs, an infographic explaining how ListenBrainz works and things you can do with it! I will detail the mockup further later this week.

Potential Roadmap
After almost two days of discussions, we could chalk up a rough roadmap for ListenBrainz, which includes data visualizations, the ability to rate/like tracks, create collections, follow users, and more. This also includes encouraging cross-Brainz pollination!

CritiqueBrainz
With Suyash around (he worked on CritiqueBrainz as part of GSoC last year, and has been actively involved since), there were obviously a lot of discussions on reinvigorating the project. We discussed quite a few ideas, including innovative ways of writing and sharing reviews, sharing them on social media, cross-*Brainz interactions, a few UI changes, etc. We’re considering allowing Quick Reviews that, like tweets, are limited to 280 characters. What do you think? Suyash has written down his ideas on this and would love some feedback from the community!

MessyBrainz
All these talks highlighted a critical need to build some matching and clustering infrastructure. Rob has written up a possible roadmap for the project, trying to put his thoughts in order!

And of course! We couldn’t let Rob’s first visit to India be all about work. After sunset, we went exploring the city of Delhi. That included rides in tuk-tuks, spicy chicken biryanis, shopping for some colorful clothes and, definitely, the Indian chaat 🙂

All in all, it was a very productive mini summit and definitely made us all more excited to start working on the ideas we discussed. We will keep you updated and post more soon!

A lot of Indian food!
The troupe at India Gate
Param is really into (a lot of) selfies.

MetaBrainz Summit 17

While the streets of Barcelona were filling up with the referendum conundrum, a bunch of people were spotted chattering and bantering, sometimes with pillows and colorful socks, searching for gelato.

Yes, that bunch of people would be us. 😀

Our annual MusicBrainz summit was held on September 30th–October 1st in the colorful, lively city of Barcelona. We had people (and chocolates) from nine countries: Spain, India, Germany, UK, USA, France, Estonia, Denmark, and Iceland.

Summit participants with *Brainz pillows
From left→right, top: Wieland (Mineo), Sambhav (samj1912), Sean (Leftmost Cat), Nicolás (reosarevok), Ben (LordSputnik), Jérôme (loujin), Alastair (alastairp); middle: Leo Verto, Freso, Michael (bitmap), Elizabeth (Quesito), Chhavi; bottom: Yvan (yvanzo), Rob (ruaok), Param (iliekcomputers), Suyash (ferbncode). Laurent (zas) behind the camera.


Having a majority of our team in a room with food obviously led to lots of productive discussions. We talked about translations, recommendation engines, voting, and packaging. We also talked about SpamBrainz, user scripts being included as part of our projects, documentation, single sign-on for all Brainz, and a bunch of other things.

One of the nice things we could do at this summit was go over our user survey results. As you might remember, we had a banner on our site asking you to take part in a survey. The results gave us a good idea of our community: what languages they use, which Brainz projects they use most, how they came to know about us, and so on.

Summit session in progress.

We got to know what you like, but more importantly what you don’t. We heard all of you, and we are on it. We will publish a detailed report on that soon.

You cannot be in Barcelona with such a good lot of people and not end up exploring the city. The team ended up cycling on the streets of Barcelona (many times on the wrong side), climbing up to the mosaic-y Park Güell, snacking on pinchos and tapas, visiting the Pompeu Fabra University (where our AcousticBrainz project resides) and taking daily after-lunch strolls through the Arc de Triomf.

Apart from that, some of the record-breaking points from the summit would be:

  • We had nice colorful pillows with all our kids (we mean, Brainz projects) printed on them. And summit t-shirts too.
  • The summit was live streamed on our YouTube channel, for all those who couldn’t make it. That went pretty well, with only minor technical difficulties, and it provided a good overview (literally! 🙂 ). For those who missed it (or want to rewatch it), the archived streams are available on YouTube.
  • We finally decided to improve the user experience of our projects (more on the blog about that later).
  • We worked on a wonderful new Sound Team recording while having a terrace barbecue hosted by Elizabeth.
  • More gelato was eaten than ever before. (That shouldn’t be surprising.)

We’re wrapping up the summit with this blog, but we have all the memories preserved. Find the amazing moments captured by our in-house photographer Zas in his Facebook gallery, and those moments in motion in my own video here:

Until next year,
Cheers 😀

Live streaming MB Summit 17

The MetaBrainz Summit 17 is slowly starting up, with everyone having arrived in Barcelona now, and people have already started discussing a bit in the corners of the MetaBrainz office. (As well as devouring a lot of chocolate!)

The summit officially starts tomorrow, however (we’re aiming to begin at around 11 AM Barcelona time, CEST). While we’re probably having the most people at a summit ever, we recognise that a lot of people from the community are not able to be here for one reason or another, so we’re going to try something new tomorrow: live streaming the summit!

We’ll be live streaming on our YouTube account at https://www.youtube.com/channel/UClC89t81khDKLCVs45prLqg/live – there will be a live chat as well, which I will try to monitor as best as I can. Keep in mind that this is a first for us, so sorry in advance for the technical difficulties we will almost certainly encounter. 🙂

Recap of the MusicBrainz Summit 15

The MusicBrainz Summit 15 participants.
From left→right (top) chirlu, reosarevok, ruaok, Freso, (bottom) Leftmost, alastairp, Gentlecat, bitmap, zas, and LordSputnik. Special guest on the laptop monitor: caller#6.

A couple of weeks ago (Oct. 30th through Nov. 1st), the MusicBrainz Summit 15 took place in Barcelona, at Rob “ruaok”‘s place. We had all of the MetaBrainz employees there, Rob/ruaok (local), Michael/bitmap (US), Nicolás/reosarevok (Spain/Estonia), Roman/Gentlecat (recently local), Laurent/zas (France), and myself, Freso (Denmark) – in addition to a bunch of other people from the community: Sean/Leftmost (US) and Ben/LordSputnik (UK), the two lead developers of BookBrainz; chirlu (Germany), long-time volunteer developer on MusicBrainz; and Alastair/alastairp (local), lead of AcousticBrainz. Between us, we represented 7 countries, 8 nationalities, and 9 languages.

Talking around the table.

We managed to cover a lot of ground on the serious topics, discussing how to avoid data/MBID loss and how to version data, how to deal with labels (the entities, not the corporations…) and other unresolved style issues, how to integrate all the various *Brainz projects more and better, and a bunch of other things. The official notes for the summit are stored in a public Google Docs document. Feel free to read through it and jot down your own comments!

One of the big things was that we decided again-again-again (for the third or fourth year in a row?) to release the translations of MusicBrainz.org. But this time we actually did it! So MusicBrainz.org is now available in German, Dutch, and French (in addition to English) – go check that out if you have not done so already. 😉 At some point in the not-too-distant future™ we will also enable translating all of our documentation. Sean/Leftmost volunteered to look into options for this. Expect to hear more on that later!

Our Style BDFL: Nicolás a.k.a. reosarevok

We had some talk about how and why MBIDs get lost and what we can do to prevent this. As part of this discussion, we decided to make more edits autoedits for everybody. This was partly due to a wish of having a shorter queue of open edits (and there’s been a significant drop in open edits since Nov. 16!), but also very much to avoid losing MBIDs once they have been generated. More in depth discussion of the reasoning (and some of the community’s response) can be seen in the server release blog post and its comments.

We talked about a few other things like genres, reviewing the work of the style BDFL and the community manager, the future direction of the MetaBrainz Foundation, and a couple of other topics. The summit notes should contain more information on what we talked about and decided on these points.


Obviously it was not all talk and talk and talk. There was also plenty(!!) of chocolate. yeeeargh helped us by getting a lot of Ritter Sport, as he apparently lives right next to their factory, and sending it along with chirlu to Barcelona. Thank you, yeeeargh! Gelato! We also managed to take in a vast amount of gelato (Italian ice cream), as there was an amazing gelato place close to Rob’s apartment. And we got to walk around the city of Barcelona a bit, and had plenty of social hanging out that was Meta-/MusicBrainz related most of the time… but not all of it. 😉 Our system administrator, Laurent/zas, also took a bunch of pictures capturing the summit. A few of them are shown here, but you can peruse them all in the slideshow at the bottom.

Finally, a big thank you to Google and Spotify for helping to fund this meeting. It would have been a lot harder to bring all these people together from around the world without their (continued, no less!) support. Here’s to 2016 and summit 16!


MusicBrainz Meetup: Chicago, IL, USA, 25-26 January 2014

In case the name didn’t tip you off, this is a rather more casual get-together than our usual summits. But for those of you with the inclination, a free weekend and a decent way of getting to Chicago: we discovered that our fearless leader Rob was going to be in the same city as one of our developers (bitmap) and figured we’d fly me (the other developer) in too and make a thing of it! We’ll be hanging out in person on January 25-26, plus probably part of the evening of January 24th, and we’d love to have you join us.

We don’t have much by way of details, at present, in part because this is quite informal. However, if you’re interested, we have a wiki page with arrival times for those of us with plane tickets already, and which we’ll update with any other plans we end up making. If you’d like to come, please add yourself!

MusicBrainz Summit #13

Over the weekend, 17 MusicBrainz fanatics got together at WikiMedia’s German headquarters to discuss the immediate future of MusicBrainz. And in short – we had a blast! A tremendous number of topics were covered, and we feel this was one of the most productive summits we’ve had so far. From genres to acoustic properties, to internationalization, to artist & label artwork, an incredible amount was discussed.

MusicBrainz Summit #13 Attendees
From left to right: ijabz, CatCat, ianmcorvidae, reosarevok, ruaok, navap, santiissopasse, Anders Arpteg (Spotify), LordSputnik, ocharles, warp, fractalizator, Freso, Mineo, kepstin, JonnyJD

A summary of all topics covered and points discussed can be found on the wiki, with thanks to diligent note taking by everyone who attended. As you’ll see, a lot of topics are now actionable, so hopefully work will begin to move forward with these. While it remains unclear what the solution is to some topics, the constructive conversations around them are helping us slowly move forward in the right direction.

Of course, it wouldn’t be a MusicBrainz summit if it was work work work – there was plenty of mayhem and play too! On Saturday we had our summit meal at Max & Moritz – complete with a police escort due to some unfortunately timed protests. A novel twist for a group meal… yet oddly consistent with the fresh and unpredictable nature of MusicBrainz.

ruaok takes a hard earned break… on Freso!

We also want to thank our sponsors who made the summit possible. Thank you to Spotify and Google’s Open Source Programs Office! Your support paid for some airfares, our lodging, summit meals and a large pile of U-Bahn tickets. And of course, a big thanks also goes to Wikimedia Germany and in particular to Lydia Pintscher for baby-sitting us all weekend and also for making awesome introductions to other people to help with specific summit topics.

Thanks again to everyone who attended. Until next year!