
This year it was New Delhi, India, that was invaded by data nerds from across the globe!
The MetaBrainz team was treated to the glorious chaos, hospitality, sights, noise, sweets, monkeys, traffic, heat, and delicious food of India. We reflected on the last year in MetaBrainz, planned and collaborated for the future, and got a little work done – when we could fit it in between mouthfuls of Indian sweets.
Read on for a comprehensive summit recap, including the annual recap for each MetaBrainz project, as well as breakout session notes, photos, and links to the slides and video recordings.
Firstly, a huge thank you to lucifer. lucifer was our host for the 2024 summit and planned and booked everything together with the MetaBrainz chief of mayhem, mayhem. Both of them kept the ball rolling, taking care of all the details, keeping the food coming throughout and fielding endless questions and texts from a gaggle of wide-eyed foreigners. A big thank you to everyone else who helped out, as well as those who made the effort to travel and attend. A special thank you also to the rest of the Indian team, who made us feel welcome and shared their favourite snacks. Thank you rdswift for getting up early and joining us via Zoom, and everyone else who popped in to watch. Thank you reosarevok and everyone else who helped with the excellent meeting notes that made this recap possible.
We had too many extracurricular (non-work-related) adventures to share here… jet lag be damned, we plunged into the chaos of Chandni Chowk (Old Delhi) for a street food tour, explored the 300-year-old astronomical instruments of Jantar Mantar, took in the light show at Qutub Minar, visited museums and street art, rode trains, bought kurtas (traditional Indian menswear), shopped, watched monkeys playfight on our balcony, dodged traffic, sweated our asses off, and emerged victorious from the shitshow that is the Taj Mahal.
India challenged us, and rewarded us. Many of the overseas team also extended their India stay, travelling North and South of Delhi after the summit. You will have to ask them for their photos and stories!
A summary of the topics covered follows. You can also see full event details on the wiki page and watch the recordings on YouTube: Day 1, Day 2, Day 3
State of MetaBrainz

presenter: mayhem (youtube | slides)
MetaBrainz had two main goals for the year: pay our people well, and increase supporter prices. mayhem is really happy with how the raises went, and is confident that everyone in the team is getting paid fairly now.
The team pay rises necessitated a corresponding increase in supporter prices. This has been more complicated – some supporters agreed to the rise, some decided to leave, and a few new supporters came in their stead. All in all, we’re more or less where we started before the price rise. Income is up 17% and donations are up 70%, which is very nice! Salaries are up 33%, which is partly because of the raises and partly because most people have been busy (and so have been billing for more hours).
The summit has been more expensive than expected but not by a huge amount. We are considering alternating between Barcelona and Delhi for future summits. We are reporting a loss of $50k this year, but this is mostly because of one-off expenses, without which we would be at a $20k profit for the year. We are not in a bad situation, but we should be looking to bring in more money for the future.
Donations have been falling (even though they’re up this year) and we are going to try to increase the amount of donations we get from individuals. This is important because the US requires non-profits to show they are receiving community support, as opposed to selling services. We are hoping to increase the donations coming from ListenBrainz, whose users don’t contribute as much back into the ecosystem when compared to MusicBrainz. Ideally we will be covering our full hosting costs with donations.
Our income is flat, which usually indicates a saturated market, so we need to branch out. Luckily, we have been working on exactly that. One avenue we are looking to explore is polishing our messaging for data users, including AI users (once AI trainers have had enough of being sued, we expect them to come round to the idea of paying for data). For example:
– It’s not clear that LB is open source.
– It’s not clear we’re a data service provider.
– It’s not clear that we’re an ethical and enshittification-proof non-profit.
BookBrainz is still far away from driving revenue, but has potential. What if we spent a year doing our best to get BookBrainz to the point where it has lots of data, good replication, and everything else it needs to work at a data service provider level?
State of MusicBrainz

presenter: yvanzo & reosarevok (youtube | slides)
MusicBrainz numbers keep going up, along with their rate of change. Entities, edit activity, and more: it’s all going up.
The Event Art Archive was released in June, and 2.6% of events now have artwork. We are also seeing more event additions since the EAA was released, a 57.5% increase.
MusicBrainz had two GSoC projects this year, both using Rust:
– Jade Ellis’ GSoC project will allow us to localize emails sent to users as well as improve email performance, bringing us out of the 1990s.
– Ashutosh Aswal’s GSoC project will save URLs submitted as relationships or in edit notes to the Internet Archive’s Wayback Machine, making it easier to follow old links in edit notes.
The React conversion keeps making progress.
– The event editor has been converted to React, and work has started on the release editor. The artist credit editor was also significantly improved.
– The remaining templates to be converted are mostly edit forms, which are a mix of Perl/Template Toolkit and JavaScript/jQuery+Knockout.js.
There were a number of search upgrades, with the Solr cluster getting new hardware and an upgrade to Solr version 9. The mirror servers and the search server indexer (sir) still need to be upgraded.
Weblate has been migrated, and is now being used by the MusicBrainz website and Picard for static data (countries, attributes, relationships, etc.) and user documentation.
– Wiki pages now document the internationalization for each project.
– It’s now a lot easier to find problems with translations, and many localization issues were found and addressed.
It has been a really good year for volunteers to musicbrainz-server (special shoutout to derat), as well as volunteer translators. And let’s not forget our beta testers, ticket reporters, and editors, voters and script authors.
This year saw a small schema change, adding genre collections.
2023-2024 in numbers
Since October 2023
- Releases
– 4,346,941: Releases in MusicBrainz, total
– 501,199: Releases added to MusicBrainz
– 1,405: Releases added to MusicBrainz every day, on average
– 55%+: Releases added with an importer
- Genres
– 13,754,930: Genre tag upvotes and downvotes, total
– 79%: MusicBrainz tags that are in the genre whitelist
– 6.5%: MusicBrainz artists that have a genre
– 41%: Release groups that have a genre
- Editors
– ~2,100: Weekly active MusicBrainz editors
– ~160: Weekly active MusicBrainz voters
- Edits
– 11,911,577: Edits added to MusicBrainz
– 30%: Of MusicBrainz edits were entered by the top 25 editors
State of Community

presenter: aerozol (youtube | slides)
- We had 773 email reports (e.g. user reports) in the last year. A big thank you to reosarevok who handles all of these promptly and with care, patience and compassion.
- Pretty much all our social media and chat platforms are up year on year, while activity on the forums and the blog remains static.
- Discord is by far our fastest-growing channel. Since lucifer and atj bridged Discord to our other channels, the rest of the MetaBrainz community has started to notice an uptick in activity from this generally young(er) user base, which is not engaging elsewhere.
State of BookBrainz

presenter: monkey (youtube | slides)
The one-man team (!) that is monkey was apologetic about it being a slow year for BookBrainz updates, due to a focus on ListenBrainz. However, some progress was made, consolidating crucial features, improving infrastructure and refactoring old code. There were also some great community style discussions, with some welcome new faces.
Goals for this year
- …are the same as last year:
- The API needs to be deployed to production. This requires more testing first.
- Deploying kellnerd’s ongoing 2023 & 2024 GSoC project, which will allow imports from other open databases (OpenLibrary, Bookogs, Library of Congress) – the database importer project was actually started back in 2018!
- Integrating with the Internet Archive to display book covers. The Book Cover Archive? BCA? BAA? 🐑
The number of new users is steadily increasing, with August 2024 seeing a particularly large jump in new users. Correspondingly, the number of revisions (aka ‘edits’, for you MB folks) has also been increasing.
However, the top 10 editors list hasn’t changed much since last year. Thank you, editors!
New entities since last year
- 18,296 works
- 5,451 authors
- 4,801 editions
- 4,472 edition groups
- 1,158 publishers
- 1,066 series
Those numbers will explode when we run the database importer project! The only number that went down since last year is the number of merged pull requests, at 74.
Thank you to everyone who participated in BookBrainz in 2024, in any way, shape or form.
State of Infrastructure

presenter: zas & atj (youtube | slides)
Although MetaBrainz still has the same number of servers as it did in 2023, we now have significantly better performance, thanks to swapping out old machines for newer, shinier, faster ones. MetaBrainz runs 49 servers: 29 physical servers, 14 virtual machines (VMs), and 6 external servers. Server costs remain very low, and we are very satisfied with Hetzner, whom we have been working with for 9 years now.
Important changes
- A new Solr cluster (our search engine) has been deployed in Hetzner Cloud. These nodes are running on ARM hardware.
- We have a load balancer provided by Hetzner (VM).
- Our overall response time has improved by a factor of roughly 5 to 10.
- Everything has become easier to maintain, now being deployed by Ansible.
New machines
- Although the number of servers is constant, our resources have increased, because the replacement machines are much faster.
- Many nodes were replaced by newer and higher performance machines.
- In exchange, there was a slight increase in Hetzner prices.
New hardware
- The new Hetzner servers now provide faster CPUs and ECC RAM.
- They also come with better SSD drives (which are faster and more reliable).
- We are using the ARM architecture where we can, which gives us cheaper servers due to lower demand.
Ansible
- We’re closer to managing everything with Ansible.
- Setting up Ansible requires a lot of initial work, but once we get there, maintenance is significantly improved.
- Writing Ansible roles and playbooks is not a trivial task, and one that we lean on atj for.
- The next major goal is to manage Docker with Ansible.
Goals for next year
- Get rid of old nagios checks.
- Use ZFS more. Currently it is used by our database servers, providing better error detection and compression.
- Move to 10.10.10.0 virtual networks.
- Improve the OpenResty and certificate deployment.
MetaBrainz bandwidth output from our OpenResty instances (rex & rudi) is 10 MB/s at peak hours. MetaBrainz serves almost 20 terabytes of data every month!
State of MusicBrainz Picard

presenter: outsidecontext & zas (youtube | slides)
Another busy year for MusicBrainz Picard, with two Picard releases (2.11 and 2.12), an excellent GSoC project, and a number of significant steps towards Picard v3.
The year’s progress, in brief
- A great GSoC project for cover art image processing, by twodoorcoupe, which allows users to resize images, convert image formats and can be extended by plugins.
- The PyQt v6 port has been completed and staged for Picard v3.
- Progress has been made on the new plugin system for Picard v3.
- 93 tickets were resolved.
- 20 GitHub contributors (including translators).
- Lots of code refactoring and improvements.
- Improved OAuth2 support, which will ease the transition to the new plugin system.
- The transition to Weblate has been well-accepted and much translation progress has been made.
Some things did not go so well…
- Platform-specific issues, Qt 6 issues, or issues related to specific setups consumed a lot of development time.
- outsidecontext has less time available for Picard development – partly due to getting distracted by shiny projects like Harmony.
- Development of the new plugin system has stalled.
Progress on Picard v3
- The port to PyQt v6 and general OAuth improvements (webflow, token revocation, PKCE) have been completed.
- The new plugin system, support for Apple silicon, and code cleanups are still in progress.
- Remaining on the to-do list are porting all the “essential plugins”, updating Snap and PPA packaging for Qt6, and implementing MetaBrainz OAuth.
State of ListenBrainz Apps

presenter: akshaaatt, jasje, pranav & theflash_ (youtube | slides)
The young(er) and cool(er) developers took some time to share the progress on MetaBrainz apps for those new portable “devices”. They say they are like computers, but for your pocket. Really!
- Updates this past year:
– Year in Music 2023 was an app feature
– GSoC project by pranav, revamping the dashboard, artist and user pages.
– Improved navigation.
– A listen service revamp, which didn’t go to plan, but the team is working to improve and finish it.
- User feedback:
– We are getting better for existing users,
– but the app needs to focus on the new user experience, where we struggle. The first impression is important, as are the reviews that new users leave.
- The ListenBrainz Android app has around 700 active users and 4,000 downloads.
- We have submitted 1 million listens to ListenBrainz via the Android app this year (up from 63 thousand last year!)
- The app is still being tested, but has seen a lot of progress this year:
– GSoC project by theflash_, adding a dashboard section to the app as well as a feed revamp.
– Added support for adding reviews, pins and recommendations, user search, and Year in Music (which arrived a bit later than for the Android app).
– User navigation is in progress, but still has some issues to address.
- The team had a bit of a breakthrough, and found a way to submit listens from Apple devices! This only works from Apple Music. Hopefully we can implement this as a feature soon.
- We have had 336 listens submitted to ListenBrainz via the iOS shortcut.
State of ListenBrainz

presenter: lucifer & mayhem (youtube | slides)
The year in numbers
- 959 million all-time listens
- 165 million listens submitted this year (so far)
- 31.6K all-time users
- 6.1K new users this year (so far)
- 409 pull requests since last summit
- 67 releases since last summit!
New features (some of them)
- Popularity statistics
- Entity pages, aka artist and album pages
- Apple Music playback (a GSoC 2023 project, finished and improved in 2024)
- The frontend is a single page app now, with seamless playback
- Much faster (but still too slow) Spark stats
- Apple and Soundcloud metadata caches
- Music Neighborhood
- Queues in BrainzPlayer
- ericd’s GSoC project, RSS feeds, which is now just waiting for OAuth to be completed
- And lots more…
A major event last year was the listens database accidentally being deleted – everything was restored from dumps, except for data submitted in the 18 hours before the incident. This shows that our dumps work fairly well, but we have since made some improvements that will mitigate the risks going forward.
We also had some issues with Spotify rate limits, which have been solved for now. As the user base grows this issue will probably be recurring, and something to keep an eye on.
We added dedicated ListenBrainz channels to ChatBrainz, which has led to a nice uptick in community interaction.
Roadmap
- We’re working on improving the first user experience, in particular the “cold start” problem.
- We’ve started working on user flairs and donation pages. LB is starting to consume more resources, which we are looking at offsetting with user donations.
- We have too many A-level priorities at the moment; we need to review our priorities.
- We need to make sure the world understands we’re open source and a data service provider (DSP). This means finishing the last datasets, then licensing them and letting the world know. Popularity datasets, for example, have been asked about in the past.
We expect to hit 1 billion ListenBrainz listens some time this year!
The following are notes from breakout/hack sessions. Note that some sessions did not necessitate notes, and some were not recorded.
MetaBrainz-wide OAuth

presenter: lucifer (youtube | previous doc)
OAuth is the huge project that was put into lucifer’s already-full hands last year. It represents MetaBrainz’s planned centralised login system, and will allow us to do a lot of important things, like making signing up and logging in much easier and quicker, and serving notifications to a user across all ‘Brainz’ sites.
- Picard/OOB – PKCE – Req/SHA-256 – Let’s enforce these from day 0
- Allow customizable port on the client side
- Compatible versions of Picard 2 & 3 should be released at the same time as MeB OAuth
- Refresh tokens will expire if unused for a period of time yet to be decided
- We will revoke the existing token related to the same application after reauthorisation
- A lot of things support OpenID Connect by default; if we can add support later on, it will make our lives a lot easier
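For reference, the S256 code-challenge scheme the notes mention (PKCE with SHA-256) can be sketched in a few lines. This is a generic illustration of RFC 7636, not MetaBrainz’s actual implementation:

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    # 32 random bytes -> a 43-character URL-safe string (the RFC 7636 minimum length).
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    # S256 method: BASE64URL(SHA-256(verifier)), with the '=' padding stripped.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client keeps the verifier secret, sends only the challenge with the authorisation request, and reveals the verifier at token exchange time, so an intercepted authorisation code is useless on its own.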
MusicBrainz website performance

presenter: zas (youtube)
The MusicBrainz website is way too slow compared with the resources we use for it (and a lot slower compared with the API only servers).
- Some pages being slow is normal, because they are huge, but for others we have no real explanation of why they take so long.
- Releases are by far the worst offender, because of the extra data they have to load.
- We should review the amount of database queries we’re doing to load each entity, in case we can speed it up.
- We could consider not loading things like all relationships and works and whatnot for every user, since casual users may be only interested in basic tracklists.
- While the React migration is ongoing, pages combining TT and React code might actually be slower than before, temporarily.
- Once React migration is completed, more lazy loading of huge lists can be done with React Table.
- We need more indicators about what levels of caching are actually used.
- We might be able to have a separate set of site servers for bots only, where we send queries that are obviously from bots – since they make the cache inefficient, this could let us prioritize real user queries.
- Hosting webservice and website containers on different nodes might help with identifying issues.
- We can consider moving to ARM VMs, which are both cheaper and faster than Intel ones, for the front-end only.
ListenBrainz Importers (last.fm, Spotify etc)
presenter: rob & lucifer (youtube)
New ListenBrainz users face some trouble getting started. This problem has two parts: users struggling to find how/where to get their data into ListenBrainz, and our importers not always doing their job.
- We get a lot of complaints about our importers (especially the last.fm importer) having problems.
- We could consider having an official JSON format that we guide third party devs to.
- We actually have a lot of useful things already (those importers, JSON-L options, etc) that we should finish and/or document, and then have a “How to get my listens into LB” page that gives the users all the different options of getting their stuff in (depending on their use case, players, history, etc.). This should probably be shown to all users when they have no listens yet.
- aerozol commented that a new page is not the solution; it has to be part of the UX.
- A ticket was created: LB-1644: Create a “How to get my Listens into ListenBrainz” page.
- One of our users, shisma, created a flowchart (see the above ticket) that we can use as a basis for the UX.
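For third-party developers wondering what the smallest possible submission looks like today, here is a sketch in the shape of the existing ListenBrainz submit-listens JSON (the artist and track names are just example data):

```python
import json
import time

# One "single" listen, in the shape the ListenBrainz submission endpoint
# already accepts; only artist_name and track_name are required metadata.
listen = {
    "listen_type": "single",
    "payload": [
        {
            "listened_at": int(time.time()),  # Unix timestamp of the play
            "track_metadata": {
                "artist_name": "Example Artist",
                "track_name": "Example Track",
            },
        }
    ],
}
body = json.dumps(listen)
```

Documenting and promoting this shape (alongside the JSON-L export options) is essentially what the proposed “How to get my listens into LB” page would do.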
ListenBrainz New User Experience

presenter: lucifer & rob (youtube)
This is an extension of the previous topic, but focussed on the issue of users not getting much interesting data upon signup – the cold start problem.
- New users who import their stuff don’t get any cool stats fast enough, which would help wow them.
- We should make a new Spark cluster: the current one sits idle for half the day and is occupied the other half with all sorts of other tasks. We’re starting with one server.
- It would be great if we could prioritize new users who import a fair amount of data. Then we can show them recommendations as soon as possible – true cold starts will take longer to manage. For now let’s try to use the existing code and not create new code yet.
- It would be good for users to have a clear idea of when things are going to refresh/when new data is going to get them something cool to look at, because people keep asking about that and getting frustrated.
- aerozol commented that he is unsure that the issue is that the stats aren’t turning up quickly enough – more that users don’t see anything and aren’t given an idea of when they will. Rob mentioned that this is something that aerozol could work on, and a ticket was born: LB-1645: Create a “when does my stats update” information routine
Genres across MetaBrainz
presenter: aerozol (youtube)
Since we can tag at almost every level in MusicBrainz – artist, release group, release, recording, work – and there is no way to easily tag multiple levels at once, tags get split up and buried. This discussion asks the question: Is there a way to link all of these tag “levels”? For instance, so that artist tags can include genres from down at the recording level.
Picard goes through every level as fallback – for example, if it doesn’t find anything at the recording/release/RG level it will look for tags on the artist level. One concern was that ListenBrainz radio/playlists might not be getting good data as a result, but lucifer clarified that ListenBrainz also has similar fallbacks in place.
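The fallback order described above can be sketched as a tiny helper; the function name and signature here are illustrative, not Picard’s or ListenBrainz’s actual code:

```python
def genres_with_fallback(recording=None, release=None, release_group=None, artist=None):
    """Return the first non-empty genre list, walking recording -> release
    -> release group -> artist, mirroring the fallback order described above."""
    for genres in (recording, release, release_group, artist):
        if genres:
            return genres
    return []
```

So a recording with no genre tags of its own still picks up something to show, as long as any level above it is tagged.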
Some people mentioned that inferring, for example, artist level genres from recordings might be tricky. We might only want to do that if a certain threshold of the artist’s recordings are tagged with a specific genre.
Release-level tagging being applied on the release group level was also discussed. Should we do that in most cases, or only if the user specifically asks to apply it? Similarly, should the API return derived tags from lower levels when asking for tags from the artist? One thing that was clear is that if yes, it should apply to genre tags only. Some people were uncomfortable with the idea of “their” genre tags being applied to other releases in the release group, and tags being applied to “their” releases. Others responded that as long as the information is correct that is the nature of a collaborative database.
A good real-world example is Roon, which only takes genres from the MusicBrainz release level. A fair amount of other software seems to do the same, in which cases just having the tags at the recording level is useless. If we can use those at the release level, for example using a genre for the release if most of the release’s recordings include the tag, that might make our data more useful.
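A majority-threshold promotion like the one floated above could look like this rough sketch (the 50% threshold and the function name are assumptions for illustration, not an agreed policy):

```python
from collections import Counter


def promote_genres(recording_genres, threshold=0.5):
    """Promote a genre to the release level only if at least `threshold`
    of the release's recordings carry it."""
    total = len(recording_genres)
    if total == 0:
        return []
    # set() per recording so a genre is counted at most once per recording
    counts = Counter(g for genres in recording_genres for g in set(genres))
    return sorted(g for g, n in counts.items() if n / total >= threshold)
```

With a rule like this, a one-off experimental track wouldn’t drag its genre onto the whole release, but a genre shared by most recordings would become visible to software (like Roon) that only reads the release level.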
ListenBrainz might be a good way to get a lot more genre data, and to let users propagate it between levels (possibly also between sources!), but that requires a separate discussion about genre implementation in ListenBrainz.
MetaBrainz positioning
presenter: mayhem (youtube)
Nowadays we have quite a few MetaBrainz datasets that are open source and AI ready, and a neat datasets page to showcase them. But it’s really hard for an interested user to find, for example, that we have a similarity dataset.
Mostly, our website homepages don’t make it clear that we provide these services, including services nobody else provides. The websites are either too messy (MusicBrainz) or slightly too clean (ListenBrainz). We need these pages to specify that we’re open source and indicate what datasets we have, and so on.
We need to work out the fastest way to transform the current MusicBrainz homepage to something modern that feels more like the ListenBrainz homepage. That requires having an understanding of MusicBrainz, and moving the style code to a modern framework (Bootstrap or equivalent). Then we can use the page as a blueprint for the other projects. When a user moves between project sites it should be clear that they are related.
The homepage is really only useful for first-time users. In ListenBrainz, returning users (logged in users) get sent to a more useful page, their dashboard. Is this something that we could implement in MusicBrainz, where returning users are presented with data that is more useful to them, such as recent edits for artists relevant to them, and the like?
We all work really hard to avoid enshittification (we even have the guy who coined the term on our board!). Today there is a lot of talk about ethical companies, but we’ve been there all along. We barely had to change anything to deal with GDPR. We do our best to help everyone who contacts support, even in their own languages when we can. This is something that we should showcase and help the world find out about.
We have some specific target audiences for our websites and datasets: users don’t necessarily want to hear the same details as corporate supporters, even though everyone wants the same high-level thing (a service that “cares” and will not compromise your data for money).
We will try to identify clearly the message we want to send and a way to send it during the rest of the Delhi meeting, and then give it to aerozol to turn that into an actionable design.
aerozol note: we ran out of time to action this, but it has not been forgotten!
Translations

presenter: yvanzo (youtube)
We should work on making sure BookBrainz and ListenBrainz are translatable (and translated) if we want to attract as many users as possible. Similarly, MetaBrainz should be translatable, especially if we’re going to use it as the general login page for all projects. BookBrainz and ListenBrainz have localization as a medium-term goal, but MetaBrainz should have it fairly soon.
MusicBrainz uses gettext for historical compatibility reasons (that is, it would take a long time to move to something else) but MetaBrainz/BookBrainz/ListenBrainz should start with a protocol that is popular and common, and supported by Weblate. For instance, ICU MessageFormat or Fluent. If it works nicely we can try to use it in MusicBrainz as well.
Sometimes users don’t know that incomplete translations are available, but we also don’t want to give users a bad experience if they are expecting a fully translated site. We could decide to show all available languages on the sites (or all that are over a reasonably low threshold) but make it clear (with color-coded icons?) which ones are complete, fairly complete, or raw.
For apps (and of course websites) we should make sure the user is not forced to use a translation, even if it exists and matches their system language – it could be a bad translation, or they might just prefer English.
It’s important to build translation communities, and we should figure out how to get more translators involved. This could mean even having a badge for “I translated X lines of the site” for translators.
Importers (MusicBrainz)
presenter: atj (youtube)
This is a general discussion about MusicBrainz release importers, in light of how important they are to the MusicBrainz ecosystem.
Previously the most glaring issue with importers was how they dealt with giant lists of ‘release countries’. The new Harmony importer solves this issue by omitting the release country field altogether if the countries list is longer than 10.
Additionally, a way to indicate which countries a release was not issued in might be useful, but this would be a schema and an API change. Another suggestion was to have a “digital” release country that side-steps the issue of the streaming world having a different concept of release country than physical releases, but not all people agreed. Supporting very defined regional blocks is also currently being considered by the style lead.
Transitioning away from the release-country topic, data standardization among different providers was discussed. Importers should be careful about cleaning up the data too much (i.e. applying guess case), but where multiple providers exist and disagree on certain stylings (showing a lack of artist intent), it could make sense to enforce the MusicBrainz standards (e.g., usage of hyphens instead of parentheses for extra title information). It could also be useful if importers had the option to allow the user to run a regular expression across all titles to fix issues, before importing.
Client Side memory not being cleaned
akshaaatt has been looking at the memory profile in the Chrome dev tools and has found that memory keeps piling up in the tab when navigating between pages. The garbage collector is supposed to clean up objects that no longer have a reference.
Old bugs in MusicBrainz

Participants: bitmap, kellnerd, MonkeyPython, reosarevok, yvanzo (oldest unresolved MBS tickets)
- [MBS-15]: Add some method to cancel editing – closed as duplicate (linked)
- [MBS-122]: Combine forgot username and forgot password pages/functions – commented
- [MBS-151]: Add last edited timestamp to the webservice – commented
- [MBS-4501]: Alternative tracklists – bitmap is working on it and is probably going to try to get it done before the release editor React conversion – in progress
- [MBS-6680]: Medium sections – we do want to do this, but we won’t start it until the release editor is converted to React because writing it in TT and then converting it would be too annoying – blocked
- [MBS-4635]: Allow replacing images – we agreed that we should look into whether it’s relatively simple to add a new edit type that does this, since the provided benefit seems clear – todo
- [MBS-8393]: Extend dynamic attributes – This is currently blocked by both the search update and the React conversion, but it should not be super hard after that so once those two are done – blocked
- [MBS-157]: Sorting tables – This is blocked by the React conversion and also by us running a horribly outdated version of react-table (so old it’s not even called that anymore) – blocked
- [MBS-8781, MBS-3993, MBS-1735, MBS-13768]: Medium barcodes, sub-formats and catalog numbers, and medium MBIDs – We are planning to eventually have a medium-centric schema change that implements all four of these. No ETA, but interest for sure – todo
- [MBS-603]: Support relationships on mediums. The only current use case is DJ-mixes, which can really be inferred from recording-level relationships. As such, it might not make sense to do the large effort that implementing this would require – possibly won’t do
- [MBS-5449]: Per-Medium Front Cover Artwork – We agreed that it could be a good addition to the CAA to be delivered at the same time as a Schema Change. We added the two requirements as:
– [IMG-31]: With multiple-disc releases, it should be possible to associate artwork with specific discs
– [MBS-13768]: Add MBIDs to mediums
Solr

We have an Ansible-managed Solr cluster running on production and beta. We’re currently missing it on mirrors, which means we have 3 Solr 7 clusters running to serve the mirrors.
Migration to docker-compose 2 is scheduled for November, so we can update the mirrors at the same time.
The Sir indexer needs to be updated to Python 3. The updated PR works, but we have performance issues – a full indexing of recordings runs out of RAM. Some simplifications might help resolve the issue. Sir updates are not needed for the Solr 9 release, but if we can do this in one go then mirrors will have to just update once rather than twice.
Getting rid of RabbitMQ and doing everything in Postgres should also simplify things and improve performance; ideally this would happen at the same time, again to avoid an extra mirror update.
We are considering recycling an old server into a dedicated Solr/PSQL testing machine that just runs indexing tests, to avoid hitting the production database.
We recorded a full day of search requests to throw at a test server in order to load-test Solr; we should probably have a repository for this sort of thing.
Simplification discussion:
- Currently, the data stored in Solr is in the same XML format as returned by the API. We could use a more efficient format.
- Some fields are fetched by the indexer and stored in Solr so they can be returned in responses, but are not themselves searchable.
- We will be using dbmirror tables to know if there’s a change for an entity while doing incremental indexing, but still use the database directly for full indexing.
November goal:
- (yvanzo) Upgrade mirrors to Solr 9 with Docker Compose 2
Stretch goals:
- JSON schema verification (similar to RELAX NG for XML).
- Currently the MB website code postprocesses the Solr response to add more information. This should be changed so that the indexer submits all required information to Solr and no postprocessing is needed. Probably something to implement client-side using ReactTable.
- (bitmap) Move away from RabbitMQ; we'll start with the Postgres queue tables approach for now – there are two challenges: handling missed and duplicate updates.
Actionable tasks agreed on:
- (bitmap/zas) Log the number of JSON vs XML queries to the search server, either at the gateway level or in the MusicBrainz Server. Reminder: the queried format can be specified either through the “fmt” URL query parameter or through an HTTP header.
– Collect data for a relevant period of time and decide on either keeping or dropping XML output, so as to save the effort of reimplementing XML support while simplifying the data stored in Solr. Rough approximation: 15% are XML, but we should dig deeper.
- (lucifer) Try indexing without XML to make sure it makes a significant difference to performance.
- (zas, atj) Recycle an old server to set up a containerized Postgres/RabbitMQ/Solr for testing indexing performance.
– (zas, atj) Make it possible to use ZFS snapshots for reproducing tests faster
– (lucifer) Make use of it to test Python3 performance issues
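As a rough illustration of the fmt-counting task, a minimal sketch of tallying formats from logged request URLs. The sample paths and the assumption that requests without a "fmt" parameter default to XML are for illustration only; the real count would come from gateway or server logs:

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

def count_formats(request_paths, default="xml"):
    """Tally requested serialization formats from search request URLs.

    The format is read from the "fmt" query parameter; requests without
    it fall back to the default (assumed to be XML in this sketch).
    """
    counts = Counter()
    for path in request_paths:
        qs = parse_qs(urlsplit(path).query)
        fmt = qs.get("fmt", [default])[0].lower()
        counts[fmt] += 1
    return counts

# Hypothetical logged request paths, for illustration only:
sample = [
    "/ws/2/artist/?query=nirvana&fmt=json",
    "/ws/2/release/?query=nevermind",          # no fmt → counted as default
    "/ws/2/recording/?query=lithium&fmt=xml",
]
print(count_formats(sample))  # xml: 2, json: 1
```

Run over a relevant period, this gives the XML-vs-JSON ratio needed to decide whether keeping XML output is worth the reimplementation effort.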
Possible long term goals:
- Use Kafka and Debezium to replace RabbitMQ / Postgres queue tables, which should improve scalability and reliability. This approach will likely be used in LB in the short/medium term, so lessons learned from that can be applied to MB.
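The two challenges named above – missed and duplicate updates – become harmless if consumers are idempotent and track the last sequence number they applied, which is the usual pattern for queue tables. A toy in-memory sketch of that idea (class and field names are invented; a real implementation would be rows in a Postgres table):

```python
import itertools

class UpdateQueue:
    """Toy stand-in for a Postgres queue table: each row gets a
    monotonically increasing sequence number."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.rows = []  # (seq, entity_gid) pairs, like queue table rows

    def enqueue(self, entity_gid):
        self.rows.append((next(self._seq), entity_gid))

class Consumer:
    """Indexer side: remembers the last sequence number it applied."""

    def __init__(self):
        self.last_applied = 0
        self.reindexed = []

    def poll(self, queue):
        # Skipping rows <= last_applied makes duplicate deliveries
        # harmless; re-reading from last_applied after a crash
        # recovers updates that would otherwise be missed.
        for seq, gid in queue.rows:
            if seq > self.last_applied:
                self.reindexed.append(gid)
                self.last_applied = seq

q = UpdateQueue()
c = Consumer()
q.enqueue("artist-1")
q.enqueue("artist-2")
c.poll(q)
c.poll(q)  # duplicate poll: nothing is reapplied
q.enqueue("artist-3")
c.poll(q)  # picks up only the new row
print(c.reindexed)  # ['artist-1', 'artist-2', 'artist-3']
```

The same sequence-number bookkeeping carries over to a Kafka/Debezium setup, where offsets play the role of the sequence column.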
BookBrainz breakout

Participants: Monkey, MonkeyPython, kellnerd, ansh, outsidecontext, leftmostcat (remote)
Style guidelines
Currently there is no process to update guidelines. The process could look like:
- Open topic on forums
- Have some discussion, hopefully arrive at consensus
- Someone creates a PR to update the documentation, which eventually gets merged
This is currently missing a style lead – even if there is a consensus, someone needs to make a pull request (PR). Pbryan previously created a guideline, did the formatting and opened a PR. That worked, but only because someone volunteered to do it all.
It was decided that MonkeyPython will take over the BookBrainz style lead role. Tasks include monitoring community discussion and formalizing any outcomes. Monkey will assist with the technical parts of the PR.
Next steps are for MonkeyPython to go over existing discussions and turn them into guidelines. Reosarevok will provide assistance when questions arise regarding how to handle certain cases, applying his experience with community guidelines.
Edition groups
MonkeyPython demonstrated different editions of Alice in Wonderland. This included two versions with visually similar covers, but one is English and one is a Norwegian translation. Other versions included a German ebook edition with a different cover.
This illustrated the question of “what is inside one edition group”:
- Different languages (separate works), but same general visual appearance (not necessarily the exact same format)
- Same language (same work), regardless of the general appearance
Ideas:
- Can an edition be in more than one group? This would probably be a big code change and needs to be considered thoroughly.
- Do we drop edition groups in favor of works? Edition groups have an advantage when it comes to books that contain more than one work.
- Do we use a single edition group for all editions in all languages of the main work?
– Counter-example: anthologies of SciFi stories, which are clearly different editions with the same content, need to be grouped
- Do we create separate works for the text and the illustrations, linking all editions with the same textual content and the same appearance together, respectively?
Leftmostcat mentioned that currently a single edition belongs to a single edition group. This concept is fuzzy in reality and does not really fit the model, meaning that we should look at idea 1 more closely.
Deciding on a guideline is important, but tricky to get correct and concise. It was decided to put this topic up for community discussion (with screenshots) on the forums and look into the “multiple edition groups” solution (idea 1).
Importing
The BookBrainz importer project should be presented to the community with a detailed blog post explaining the feature.
We still need to define guidelines for importing entities, which make it clear what the importer does automatically and what is the responsibility of the user when approving an imported entity.
It would be good to have edition groups well defined before finishing the feature. We can do an import run with a limited number of entities (5,000) in order to gather community feedback.
Prioritization of main missing features
- Move the identifier & alias editing to be part of the main editing flow (WIP)
- Edit table of content
- Review tickets in the “Minimal usable interface” fix version (needs to be split up into separate tasks)
- Revert revisions (WIP)
- Adding dates to relationships (WIP)
- Database replication packets
- Finishing imports
- Dockerization: getting from the Docker development setup to a more automated setup for end users who just want to run a local clone of the database and website
Ban evading
This remains a big problem – for instance, a single user has made something like 60 accounts.
The following solutions were brainstormed:
- Flag users as beginners, and limit editing for that status
- Block beginners from editing specific entities if there is a sign of dodgy edits (note: they might add duplicate artists instead)
- Solicit more community support to help with ban evaders: for instance, vote on bans (empower community/auto editors)
- Allow a lower rate of edits (on specific entities seeing dodgy edits)
- Assign editors reported as “bad” into a mentorship program (this is done on other sites, to apparent good effect)
- Is it possible to block certain throwaway mail providers?
- Lock users out of editing after enough downvotes
- Identifying users based on IP is difficult
- Generate reports based on editing pattern (for example, editors adding several releases with “feat.” in track titles).
- Don’t make new releases visible immediately + make it clear that added releases are not yet visible (just for beginner users)
– …while still letting editors access their own releases for tagging (major change, currently not feasible)
- Some AI solution (also not feasible right now)
- It was mentioned that we might be asking for something impossible: allowing users to register and immediately edit public data, while also blocking users from editing with new accounts/evading bans
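Of the ideas above, the pattern-based report is the easiest to prototype. A minimal sketch that flags editors repeatedly adding “feat.” to track titles; the record shape, names and threshold are invented for illustration and are not an actual MusicBrainz report:

```python
from collections import Counter

FLAG_PATTERN = "feat."
THRESHOLD = 3  # arbitrary cutoff for this sketch

def suspicious_editors(edits, pattern=FLAG_PATTERN, threshold=THRESHOLD):
    """edits: iterable of (editor, track_title) pairs (invented shape).
    Returns editors whose matching edits reach the threshold."""
    hits = Counter()
    for editor, title in edits:
        if pattern in title.lower():
            hits[editor] += 1
    return sorted(e for e, n in hits.items() if n >= threshold)

edits = [
    ("new_user_42", "Song One (feat. Somebody)"),
    ("new_user_42", "Song Two (Feat. Other)"),
    ("new_user_42", "Song Three feat. Guest"),
    ("veteran", "Plain Title"),
]
print(suspicious_editors(edits))  # ['new_user_42']
```

A real report would of course run as a query against the edit tables, but the idea is the same: surface editing patterns for human review rather than blocking automatically.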
Celebrating users

This was a casual discussion while on a train, featuring: aerozol, zas, bitmap, atj
Wouldn’t it be cool to reward users and celebrate their successes? For instance, a notification or an email for a user reaching their first 100 edits takes minimal effort, while a user reaching a threshold like 1 million edits is a big deal and rare enough that we could afford to make it special.
“Celebrating” a user could involve:
- Personal notifications (like the anniversary one)
- Site-wide notifications
- Emails
- Badges/‘gamification’
- Mail out physical merch or vouchers for the MeB store!
Mobile design goals
Attendees: akshaaatt, aerozol
aerozol and akshaaatt discussed what the priorities should be design-wise, to move the mobile app development forward:
- GSoC reviews
- List of things to do + mockups before go-live
- Revamp storefront
Docker
A technical discussion on all things Docker, which is used by all projects except for MusicBrainz Picard.
- Docker compose v1 has been EOL for some time, however LTS Linux distributions still provide packages for version 1 and will continue to for some time.
- The documentation needs to be updated, to tell users that they should be installing from the official Docker repositories.
- Docker compose version 2 offers lots of extra features that most projects aren’t making use of at the moment. For example, it can reduce the size and time required to create images.
- We should try to disable buildx on production servers if possible, but it is required on test servers (e.g. wolf).
- There is currently no automatic pruning of old images, which results in extra disk usage. We have monitoring of the number of images in Grafana with a warning alert, which then requires manual intervention.
- We would like to have some monitoring of security alerts in software that is included in our Docker images. There is an official Docker product called Scout which can automate scanning of Docker images and provide alerts if there are vulnerabilities etc. It seems to be a paid product so we would need to decide if the cost is worthwhile for the benefits it provides.
- Potential replacement of docker-server-configs with an Ansible-based solution. The best option would be to create a separate Ansible repository to manage this. It was proposed to start by adding one small project, such as the Picard website, to the repository. This would be a big change for everyone in the team who currently uses docker-server-configs, so writing good documentation would be very important before it could be fully used in production.
- It was proposed to add Docker compose v2 files directly into the musicbrainz-server repository from scratch instead of converting musicbrainz-docker.
ListenBrainz breakout

Mapping
Currently we are only doing mappings using artists and recordings. We should also use releases, whether with one index or with two separate indexes.
We are using release and recording MBIDs for mapping if sent (not track MBIDs) but it seems some users are not seeing the right results for those (especially for live), so something is probably broken. We’re going to be reviewing the way we do this.
Alternate scripts break the mapping, because we don’t use MusicBrainz aliases. We need to make the mapping more scalable and performant, then work on that. Just using aliases is not trivial because it will break the way the current mapping works. Alternative tracklists might eventually help as well, for track/recording and release names.
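To make the alias problem concrete, here is a minimal sketch of a normalized lookup index that folds aliases into the mapping, so alternate spellings resolve to the same MBID. The tuple shape and names are invented for illustration and unrelated to the actual mapper:

```python
import unicodedata

def normalize(name):
    """Case-fold and strip combining accents so alternate
    spellings collide onto the same key."""
    nfkd = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in nfkd if not unicodedata.combining(ch))
    return stripped.casefold().strip()

def build_index(recordings):
    """recordings: (mbid, artist_name, recording_name, aliases)
    tuples (invented shape). Indexes every (artist, name-or-alias)
    pair to the recording MBID."""
    index = {}
    for mbid, artist, name, aliases in recordings:
        for candidate in [name, *aliases]:
            index[(normalize(artist), normalize(candidate))] = mbid
    return index

index = build_index([
    ("mbid-1", "Björk", "Jóga", ["Joga"]),
])
# Both the accented and the plain spelling resolve to the same MBID:
print(index[(normalize("Bjork"), normalize("joga"))])  # mbid-1
```

Accent folding only covers Latin-script variants; true alternate scripts (e.g. Cyrillic or kanji titles) still need the MusicBrainz aliases themselves, which is exactly why pulling them in matters.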
Importers
Both the Spotify and last.fm importers have been 80% ready for a long time. We will try not to start any new features until these are done, because the wait is annoying people who really want to get all their historical data in. Once these are ready, we can look into further importers (Apple Music, for example). This applies to both one-off and continuous (connected) importing, as well as smaller things like importing loved tracks.
Genres
Are we legally allowed to suggest to the user genres included in the Spotify API? Probably, yes. As such, we could show the user “our external data sources suggest these genres might apply to this music” and let the user apply them (submit them to MB) or not.
This ties into discussions about flowing genres across entities in MusicBrainz as well – the UI would be the same to pull genres from other levels in ListenBrainz (e.g. from the artist to the release) and suggest them to the user to click (+) or (-) to add them for real. Similarly, we could suggest genres from music we know is similar to what the user is listening to.
Missing data
We could consider adding links to Harmony on the missing data page. It’s probably not too legally problematic and we can take the links down if someone complains.
Stats on demand
We should add a way to generate old stats – at least at first, it might be enough to have a way to tell the user how long that’s going to take.
We currently generate all stats from scratch daily (except that deletions happen every 15 days). Should we keep doing this? Should we have iterative generation instead?
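The iterative alternative amounts to folding each day's listens into running totals instead of recomputing everything. A trivial sketch of the idea (per-artist listen counts, with names invented for illustration):

```python
from collections import Counter

def merge_daily(cumulative, daily_listens):
    """Fold one day's per-artist listen counts into the running
    totals, instead of recomputing all stats from scratch."""
    cumulative.update(daily_listens)
    return cumulative

stats = Counter()
merge_daily(stats, Counter({"Artist A": 10, "Artist B": 2}))
merge_daily(stats, Counter({"Artist A": 5}))
print(stats["Artist A"])  # 15
```

The catch is the other half of the current setup: because deletions are only applied every 15 days, an incremental scheme also needs a periodic compaction pass to subtract deleted listens, which full daily regeneration gets for free.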
Notifications

We already have a project built for sending emails, but we also have a series of requirements for our notifications. We should first work out what our requirements are, then see if the project can actually fulfil them. For example, it currently doesn’t support attachments, which we use for Year in Music emails, so we need to either change that or figure out a different way to do so.
Onsite notifications should be easier, so we can start with that.
Donations and flairs
We can look into showing donor status across MetaBrainz (such as in ListenBrainz and MusicBrainz). Would it be interesting to let MusicBrainz editors earn donor status with edits, as well?
It was raised that we shouldn’t push for donations from MusicBrainz users, because they already donate a lot of time. It was mentioned that because they are invested to that level, they might be happier to give us money without being pushed (e.g. don’t bug them, but give them the opportunity to contribute).
Donor flairs should probably not be shown in places like edit notes on sites like MusicBrainz (to avoid the impression that donors have more say than any other user), but they can and possibly should be shown on user pages.
We could eventually give flairs (and badges?) for MusicBrainz editing – BookBrainz editors already get badges, for example.