MetaBrainz Summit 2023

As always, the silliest photo is the best photo. Left to right: aerozol, zas, outsidecontext, mayhem, yvanzo, bitmap, monkey, kellnerd, akshaaatt, reosarevok, laptop: atj, lucifer

A year has flown by and once again the MetaBrainz team found itself at the MetaBrainz HQ in Barcelona, Spain, for #summit23. And once again we were munching on a mountain of international chocolates, hiking Mt Montserrat, bird-watching, groaning at terrible puns, testing out mayhem’s Bartendro cocktail robot (some of the team committing themselves too thoroughly to this testing), and of course discussing anything and everything MetaBrainz related. This year’s summit was longer, running for a full week instead of the usual weekend: three days of presentations followed by two days of hands-on ‘hacking’.

This means it’s time to strap in for a long post!

Thanks again to everyone who popped in and took part via the online stream, particularly rdswift, who got up in the early mornings to take part in every session, and lucifer, who once again got ripped off by the totally arbitrary Spanish visa system. aerozol ran the livestream, which had substantial issues on the first day (thanks Pratha-Fish for summarizing), but pulled itself together for the rest of the summit.

Thanks also to ApeKattQuest, who made a long trip for a pre-summit BookBrainz session. We were sad to miss out on the company of atj, who tested positive for COVID-19 just before his taxi to the airport arrived. Thank you for arranging an amazing Airbnb for everyone, atj!

As always, a summary of the topics covered follows. Intrepid historians can see full event details on the wiki page, read the notes, look at the photos, and watch the recordings on YouTube: Day 1 (if you dare), Day 2, Day 3.


State of MetaBrainz

downloading data from mayhemBrainz

presenter: mayhem (youtube | slides)

The summit kicked off with an introduction and a welcome from mayhem (and more chocolate eating; there was always chocolate eating), followed by project summaries.

Financials

  • mayhem made a financial plan for MetaBrainz this year, and we raised prices (but raising prices is hard)
    • This offsets raises for the team, an 8% increase across the board. Our burn rate is $35,000 per month.
    • Hosting bills are low ($2,434 to Hetzner and $200 to Google)
      • Probably running at a loss of ~$20,000 this year, not bad
    • 12.3 months of money in the bank
    • We have lost alastairp but gained ansh, so we remain on an even keel
    • There is still uncertainty, and we are still avoiding spending
    • We signed Epidemic Sound, and Spotify is back

Long term goals

  • Grow our supporter base
  • Bring more users into MusicBrainz
  • BookBrainz becoming a fully fledged alternative to Goodreads, which is getting a lot of negative press. With ListenBrainz we have built up a lot of in-house knowledge, and we feel prepared for when BookBrainz takes off.
  • ListenBrainz is starting to show promise
    – We have identified two distinct audiences, a hardcore data-focussed audience for MusicBrainz, and a more music-focussed audience for ListenBrainz.
    – We are starting to see a return on these ‘metadata loops’, where users start with ListenBrainz and then become editors on MusicBrainz.

How we get there

  • Rassami and aerozol worked on putting together a publicity package, without great success. What used to work no longer works.
    – Forget B2C (business to consumer), forget B2B (business to business), we are aiming for C2C (consumer to consumer)
    – A good example is #listenbrainzmonday, started by a community member, where we are getting the word out with image-based grids.
  • We are working on ‘ListenBrainz Local’, which brings ListenBrainz radio to Funkwhale and Navidrome
  • We have found that the industry sees no value in ‘Discovery’, because they have not found it to be monetizable. This leaves a big gap for us to fill.
  • MetaBrainz is starting to come into focus: MessyBrainz is changing into something else, CritiqueBrainz will be absorbed into the other sites, and AcousticBrainz is defunct.
    – This leaves us with three core pillars to focus on: MusicBrainz, ListenBrainz and BookBrainz.
    – Picard is also enjoying a renaissance moment as some people move away from streaming.

What could go wrong?

  • Our main concern is the ‘hockey stick curve’ growth, where the organisation or project grows exponentially, costs increase, but income doesn’t come in to match.
    – For instance, if ListenBrainz suddenly needs 50 new servers – scaling the technology is easy but scaling people is hard. atj and zas can’t be scaled.
    – How do we preserve this amazing team if we grow?
  • Runaway inflation
  • ListenBrainz can’t support itself, and doesn’t reach any paying users or customers (it is currently funded by MusicBrainz)
  • ListenBrainz becomes too successful and we are blocked by streaming services
  • We get sued by deep pockets

Thank you to everyone, where we are at is absolutely incredible.

State of MetaBrainz Infrastructure

zas (center) hard at work, or sleeping

presenter: zas (youtube | slides)

State of infrastructure

  • We have 29 physical servers (~14 virtual, ~10 external)
  • Important changes this year:
    – We are managing infrastructure with Ansible
    – New gateways that use Hetzner’s load balancer and have horizontal scalability
    – We have a new backup solution based on an Ansible Borgmatic role
    – Upgraded Consul, which now runs on a virtual network
    – New servers that use ZFS
  • Why do we have new gateways?
    – MetaBrainz projects are experiencing exponential growth
    – HTTP/2 over HTTPS is becoming very common, and requires more resources than plain HTTP
    – kiki & herb have been replaced by faster rex & rudi
    – we now have horizontal scalability, rather than a single entry point
  • New hardware
    – AMD Ryzen 9 7950X3D based servers: these are very fast for a decent price, about four times faster than the previous professional-grade servers
    – pink & floyd are retiring, replaced by jimmy & hendrix
    – the main database performance has increased by a factor of at least 4 (e.g. a process that took 4 seconds now takes 1 second)
    – bigger SSDs provide far more disk space, from roughly 0.5 TB to 2 TB
  • Ansible
    – Eases the deployment of new servers, we can deploy a new server in about 10 minutes
    – Eases the maintenance of existing services
    – Provides oversight of the whole infrastructure
    – Helps us to harmonise the production environment (things are faster!)
    – Ansible eases upgrades to new major versions of the base OS
    – The shell scripts we previously used are now obsolete, expect them to be removed from the syswiki soon
    – Tracking changes is easier, and tests help to maintain things
    – Would be good to get some more of the team knowledgeable about how it works, so we have some more redundancy. It isn’t too complicated.

Future

  • Generalize the use of ZFS
  • Complete the transition to 10.10.10.0 virtual network and slowly drop physical network interface on 10.2.2.0
  • Move docker-server-configs service startup scripts to Ansible
  • Upgrade SOLR and deploy it via Ansible, likely moving the cluster to physical servers
  • New improved OpenResty setup, to consolidate Let’s Encrypt (LE) certificate handling (WIP)
  • Move to Ubuntu 22.04 (most servers are on 20.04)

Regardless of the path forward, what hardware or software we use, making sure the expertise to run services doesn’t all sit with one person is a key concern.

State of BookBrainz

monkey catches up on work when everyone else is finishing up for the day, with a little bit of help from a friend

presenter: monkey (youtube | slides)

The wonderful community members have been getting stuck in again this year, in particular with improving the editing style guidelines. monkey hasn’t had as much time to spend on BookBrainz as he would have liked, being spread among multiple MetaBrainz projects.

GSoC projects

Administration panel, by Shivam Awasthi. This is in beta, and is running well. Expect it to be deployed with the next version. Up until now all of the BookBrainz admin has sat on monkey’s shoulders, somehow! The admin panel provides a visual interface that allows trusted users to improve data types, and has useful presentation pages for relationship and identifier types.

Import databases, by kellnerd. This is the zombie revival of an unfinished 2018 project, with the goal of pre-importing public book databases (Open Library, Bookogs, etc). This has involved a lot of fixing code from five years ago that no longer works, but also discovering blockers in BookBrainz, leading to BookBrainz improvements such as the addition of more types. Don’t panic: these imports are kept separate in the database and require manual validation.

Main goals

These are basically the same as last year, with one of last year’s four goals crossed off (implementing author credits!)

  • Deploy the API in production
  • Import the Bookogs catalog (the ‘database import’ GSoC project is part of this)
  • Book cover art archive, in cooperation with the Internet Archive

Activity

Up and to the right! The number of new users is steadily increasing, and the impact of some of the new users on our community is much greater than the numbers could show. monkey points out a massive spike in 2018, caused by the 2018 Google Code-in, and mentions that we are still cleaning up bad data entered then. There was also a bump in February 2023: does anybody have any idea what could have caused that?

Monthly contributions and number of edits are up and to the right as well. As usual, these come overwhelmingly from a small number of prolific editors, but that number has increased. There was a big spike in early 2023, largely due to one single contributor. Thank you indy133, pbryan, and all the other BookBrainz editors! We expect the database importer to push this rate up further.

We are seeing a better spread of different entities being entered over time, for instance a lot of works added at the start of 2023, which points towards a very healthy database. There are a lot of different factors involved in how and why different things are being entered (there was some group discussion on this).

There were around 100 pull requests over the last year. monkey would like to have done more, but with a recent increased focus on ListenBrainz his time is split more. ansh will hopefully help with this split.

State of MusicBrainz

reosarevok pondering awful puns to torture the other attendees with

presenter: reosarevok (youtube | slides)

Recap

No major code milestones in MusicBrainz this year, but things are chugging along nicely, and the team is staying on top of issues as they arise and making smaller adjustments.

  • React conversion
    – The number of lines of Template Toolkit dropped from 8114 to 6000!
    – The React relationship editor was released in February after four months of beta testing and about 100 tickets. The release editor might be the next big conversion job; it would take up much of the MusicBrainz team’s focus for a number of months, but would hopefully also give excellent results.
    – Remaining templates to be converted are mostly edit forms
    – Some templates will remain for writing emails
  • Schema change
    – The smallest schema change in a number of years, but a schema change with nothing half-done in it, for a change!
    – Editing/removing edit notes
    – Switched the default replication system to use dbmirror2 packets
    – Dropped more unused schema items
  • Spam
    – Fixed a vulnerability allowing a malicious actor to register a spam URL as a username and enter another person’s email
    – Removed 40,000 spam accounts
    – Improved admin interface to deal with spam (for instance, paginated results for some spammer queries, which we initially didn’t think would be needed…)
  • Better docs, tests, translations, and UI
    – The database schema documentation page has been thoroughly completed and improved
    – Work continued on improving test coverage and test documentation, which also uncovered some tests doing ‘weird’ stuff for the last ten-twenty years
    – Better Flow type coverage with Flow strict mode
    – Switched our translation system to Weblate
    – A few more translators already joined the effort, including for new languages
    – Professionally designed improvements to the UI

2022-2023 in numbers

“It’s nice to be importing, but it’s more importing to be nice”

  • A steady climb on our rate of change year on year
  • 3,845,742 releases in MusicBrainz
    – 2022: 3,401,467 / 2021: 3,009,931
  • 447,266 releases added this year
    – 2022: 391,536 / 2021: 346,116
  • 1225 releases added every single day on average
    – 2022: 1099 / 2021: 997.5
  • That’s over a month of audio documented every 24 hours

Importing takeover

  • 53% of all releases added since Oct. 2022 were added with some importer (compared to 47% in 2021-22):
    – Atisket: 97,229
    – Discogs: 73,552
    – Bandcamp: 57,817
    – Deezer: 5,570
    – iTunes: 2,590
  • Atisket (Spotify/Deezer/Apple Music) has grown significantly in popularity this year.
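As a quick sanity check on the figures in the two lists above, this short Python snippet adds up the listed importer counts, divides by this year’s total of added releases, and works out the daily average. It assumes that the “added this year” total (447,266) covers the same period as the “since Oct. 2022” importer counts, which the numbers above do not state explicitly.

```python
# Releases added per importer since Oct. 2022 (figures from the list above).
importer_counts = {
    "Atisket": 97_229,
    "Discogs": 73_552,
    "Bandcamp": 57_817,
    "Deezer": 5_570,
    "iTunes": 2_590,
}

# Assumption: this year's total of added releases is the matching denominator.
releases_added = 447_266

imported = sum(importer_counts.values())
share = imported / releases_added   # fraction of releases that came via an importer
per_day = releases_added / 365      # average releases added per day

print(f"{imported:,} of {releases_added:,} releases came from importers ({share:.0%})")
print(f"about {per_day:.0f} releases added per day")
```

Under that assumption, the listed importers account for 236,758 releases, which lines up with the 53% share and the 1225-per-day average quoted above.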

Genres (and moods?)

  • Slower manual entry rather than last year’s bot spree
  • 12,941,076 upvotes/downvotes for genres
    – 2022: 12,607,881 / 2021: 5,014,790
  • 45% of release groups have at least one genre
    – Down from 49% last year, so many new additions do not get genres
  • ListenBrainz genre support is likely to help, as well as ListenBrainz radio. Genres attached to recordings are relatively uncommon – the team discusses how ListenBrainz genre support may bump those up substantially.
  • Mood support should be ready to go once ListenBrainz is ready and we get a mood list

Top genres

  • Out of our almost 2,000 genres, most are very much part of the long tail
  • Top 10 (with metal just behind in 11th place):
    – rock (1,422,339)
    – electronic (1,122,807)
    – pop (668,125)
    – jazz (357,365)
    – hip hop (288,809)
    – experimental (261,259)
    – ambient (254,774)
    – classical (248,896)
    – punk (237,347)
    – alternative rock (232,897)
    – metal (232,088)
    There is some talk about how this reflects the MusicBrainz nerd ‘type’, with rock genres being above pop, and R’n’B being absent from the top 10.
  • New genres are added pretty much every week, both by request and following what other genre lists are adding
  • Lots of love for our surprisingly active harsh noise wall community, closing the top 50 at over 6,000 uses
  • reosarevok invites users to keep submitting new genres and tickets, and as long as they have sources for them, they will probably be added (as long as it’s not soundtrack!)

Other milestones for the year

  • Over 100,000,000 (10⁸) MBIDs
  • Over 40,000,000 relationships
  • Over 30,000,000 recordings
  • Over 3,000,000 unique ISRCs (thank you third party tools)
  • Over 1,000,000 disc IDs
  • Over 350,000 unique ISWCs
  • Over 250,000 labels
  • Over 70,000 events
  • Over 20,000 series

More editors, more edits

  • Around 1,800 active editors per week (up 200 year on year)
  • Around 133 active voters per week (slightly up from last year)
  • 11,004,054 edits last year (up 1.5 million year on year)
  • The top 25 editors entered 37% of edits
    – Only 19% of the add release edits, vs 42% of add relationship edits
  • Editors outside the top 100 most active entered another 37% of edits
    – Only 32% of the add relationship edits, vs 59% of add release edits
  • This confirms what we already know, that beginner editors are more likely to add releases than relationships. It also confirms that we should focus our efforts to simplify the interface/UX for new users on those parts of the editor.

State of ListenBrainz App

A rare shot of akshaaatt quietly listening, on the amazing airbnb rooftop terrace

presenter: akshaaatt (youtube | slides)

A big change with the app last year was a push towards simplicity. The MusicBrainz app was getting too complicated, and two distinct audiences were starting to emerge for MB vs LB. So work started on creating a new dedicated ListenBrainz app.

You listen to music?

  • Submit your listens to ListenBrainz
  • Share cool things with your friends and other social platforms
  • A single user interface for everything
  • Access your local music and submit that as well
  • Dark theme 🙂

The feed section, making ListenBrainz social

  • Made by jasje
  • Check out what others on ListenBrainz are listening to
  • Two new feed sections, ‘Follow listens’ and ‘Similar listens’, allowing you to follow the activity of your followed users as well as listens from users that are similar to you

The share feature and unique UX

  • Gesture based scrolls and transitions
  • Intent based connection to external social platforms
  • Focus on ease of sharing

Listens and settings

  • Real-time updates for listening now and listen history
  • SDK support for Spotify and YouTube
  • Allow users to customize which apps they submit listens from, plus app-wide accessibility

Local music player and submitter

  • Submitting listens from other apps requires notification access, which isn’t available to all Android users. In these cases, users can play music and submit listens locally from within the app.
  • The local player is also known as the ‘BrainzPlayer’
  • The local player allows the user to be in app at all times, and continue listening to music and use the social features

Statistics

  • 700+ downloads, after under a year, and without any promotion outside the MetaBrainz blog (on purpose, until we are ‘ready’ to launch widely)
  • The ListenBrainz App codebase has attracted 6 new contributors to MetaBrainz
  • People have submitted about 63,000 listens from the app!

In summary, a huge thanks to the app team, akshaaatt, jasje and lucifer. And the testing team, mayhem, aerozol and monkey. The app is the best code base of any app that akshaaatt has worked with so far, and he feels that the team can build anything going forward. The next big step will be moving more things into the design system, to help streamline and centralise development even more. Another thing to note is that a new contributor, TheFlash, has picked up the development of an iOS app, so we will hopefully see that in future as well.

outsidecontext offered to help move the app’s translations over to Weblate during a hack session at the summit.

State of ListenBrainz

lucifer joined us remotely from India – probably not from the waterfront, but this looks nicer than a screen capture from Zoom so what the heck

presenter: lucifer (youtube | slides)

Where music meets data

  • 759,000,000 all-time listens
    – 145,000,000 listens this year (so far)
  • 23,700 all-time users
    – 4,300 users registered this year (so far)
  • 411 pull requests since the last summit
  • 67 releases since the last summit

GSoC 2022 projects
These had final touches pending at the last summit, but have now been completely integrated.

GSoC 2023 projects
These are all still in progress, but should be integrated soon.

New data dumps

Spotify metadata cache
This allows us to export ListenBrainz playlists to Spotify. Previously we were not able to map MusicBrainz recordings to tracks in Spotify.

  • We now have a local metadata cache of the entire Spotify catalog
  • We use this cache to export ListenBrainz recommendations and playlists to Spotify
  • We are developing an Apple metadata cache along similar lines

More new features

  • Manual MBID mapping
    – Automatic mapping doesn’t always work
    – Some users want precise mappings
    – This feature now allows users to manually link their listens to their preferred recordings
  • Listener stats
    – Each artist, recording and release has a total listen count and top listeners
    – Will be surfaced in the upcoming artist pages
  • Similarity datasets
    – Give it any recording, and it feeds back similar recordings or similar artists
    – Powering ListenBrainz recommendations
  • Popularity datasets
    – Most popular artists, most popular recordings of an artist
    – Uses a combination of MLHD data and ListenBrainz data!
  • ListenBrainz art creator
    – Create artwork/images that can be shared
    – Based on a user’s listens, within a selection of time ranges
    – Multiple image options
  • SoundCloud integration
    – You can now play music using SoundCloud, by enabling it in your settings
    – Paid account not required
  • Year in Music
    – Many improvements over the inaugural edition
    – Moar cover art
    – Easier sharing options, particularly on mobile
    – Older YIM reports (2021) still available
  • Stats improvements
    – Better handling of featured artists, who are no longer split out into their own entries
    – Top albums are now based on release groups not releases
    – Cover art displayed on stats page
  • Playlists
    – All recommendations are now delivered in the form of playlists
    – Daily Jams, Weekly Jams, Weekly Exploration
    – Export to JSPF and to Spotify
  • ListenBrainz Radio
    – In progress, but coming along nicely!
  • Misc
    – Listen table schema migration
    – Make ListenCard options consistent across website
    – Listen Submission API validation improvements
    – React 18, Typescript 4, Python 3.11 upgrades
    – Use Kombu as RabbitMQ client
    – Update MB Metadata Cache incrementally
    – Improve docker build performance
    – Remove artist and release msids

State of Community

We’ve already had a lot of pictures of reo, so here’s a nice one of bitmap. And a butt.

presenter: reosarevok (youtube | no slides)

Spam

  • Most spam, like most other community issues, is in MusicBrainz (our largest project, and the one with the most person-to-person interaction)
  • A positive change this year is that we have someone reporting editors often and consistently (thanks chaban!)
    – This means we catch most editors that actively add spam entities pretty quickly
  • We can now remove spam edit notes
  • We lack tools to see what users are up to across MetaBrainz projects. For instance, if someone is vandalising MusicBrainz they may also be vandalising BookBrainz. OAuth should help with this.
  • We have seen a rise in AI/LLM posts, edit notes, and reviews. We now have some guidelines in CritiqueBrainz to deal with this, which can be expanded to other projects. Those guidelines allow LLMs to be used as a tool, but if an LLM is clearly the main author of a text, the text will be removed. This will probably have to be revisited in future as LLMs develop and we see users use (and abuse) them in new ways.
  • We have significant amounts of spam coming from the same IPs. Initially we thought ‘we’re not going to need more than 50 rows for the same IP’. Yes we do.
    – Similarly some email providers should possibly be entirely blacklisted, or at least be automatically flagged for community managers to have a closer look.
  • We are in a situation where, without new tools, we are never going to be able to get on top of people creating spam accounts where they never actually enter edits. We’re not sure if this is really an issue, though it probably creates issues with Google indexing.
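The idea of automatically flagging suspicious signups for community managers, instead of only paging through results per IP, could look something like this minimal Python sketch. Everything in it is hypothetical: the `Signup` record, the per-IP threshold and the `FLAGGED_DOMAINS` set are illustrative assumptions, not part of the actual MusicBrainz admin tools.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Signup:  # hypothetical record of a newly registered account
    username: str
    ip: str
    email: str

# Hypothetical tuning values, not taken from MusicBrainz.
MAX_SIGNUPS_PER_IP = 3
FLAGGED_DOMAINS = {"spam-mail.example"}

def flag_for_review(signups: list[Signup]) -> list[Signup]:
    """Return signups that a community manager should look at more closely."""
    per_ip = Counter(s.ip for s in signups)  # how many accounts share each IP
    flagged = []
    for s in signups:
        domain = s.email.rsplit("@", 1)[-1].lower()
        if per_ip[s.ip] > MAX_SIGNUPS_PER_IP or domain in FLAGGED_DOMAINS:
            flagged.append(s)
    return flagged

# Five accounts from one IP, one from a watched email domain, one unremarkable.
signups = [Signup(f"user{i}", "203.0.113.7", f"user{i}@example.org") for i in range(5)]
signups += [
    Signup("reader", "198.51.100.2", "reader@spam-mail.example"),
    Signup("fan", "198.51.100.3", "fan@example.org"),
]
flagged = flag_for_review(signups)
print([s.username for s in flagged])
# -> ['user0', 'user1', 'user2', 'user3', 'user4', 'reader']
```

Flagging rather than blocking keeps a human in the loop, which matches the suggestion that some email providers could be flagged for a closer look instead of being blacklisted outright.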

Beginner editor changes

This year we made an optimistic change where we allowed beginner users to vote on edits, which was immediately abused. This has now been changed so that beginners can comment but not vote.

Which brings us to another issue, which is that we don’t have many voters, so it can be difficult for users to get out of the beginner editor period. But it is clear that we cannot trust beginners not to abuse the system – usually with sock puppets to vote their own edits in or to push deletions through.

There is an open PR that would make all beginner edits go through the voting queue, rather than letting some of them be applied as auto-edits. One concern is that this may clog up the voting queue, meaning that more, rather than less, spam gets through, so it is uncertain how we should proceed.

State of Picard

outsidecontext, explaining how he manages to update and maintain the world’s best tagger in his free time

presenter: outsidecontext (youtube | slides)

How is Picard doing

  • Still not obsolete!
  • Continuously developed with frequent commits, weekly and sometimes daily
  • Frequent releases, about 8-10 a year (usually 2 minor versions and several patch releases)
  • We don’t track downloads or uses, but:
    – Picard is the second most active category in the community forums
    – Frequent tickets
    – Ubuntu Snapstore: > 5,000 active devices
    – Microsoft Store: 3,250 user sessions, ~600 active users, ~500 downloads per month (note that this store installer is used much less than the portable installer, and this is another reminder that Windows tracks your data!)

What happened this year

  • Picard 2.9 released
  • Merged skelly37’s GSoC 2022 project: Single instance mode
  • Python 3.12 compatibility
  • Moved translations to Weblate
  • 100 tickets closed since last summit
  • Started port to Qt6 to prepare for Picard 3
  • 10 contributors on GitHub
    – but only 3 developers actively contributing

New release procedure

  • Building and releasing is automated with GitHub Actions:
    – Windows (installer, portable, Microsoft Store)
    – macOS (10.12+ and 10.14+)
    – PyPI (source + binary packages for Windows and macOS)
  • Automated code signing
  • Only a few manual steps:
    – Copying files to FTP
    – Updating Picard Website
    – Releasing to Microsoft Store
    – Triggering Ubuntu Snap Build
    – Updating Ubuntu PPA
    – Updating documentation (thank you rdswift)
  • We want to automate more
  • Very active packagers for Flatpak, Debian, Fedora and Arch Linux

Picard 3
We are planning to work on Picard 3, which will contain breaking changes, but we aim to keep distributing Picard 2.

  • Upgrade to PyQt6
    – necessary to continue support for modern systems
    – but it will break compatibility with older systems (Windows 7, macOS < 11?)
  • Limit scope to allow for a timely release (ideally early next year)
    – PyQt6
    – New plugin system
    – General bug fixes and minor new features
  • Other breaking changes
    – Only if someone pledges to implement them

The future

  • Improved cover art handling (resizing images, minimum size, better dealing with local cover art, etc)
  • Setup wizard
  • Guided UI mode (aerozol and outsidecontext to meet about general UI/UX improvements during a hack session)
  • Performance improvements
  • Unified genre options
  • Unified data submission (AcoustID, ISRC, genres)
  • Use ListenBrainz API as an alternative tagging source?
  • …?

State of Community, Part 2

aerozol took most of the photos, luckily someone snapped this obvious candidate for a ‘caption this picture’ competition

presenter: aerozol (youtube | slides)

An addendum to reosarevok’s State of Community talk, mainly digging out some statistics.

Support
Approximate numbers for the support inbox since February 2 (8 months ago, when aerozol was added to the support inbox):

  • ~610 email reports
    – ~405 by chaban
  • Two this morning (on the day of the summit)
  • Every one handled quickly and patiently by reo!

Our channels

Further notes of interest

  • The YouTube video ‘What is MusicBrainz Picard’ is by far the most popular
    – presumably it has clicked with the YouTube algorithm, so it comes up near the top of searches; the comments on it may also have helped
  • The blog
    – Top countries for visitors are United States (61), India (36), Malaysia (19)
    – Lots of people/contributors/staff authoring posts
    – Top referrers, in order, are MusicBrainz, GSoC, search engines, Twitter, the Picard website
  • Discord
    – Great vibes
    – Thanks afro for boosting the server

BookBrainz mini summit

reosarevok, monkey and ApeKattQuest getting BookBrainz’ plans juuuuuuuuust right

The BookBrainz team got together pre-summit, while ApeKattQuest (MetaBrainz’ instrument lead and reader of books) was in Barcelona, to discuss some of the more complex BookBrainz topics. First, the team acknowledged the progress made over the past year (mainly the new admin system, types editing, Wikipedia extracts, and author credits), and then got stuck into future plans.

Roadmap

  • Focus on “basic” features like relationship attributes
    – Date attributes
  • Revision reverting
    – Start with basic reverting (for one level of history)
  • Notifications
    – Notes which have been left on your own revisions
    – Revisions for entities in one of your collections (“subscriptions”)
    – Use the yet-to-be-developed MetaBrainz-wide notification system
  • Book Cover Archive
    – Scans (or photos) of only “non-copyrighted” stuff (fair use), i.e. the full cover, title page, imprint page, table of contents
    – Coordinate with work being done for the Event Art Archive 
  • Revisions page
    – View should be simpler to read (cleaner diff with a sensible order of properties)
    – Overview should show more information (changes, revision notes)
  • Tabulated views
    – Too many relationships on author pages, split by entity type and maybe even relationship types (we should not mix writing credits with credits for other tasks like translations or illustrations)
    – Filtering/sorting by language (original language, user’s language preference, selected language)
  • Tags/genres/topics
    – Sit down with reo to define the SQL schema
    – Start with free-form tags *but* with a list of suggested genres to help with typos, spelling variations etc.
    – Separate genres from topics and other tags.
    – Perhaps prefix topic tags (e.g. “topic:aviation”) to make it clear that the tag is a topic and not just some random tag. This also allows for easier sorting, filtering, API results, etc. Maybe even do that for genres (e.g. “genre:science-fiction”)
    – Separate inputs for genres, topics and other tags, which automatically add the right prefix
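To illustrate the prefix idea, here is a minimal Python sketch showing how prefixed tags could be split back into a kind and a value, and how separate genre and topic inputs could add the prefix automatically. The `genre` and `topic` prefixes come from the notes above; the function names and the fallback to a plain `tag` kind are assumptions for illustration, not a BookBrainz schema.

```python
# Prefixes from the roadmap notes; anything else is treated as a free-form tag.
KNOWN_PREFIXES = {"genre", "topic"}

def split_tag(tag: str) -> tuple[str, str]:
    """Split a stored tag into (kind, value); unprefixed tags are plain 'tag's."""
    kind, sep, value = tag.partition(":")
    if sep and kind in KNOWN_PREFIXES and value:
        return kind, value
    return "tag", tag

def add_prefix(kind: str, value: str) -> str:
    """What a dedicated genre or topic input field could store."""
    return value if kind == "tag" else f"{kind}:{value}"

tags = ["topic:aviation", "genre:science-fiction", "first edition", "note:signed"]
print([split_tag(t) for t in tags])
# -> [('topic', 'aviation'), ('genre', 'science-fiction'), ('tag', 'first edition'), ('tag', 'note:signed')]
```

Treating unknown prefixes such as `note:` as ordinary free-form text means users can still type colons in tags without accidentally creating a new category.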

Style

  • [no author]
    – Is there any use for it now that author credits are optional?
    – Probably remove [no author], since we can’t find a good use-case, i.e. entities which should not be credited to [unknown] or [traditional] instead
  • Circa dates
    – Probably doable simply with a boolean flag for “circa” and a date
    – Separately, it would be useful to have date ranges (e.g. a work written between 1984 and 1989)
    – ISO 8601 might have provisions for approximate dates, or it might only cover partial dates (e.g. 2000-10 for October 2000)
  • Attributes
    – Date attributes for relationships
    – Page attributes for “Work in Edition”
    – Sorting/ordered relationships
  • Table of contents
    – Have a multi-purpose component that allows users to define a page attribute for the works inside an edition (work-edition rel) but also add lines for chapters and such, allowing for page number + chapter title along with the works
    – It should definitely be possible to have chapters which are not linked to a work entity as we don’t want to have a separate work for each chapter (unless they were separately published)
  • Edition Groups and their Editions
    – What do we do with Edition Groups and how two different “books” relate to each other, i.e. a) original version without illustrations, b) original version with illustrations, and c) translated version of b) with the same illustrations?
    – It’s a gray area with multiple cases where two books could be considered part of the same EG. Is it the same publisher? Same language? Same content?
    – Ask the community, get some examples written down
    – We need to find good examples of Editions that should or should not be grouped together, e.g. Jurassic Park & Jurassic Park
  • Links between MusicBrainz and BookBrainz
    – For audiobooks you should be able to add “narrated by” Author-Edition relationships, and get more precise sound-related information directly pulled from MusicBrainz using a link identifier to display on the BB Edition page
    – On MusicBrainz, we should have an “includes booklet” relationship pointing to a BB Edition (and vice-versa)
    – MB Work to BB Work relationship for printed music scores
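The “circa” idea from the data-model notes above is simple enough to sketch. Below is a minimal Python illustration, assuming a partial date where only the year is required, the proposed boolean “circa” flag, and a separate range type. The class and method names are hypothetical, not BookBrainz’s actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical partial date: only the year is required."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    circa: bool = False  # the proposed boolean "circa" flag

    def __post_init__(self) -> None:
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError("month out of range")
        # A day without a month is meaningless for a partial date.
        if self.day is not None and self.month is None:
            raise ValueError("day given without month")

    def iso(self) -> str:
        """ISO 8601 calendar date at reduced precision, e.g. '2000-10'."""
        parts = [f"{self.year:04d}"]
        if self.month is not None:
            parts.append(f"{self.month:02d}")
            if self.day is not None:
                parts.append(f"{self.day:02d}")
        return "-".join(parts)

    def display(self) -> str:
        """Human-readable form, prefixed with 'c.' when approximate."""
        return ("c. " if self.circa else "") + self.iso()


@dataclass(frozen=True)
class DateRange:
    """Kept separate from 'circa', e.g. a work written between 1984 and 1989."""
    start: PartialDate
    end: PartialDate

    def iso(self) -> str:
        # ISO 8601 writes time intervals as start/end.
        return f"{self.start.iso()}/{self.end.iso()}"
```

For example, `PartialDate(2000, 10).iso()` gives `2000-10`, and `PartialDate(1984, circa=True).display()` gives `c. 1984`. Keeping the flag and the range as separate types mirrors the notes’ point that ranges are useful on their own.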

Reports

It would be useful to have reports of deletions and merges from new accounts.

Entity reports in general would be useful for:

  • Report for editions without author credits
  • Reports for entities with “impossible” relationships, i.e. ones that were possible in an earlier (breakier) datamodel
  • Editions with ISBN-13 but not barcodes and vice-versa

After the project summaries, smaller topics were discussed.

The chocolate table was a wonder to behold

Spam

presenter: yvanzo & reosarevok (youtube)

This sadly spam-less presentation kicked off with an overview of the types of spam we deal with in each project, and how we are currently dealing with it.

MusicBrainz
Estimate: 80% off-topic spam | 10% ‘music’ SEO | 10% fake release vandalism

To summarize, we currently have editors who manually report spam (mainly chaban), and then reosarevok spends an obscene amount of time dealing with it. This is covered in more detail in the State of Community summit topic, in this same post.

The current MusicBrainz spam-combatting arsenal consists of:

  • Admin UI, which gives some additional tools to identify suspicious behavior
  • SpamBrainz (under construction), where you can report any entity (instead of just users)
  • reosarevok and chaban

CritiqueBrainz
Estimate: 90% self reviews | 10% SEO

Not a lot of spam compared to MusicBrainz, but a high percentage compared to the number of legitimate reviews (~16 reviews/week in total, maybe half of which are ‘spam’). We have a lot of artists (often from Africa or the Middle East) self-reviewing positively, in an obvious manner. We leave these if they are legitimate reviews, because we don’t have rules against self-reviewing. Often they are removed because they are AI-written, have been copied from another site, or include SEO links. CritiqueBrainz doesn’t notify users when their reviews are hidden, which is probably positive, since we never see these users return.

ListenBrainz
Estimate: 100% ‘fake’ plays | 0% other spam

In ListenBrainz we are talking about spam in terms of automated listen submissions, where users ‘pretend’ to play music in large volumes, presumably to boost their stats. This doesn’t seem like a huge problem right now, but who knows what the future holds. We have a tool to identify users with very similar listen histories, to help with bot identification. In practice this tool is less helpful, because the list is dominated by users who have (possibly accidentally) set up two accounts to submit listens to. reo has mailed a few of them, and they were happy for him to remove the accounts they’d made accidentally. There was a bit of discussion about when we would consider people playing their own music on loop illegitimate, since they could really be playing the music 24/7 (unlikely as it seems). We can apply our own judgment here, if we want to make it our problem, using the ‘do not game our services’ clause in the Terms of Service > Code of Conduct.

The future

There was discussion throughout, and afterwards, about how we can improve how we deal with spam. I’ve helpfully collected and summarized these points for you below, dear human and robot readers.

The new MetaBrainz-wide OAuth/profile system will open a lot of doors for combating spam across projects. Currently reo and a few others decide how to handle things ad hoc as they come up, rather than using a centralized set of tools and guidelines, and no single person can easily get an overview (e.g. the support inbox only contains MusicBrainz reports). If CritiqueBrainz becomes more integrated into MusicBrainz and ListenBrainz we may see cross-project spam become more prevalent – we see very little of it at the moment. Another useful tool would be the ability to deal with multiple spammers/groups of entities at once, and to automatically delete their edits and forum accounts.

Related to that, there was praise for the new BookBrainz admin panel, where you can leave notes regarding reasoning and courses of action. We should look at following that model going forward, because not having notes available can make moderation difficult.

Another flow-on positive from the OAuth system will be the ability to, potentially, moderate internal messages. At the moment users email each other via their private inboxes, so we can’t monitor these messages. With an internal system we could, for instance, flag a user who sends the same message to 100 other users, or who sends a very high volume of messages.
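As a rough illustration of the two auto-flags mentioned above, here is a minimal Python sketch. It assumes messages are available as simple (sender, recipient, body) records; the `Message` type, the `flag_senders` function and both thresholds are hypothetical, and a real system would likely also count within a time window, which this sketch skips for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Message(NamedTuple):
    sender: str
    recipient: str
    body: str


def normalize(body: str) -> str:
    # Collapse whitespace and case so trivially tweaked copies still match.
    return " ".join(body.lower().split())


def flag_senders(
    messages: Iterable[Message],
    same_message_threshold: int = 100,
    volume_threshold: int = 500,
) -> Set[str]:
    """Return senders matching either hypothetical rule:
    the same message sent to many distinct users, or a very high total volume."""
    recipients_per_copy: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    volume: Dict[str, int] = defaultdict(int)
    for m in messages:
        recipients_per_copy[(m.sender, normalize(m.body))].add(m.recipient)
        volume[m.sender] += 1

    flagged = {
        sender
        for (sender, _), recipients in recipients_per_copy.items()
        if len(recipients) >= same_message_threshold
    }
    flagged |= {sender for sender, n in volume.items() if n >= volume_threshold}
    return flagged
```

Flagging rather than blocking fits the discussion: a human moderator still reviews the result, and the thresholds can be tuned per project, since different services attract different patterns of spam.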

A lot of other potential auto-flags were discussed, and aerozol was tasked with finding methods that other organizations and services use and collating them into a ticket. It was noted that different services can attract different patterns and types of spam, so these may need tweaking on a per-project level.

monkey also suggested that closing all the projects would be an effective anti-spam measure.

Security audits

presenter: yvanzo (youtube)

OAuth migration

presenter: mayhem & lucifer (youtube)

Note: OAuth is a jargon term, which describes a very cool and much-needed feature. In short, it is a centralized MetaBrainz login/profile that a user can use to access all the MetaBrainz projects. This will make signing up and logging in a much smoother experience, and will also enable further developments, like a notification system.

The OAuth migration is coming along, with lucifer having completed most of the technical work. Hack sessions had already taken place at the summit, with mainly UX issues flagged for lucifer to work on. Once the UX is sorted, the next step will be for one member of each project that uses OAuth to work with lucifer and exchange information about the changes needed to authenticate their servers against the new system.

One question raised earlier was whether the endpoint for OAuth validation would be public, and whether it would then be possible to brute-force access tokens. lucifer clarified that it should be fine, because the endpoint will be rate-limited by default. If anyone can think of other possible security issues, please let the team know.
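Rate limiting helps here because brute-forcing a token needs an enormous number of guesses, and capping each client at a few attempts per second makes that impractically slow. As a generic illustration only (the post doesn’t say how the MetaBrainz endpoint is actually limited), here is a minimal token-bucket limiter in Python; the class names and parameters are hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One bucket per client key (e.g. an IP address or app ID)."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

A token bucket is just one common choice: it tolerates short bursts from legitimate apps while holding a sustained guessing loop to `rate` attempts per second.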

Work has been done on splitting MusicBrainz tables into data that MetaBrainz should know, and data that is only needed in MusicBrainz – for instance if a user has entered their date of birth into MusicBrainz, the team has decided not to track that on MetaBrainz. So work is being done on what needs to be deprecated and what needs to be kept in MusicBrainz, and this will have to be addressed for each project. There was some discussion around this.

On migration day, services will need to migrate tables and integrate their OAuth setup. This will necessitate concurrent downtime for all projects.

There are two key parts to the migration: our user data and external app data. Our user data can be seamlessly migrated during our downtime. For the external apps, however, we need to coordinate with the app owners and inform them that after the update they will need to point to the new MetaBrainz URLs for authentication. An upside is that we should be able to copy all of the existing keys to MetaBrainz, so the only work involved for third-party applications is changing the URL in their codebase.

lucifer will write a script to email all our users once we know the timing and exact plan for the migration. We’re not sure of the transition window yet, but it probably won’t be too long: in MetaBrainz’ experience, those who don’t transfer within three months are unlikely to transfer within the next ten years. During this transition we will fall back to authenticating apps through MusicBrainz if they haven’t been updated to use MetaBrainz yet.

More OAuth discussions will take place in smaller groups during the summit hack sessions, and ongoing.

Service alerts

presenter: yvanzo (youtube)

Internationalization (weblate)

yvanzo, adding fuel to the conspiracy theory that we do sometimes get work done at summits

presenter: yvanzo (youtube)

yvanzo presented our progress and plans for MetaBrainz Weblate – our new translation/internationalization (i18n) tool!

Status

  • Translations are frozen in Transifex
  • Translations have been moved to Weblate
    – It is hosted by the Weblate team, so we support them financially, but have less maintenance overhead
    – It is responsive and has single sign-on (SSO)
    – Translators can comment on source messages and translations
    – Developers can provide basic instructions for each project with Markdown
    – Integration in our git-based development workflow
  • MetaBrainz projects
    – MusicBrainz (database + server) and MusicBrainz Picard (app + website + user guide) are using it already
    – CritiqueBrainz and MetaBrainz.org website are in the process of adopting it
    – Other projects have no support for i18n: BookBrainz, ListenBrainz… (BB will look into using Fluent to implement i18n support)
  • Complementary communication tools are needed
    – A bevy of Wiki pages
    – Community forums, see the new internationalization category
    – A multi-lingual add-on has been added to the forums but is not configured yet
    – New Internationalization ticket label

Roadmap

  • Clean up glossaries in Weblate (See ticket OTHER-419)
  • Update language-specific pages in the wiki (list)
  • Contact Transifex editors
  • Configure the Multilingual add-on on Discourse
  • Make use of translators’ comments
  • Create a walkthrough
  • Publish a blog post welcoming new translators
  • Publish a final announcement on Transifex
  • Allow discovering our Weblate instance from the main hosted Weblate instance

User survey 2017 preview

presenter: aerozol (youtube | slides)

On the long flight(s) to Spain aerozol took some time to dig into Leo Verto’s incomplete 2017 MusicBrainz User Survey. There’s a lot to dig through and a lot of data to untangle, so the results are far from finished. aerozol has put together a preview of the progress so far. This is just a brief summary of results – it’s recommended to check out the slides, linked above, to view the graphs themselves.

What is it?

  • Survey run in 2017
  • For MusicBrainz and MusicBrainz Picard users (sometimes the answers are vague as to which project is being referred to)
  • Started by Leo Verto
  • ~1,200 responses
  • Sanitizing the data sucked (some free-text answers were manually rewritten to be able to group replies together, to be able to summarize results – an inexact science)
  • These slides are a quick scratch of the surface!
  • Incomplete + possibly inaccurate (this was mainly done on a plane, and not yet double checked)

Results: Using MusicBrainz

How long have you been using MusicBrainz (bar graph)

  • About half of our users drop off after a year
  • If you’re still around after five years, you’ll likely still be around in twelve!

Why do you use MusicBrainz (top 5 answers)

  • To keep track of my music collection (812)
  • To add data for specific artists (may include yourself) (655)
  • I just love data (636)
  • To help others correctly tag their music (618)
  • To correctly tag my music (71)

What language UI do you use

  • English (1054)
  • German (72)
  • French (48)
  • Dutch (12)

What do you dislike about MusicBrainz (top 5 answers)

  • user interface (62)
  • Complexity (51)
  • Performance (50)
  • Nothing (49)
  • barrier to entry (35)

What MusicBrainz feature can’t you live without (top 5 answers)

  • Picard (91)
  • AcoustID (61)
  • Tagging (53)
  • Relationships (43)
  • Album art (40)

How happy are you with MusicBrainz in its current state (bar graph)

  • The majority are happy (4/5)
  • Most of the rest sit at 3 or 5
  • Very few at 1 or 2
  • This leads us to believe that although our user base is very vocal and critical, they clearly love MusicBrainz

Results: Getting started

How easy was it for you to get started adding data to MusicBrainz (bar graph)

  • Most results sitting at 3/5, tapering off to either end
  • Few at 1 and few at 5
    – There was musing about how experienced editors (i.e. the ones likely to fill out surveys) might not remember their starting difficulties that well

How did you find out about MusicBrainz (top 5 answers)

  • Don’t know/don’t remember (346)
  • Using a tagger which relies on MB (326)
  • Last.fm (110)
  • Music player using MB for metadata (96)
  • Google/Search engine (80)

What was the hardest part of getting started on MusicBrainz (top 5 answers)

  • Styleguide (125)
  • user interface (43)
  • Schema (41)
  • voting system (41)
  • Relationships (38)

Results: Contributing to MusicBrainz

Which genre/s do you mainly edit (top 5 answers)

  • Rock (621)
  • Pop (413)
  • Electronic/EDM (397)
  • Metal (295)
  • Soundtrack (252)

How have you contributed to MusicBrainz (top 5 answers)

  • Added data (971)
  • Edited existing data (836)
  • Voted on data (464)
  • Opened JIRA tickets (167)
  • Edited the Wiki/Documentation (69)

How familiar are you with… (bar graphs)

  • Events: Ouch!
  • Places: Ouch!
  • Relationships: Good
  • Series: OUCH!
  • Works: Okay
  • Labels: Good

Answers still to come/to be analyzed

  • What is one feature you would like to see in MB?
  • How often do you use the following data sources?
    – [Physical Releases] [Online music stores] [Other music databases] [First hand information]
  • Which other projects from the MetaBrainz family do you use?
    – [CritiqueBrainz] [ListenBrainz] [AcousticBrainz] [MusicBrainz Picard] [BookBrainz] [AcoustID]
  • Which programs do you use to tag your music?
  • How old are you?
  • What gender do you identify with?
  • Which country do you currently live in?
  • Which languages do you speak?
  • Are you a musician yourself?
  • Are you employed in the music industry?
  • Is there anything else you’d like to say or ask about MusicBrainz or this survey?

Stay tuned for the full results!

State of the music industry

Congratulations to mayhem for obliterating the world record for ‘most facepalms during a single presentation’

presenter: mayhem (youtube)

Pre-summit mayhem asked if anyone had any requests for topics, and one that came up was a ‘State of the music industry’ talk. mayhem obliged, based on his recent insights and discussions at various meetings and conferences.

NFTs, micro-copyrights, endless middle-men. The music industry chasing any and every new fad, anything to avoid simply paying the artists more. Inaccessible and incomplete databases that make it hard to actually pay artists correctly. An industry uninterested in fixing these problems, because it isn’t in their interest to help artists.

In summary, “It’s bad”.

MusicBrainz already has some of the solutions. Could MetaBrainz one day engineer new solutions for hosting and distribution as well? Could we connect listeners and artists? Watch the presentation and decide for yourself.

OSS Donations

presenter: mayhem (youtube)

Over the last few years we have discussed the idea of ‘1% for open source’, where organizations donate 1% of their income to the open source projects they make use of.

Since then inflation hit, mayhem went into ‘cash hoarding mode’, and he decided it wasn’t financially prudent to do it at the time. mayhem isn’t entirely pleased with that situation, but also isn’t happy to commit to 4,000 euro annually quite yet – he suggests we start with a smaller amount, 1,000 euro, and move up from there once the financials become more stable.

It was raised that we should aim to support smaller organizations that aren’t better funded than us. yvanzo has been tasked with gathering a list of organizations to support, to begin with, and get that back to mayhem. We will aim to start paying in January.

Code signing

presenter: outsidecontext (youtube)

Picard uses code signing for the macOS and Windows packages and, more recently, for the source code releases too.

Code signing could be interesting for other MetaBrainz projects which provide release packages. Setting up code signing for the ListenBrainz desktop app has already been discussed with lucifer.

Knowledge about the code signing process, signing keys and credentials should be shared:

  • Information regarding the code signing process is documented, but currently shared only between a few people (outsidecontext, mayhem, zas, lucifer).
  • Documentation is primarily updated by outsidecontext whenever changes are made to the signing process.
  • The code signing keys and credentials are currently managed by outsidecontext (with mayhem having the ability to access the keys if needed).
  • Normally such information is handled in MetaBrainz’ internal syswiki. As outsidecontext is a volunteer and not a contractor, he has no access to the syswiki for privacy reasons.

It was agreed to set up a dedicated wiki, available to the core team and outsidecontext, to hold the documentation and credentials.

In-house GSoC/code-in alternative

presenter: akshaaatt (youtube)

We discussed whether it would be worth planning an in-house program, similar to Google Summer of Code and Hacktoberfest. Currently we rely on GSoC to bring in a lot of new faces.

The main points of the discussion were:

  • Maintaining and managing such a program is a big job
  • We currently don’t have the bandwidth or ruleset in place to pull it off
  • GSoC and mentoring is a lot of work
  • Some programs can bring in low-quality or very short-term contributions
  • MetaBrainz is, generally speaking, not geared up for ‘outreachy’ programs

However, it was agreed that this is something we should revisit if GSoC is discontinued.


kellnerd, hard at work on code that I wouldn’t even try to understand

For the rest of the summit, with the ‘all-hands on board’ topics finished, attendees broke out into smaller hack sessions. For these please check the agenda for a list of sessions, and then you can find notes in the full summit notes document. Believe it or not, WordPress is already starting to creak under the length of this post. Typing and then waiting for the words to pop up is giving me flashbacks to the day 1 summit stream…

Until next year!

Let’s finish with a team photo that our parents can be proud of. Left to right: aerozol, zas, outsidecontext, mayhem, yvanzo, bitmap, monkey, kellnerd, akshaaatt, reosarevok, laptop: atj, lucifer

4 thoughts on “MetaBrainz Summit 2023”

  1. Excellent write up. I would have loved to have been there even virtually, but too many other commitments. But a big plaudit for the eternal core Picard team who continue to innovate and also be patient with clueless or semi-clueless people like me.

    Two minor comments:
    1. Please provide a short explanation of the projects’ scope for those of us who aren’t aware of where the scope had moved since we were aware.
    2. I would love to be involved in the Picard V3 UX hackathon. If someone could let me know the date & time I will see if I can make it.

  2. Thanks for the write-up! I’m very excited about the future of… all projects! How can we contribute to the chocolate table?
