Leaked email address incident: 2020-11-23

We’re saddened to write that we’ve let some of our users down by accidentally leaking their email addresses and birth dates via a bug in the web pages of musicbrainz.org. This caused some users to receive unwanted spam emails.

However, we would like to emphasize that no passwords, password hashes, or other private user information besides email addresses and birth dates were leaked.

If you have never added or edited an annotation on MusicBrainz, then your email address and birth date were never leaked and you can disregard this notice.

What happened

About two weeks ago a MusicBrainz editor contacted us to say that an email address they used only at MusicBrainz had received spam. The user changed it to a very distinct address in order to rule out a spammer guessing the updated address. But it happened again: the user received spam at the unguessable address.

At this point we began an audit of the MusicBrainz server codebase in an attempt to find out where the leak was, patch it as soon as possible, and discover who was affected by it.

What we found

On 2019-04-26 we released a new version of the MusicBrainz server and in this version we added email addresses to the list of editor data we pass to our server to build MusicBrainz pages. The goal of this was to display them in admin-facing pages to, ironically, be able to fight spammers who were using MusicBrainz as a spamming tool. We also added the editor’s birth date, to be able to congratulate them on their birthday. Neither of these cases should have ever been a problem, since the private data should only be used on pages built and sent from our own server (where the data cannot be seen by anyone else), and any editor info sent to the users’ browser goes through a “sanitizing” process eliminating all this private information.

After some digging, we discovered that, due to a bug we had overlooked in the code that stripped this data, the addresses and dates had started being sent to the browser whenever an entity page with an annotation was requested. The email address and birth date of the last person to have edited the annotation shown on a page (any annotation, attached to any of our entities) were leaked on the page for the entity in question. This data was contained in a massive block of JSON data in the page source and was never shown on the web page for humans to see, which is why this issue went undetected for so long.

Who was affected

We looked at all editors who wrote any annotations that were displayed between the date the problematic code was released and the date the bug was fixed. This can mean either the annotation was written during this time period, or it was written before that but (being the latest version of the annotation for the entity) it was still displayed during this time period. This gave us a total of 17,644 editors whose data was at some point visible from the JSON block in at least one entity’s source code. We sadly do not have a way to know for sure how many of the affected editors’ details were actually found and stored by spammers, since we attempt to block botnets as much as possible. As such, we simply have no way of knowing who was really affected by this leak, only who might have been.
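The selection rule described above boils down to an interval-overlap check: an annotation version counts if the period during which it was the latest version for its entity overlaps the period during which the bug was live. The following is only a minimal sketch of that idea and not the query we actually ran; the `AnnotationVersion` record, its field names, and the `affected_editors` helper are hypothetical, and only the two dates come from this post.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional, Set


class AnnotationVersion(NamedTuple):
    """Hypothetical record for one version of an entity's annotation."""
    editor_id: int
    entity_id: str
    created: date
    superseded: Optional[date]  # None: still the latest version for the entity


# Dates taken from the post: release of the problematic code, and the hotfix.
BUG_RELEASED = date(2019, 4, 26)
BUG_FIXED = date(2020, 11, 22)


def affected_editors(versions: Iterable[AnnotationVersion],
                     released: date = BUG_RELEASED,
                     fixed: date = BUG_FIXED) -> Set[int]:
    """Return editors whose annotation was the displayed one while the bug was live."""
    affected = set()
    for version in versions:
        # A version is displayed from its creation until a newer one replaces it.
        displayed_until = version.superseded or date.max
        # Overlap test between [created, displayed_until) and [released, fixed].
        if version.created <= fixed and displayed_until > released:
            affected.add(version.editor_id)
    return affected
```

Note that an annotation written years before the release still counts under this rule as long as nobody replaced it before the release date, which matches the second case described above.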

What we’ve done

Once we detected the issue on November 22, we immediately put out a hotfix to all production (and beta) servers plugging the leak. The hotfix acted to sanitize the editor data by removing email addresses and birth dates from the JSON. We also deployed two additional changes that should help prevent similar issues from occurring, by avoiding sending sensitive editor data to our template renderer altogether. See all changes from the git tag v-2020-11-22-hotfix.
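To give a rough idea of what sanitizing editor data means in practice, here is a minimal sketch that removes sensitive fields from a nested structure before it is serialized to JSON. This is an illustration only and not the code from the hotfix; the field names `email` and `birth_date`, the shape of the data, and the `strip_sensitive` helper are all assumptions.

```python
import json

# Hypothetical names for the sensitive fields; the real field names may differ.
SENSITIVE_KEYS = frozenset({"email", "birth_date"})


def strip_sensitive(value):
    """Return a copy of `value` with sensitive keys removed at every nesting level."""
    if isinstance(value, dict):
        return {
            key: strip_sensitive(item)
            for key, item in value.items()
            if key not in SENSITIVE_KEYS
        }
    if isinstance(value, list):
        return [strip_sensitive(item) for item in value]
    # Strings, numbers, booleans and None are returned unchanged.
    return value


# Example: editor data nested inside an entity's annotation.
page_data = {
    "entity": {
        "name": "Some Artist",
        "latest_annotation": {
            "text": "Example annotation",
            "editor": {
                "name": "example_editor",
                "email": "editor@example.org",
                "birth_date": "1990-01-02",
            },
        },
    },
}

print(json.dumps(strip_sensitive(page_data), indent=2))
```

Walking the whole structure recursively matters because editor data can be nested inside other objects. A denylist like this one only removes fields somebody remembered to list, so not passing sensitive fields along in the first place is the more robust option where it is feasible.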

We are planning to improve our testing infrastructure to detect exposure of editor data — this will become a routine part of our continuous integration process. We are also going to ensure that any pull request dealing with editor data goes through a strict testing checklist.
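One possible shape for such a check, sketched here under our own assumptions and not describing a test that already exists: create a test editor with known "canary" values, render a page, and fail the build if the canaries or any unexpected email-like string appear anywhere in the page source. The `find_leaks` helper, the canary values, and the simplified email pattern below are all hypothetical.

```python
import re

# Deliberately simple pattern for email-like strings; an assumption that is
# good enough for spotting leaks, not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_leaks(page_source, canaries, allowed_emails=frozenset()):
    """Return a sorted list of private values found in the raw page source."""
    # Canary values planted on a test editor must never reach the browser.
    leaks = {canary for canary in canaries if canary in page_source}
    # Any other email-like string is suspicious unless explicitly allowed.
    leaks.update(
        match for match in EMAIL_RE.findall(page_source)
        if match not in allowed_emails
    )
    return sorted(leaks)


# Example: a page whose embedded JSON block still carries editor data.
leaky_page = (
    '<script>{"editor": {"name": "alice", '
    '"email": "canary-7f3a@example.org", "birth_date": "1990-01-02"}}</script>'
)
clean_page = '<script>{"editor": {"name": "alice"}}</script>'
canaries = ["canary-7f3a@example.org", "1990-01-02"]

print(find_leaks(leaky_page, canaries))
print(find_leaks(clean_page, canaries))
```

The important detail is that the check runs against the raw page source rather than the visible page, since the leaked data in this incident was never displayed to humans.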

How did spammers get these email addresses?

You might be wondering how such an obscure leak in a web page can result in spammers finding and using your email address. You’re not alone.

Our sites are under near constant traffic from seemingly random internet bots fetching thousands of our pages in a day, with no apparent goal. All of our metadata is available for download, so why would someone download pages from us at random?

Well, we now know — web pages can contain a whole host of random data that shouldn’t be there. Email addresses, birth dates and such are just the starting point — there have been websites that have leaked credit card numbers and even login passwords, possibly compromising the integrity of user accounts.

In this case it appears that a botnet kept downloading pages from musicbrainz.org and driving the load on our servers up. We’ve been trying to block botnets ever since they came into existence, but this is a laborious task that is never complete.

It appears that spammers used the botnet to scour the internet for private data such as emails to then send out lovely spam emails to all compromised users.

Summary

We would like to wholeheartedly apologize for this data leak. We take data privacy seriously and we aim to uphold high standards of privacy and data security. We find ourselves frustrated by the endless data leaks that happen on the Internet on a seemingly continuous basis and work hard to avoid committing these mistakes in our domain. However, we’re also human and we do make mistakes from time to time. As explained above, we’re working to improve our systems and processes in order to prevent this from happening again.

We hope that you accept our most sincere apologies for this leak.

Robert Kaye, Michael Wiencek, Nicolás Tamargo and Yvan Rivierre

Picard 2.5.2 released

Picard 2.5.2 is a maintenance release, fixing some bugs and providing minor improvements to the recent 2.5.1 release. Thanks a lot to everyone who gave feedback and reported issues.

The latest release is available for download on the Picard download page.

What’s new?

Bug

  • [PICARD-1948] – ScaleFactorRoundPolicy breaks text rendering on Linux
  • [PICARD-1991] – Case-only changes to file names are not applied on case insensitive file systems on Linux
  • [PICARD-1992] – Case-only changes to file names are not applied on FAT32 and exFAT file systems on Windows 10
  • [PICARD-2001] – Directory drag & drop from file browser to cluster area broken
  • [PICARD-2004] – Metadata changes loaded asynchronously by plugins are reset if file gets matched to track
  • [PICARD-2005] – Modified fields are sometimes not correctly marked as changed when multiple files are selected
  • [PICARD-2006] – “Local files” cover provider does not detect cover files for files already present at release loading time
  • [PICARD-2012] – Loaded files not shown in UI if release MBID is a redirect
  • [PICARD-2014] – Config upgrade from Picard < 1.3.0 to version 2.4 or later fails

Improvement

  • [PICARD-1828] – Allow assigning cover art to multiple selected files
  • [PICARD-1999] – Provide binary distributions for Windows and macOS on PyPI
  • [PICARD-2007] – Disable analyze / audio fingerprinting for MIDI files

The complete list of changes of this and previous releases is available in the changelog. You can also discuss new features or usage on our forums.

MusicBrainz Server update, 2020-11-02

Right after Halloween, this new release of MusicBrainz Server tricks some bugs and treats some improvements, plus some work on the usually terrifying React conversion and updates to handle external links.

A new release of MusicBrainz Docker is also available that matches this update of MusicBrainz Server. See the release notes for update instructions.

Thanks to chaban, darwinx0r, kellnerd, hibiscuskazaneko, jesus2099, lotheric, snartal, and tularion for having reported bugs and suggested improvements. Thanks to grafi_tt, mfmeulenbelt, salorock, and shepard for updating the translations. And thanks to all others who tested the beta version!

The git tag is v-2020-11-02.

Bug

  • [MBS-6666] – Artist credits not renamed from artist edit page unless the artist name is changed
  • [MBS-10281] – Improper encoding of ISE pages
  • [MBS-10829] – Indexed recording search fails to find recording with no length
  • [MBS-11160] – Internal server error pages display empty stack traces
  • [MBS-11161] – Internal server error page sometimes not returned when an error occurs
  • [MBS-11186] – Inconsistent username font-weight for edit owner
  • [MBS-11194] – TypeError: Cannot read property ‘linkTypeID’ of undefined (part 2)
  • [MBS-11204] – ISE: Validation failed for ‘Int’ with value undef

Improvement

  • [MBS-7219] – Only display “Show only standalone recordings instead” when there are standalone recordings to display
  • [MBS-11158] – Document URL link_type integers for release editor seeding
  • [MBS-11177] – Do not show useless “Description:” label in entity type doc boxes
  • [MBS-11185] – Add “is not” operator for relationship type in edit search
  • [MBS-11192] – Add voting-icon for Approved
  • [MBS-11197] – Add validation for Mainly Norfolk links
  • [MBS-11199] – Update 7digital.com URL cleanup

React Conversion Task

  • [MBS-11195] – Convert the artist credit renamer to React

Other Task

  • [MBS-11182] – Remove LyricWiki links from the sidebar
  • [MBS-11189] – Remove PureVolume links from sidebar
  • [MBS-11196] – Add saisaibatake.ame-zaiku.com to “other databases” for instruments
  • [MBS-11200] – Add works to VGMdb autocleanup