The General Data Protection Regulation is a complex EU regulation that stipulates many points for protecting private data of users on the Internet. Even though this is an EU regulation, it has a worldwide impact due to the nature of the Internet. This regulation comes into effect today, May 25, 2018, and is the reason why so many companies have sent you mail in the past few weeks about updating their privacy policies.
The MetaBrainz Foundation with its collection of projects is also affected by this regulation. We’ve been learning and adapting our sites to be compliant with the regulation – sadly this regulation isn’t entirely black and white and there is an incredible amount of room left for interpretation of these rules.
The good news is that this regulation is roughly in line with our established practices: we’ve always held private information in high regard and treated your data the way we would want our own private data to be treated. Luckily, this makes our compliance effort considerably easier. We’ve made two significant changes to how we treat your data and also adopted the terminology used in the GDPR, so that we use the same language that many other sites are now adopting. Please keep reading to find out the exact details of what we are doing to comply.
However, we do ask for your compassion and help in our process of complying with the GDPR. As we already mentioned, the GDPR is a complex set of rules that are not fully clarified yet. We’ve taken action on the steps that are clear to us and we’re following ongoing conversations on points that are in gray zones or unclear to us. We’ve made our best initial effort on compliance and promise to keep working on it as the picture becomes clearer. If you believe that we could improve our compliance, please contact us and let us know what we can do to improve. It would also help us if you could provide concrete examples or points for discussion to help us understand and take action on your suggestion.
Finally, below is the link to our GDPR compliance statement, implementing the regulations as we understand them and how they affect your data in our ecosystem. Where possible, we provide links for deeper understanding, links for you to examine our relevant code and links to tickets to follow the process of improving our compliance.
It’s been over a year since we last posted about AcousticBrainz, but a lot of work has been going on in the background. This post gives an overview of some of the things that we’ve achieved in the last year.
Our last blog post was neatly titled “What do 650,000 audio files look like, anyway?” Back then, we thought that this was a lot of submissions. Little did we know… I’m glad to report that we now have over 3.5 million submissions, of which almost 2 million are for unique MBIDs. This is a great contribution and we’d like to thank everyone who submitted data to us.
Dataset and model building
MusicBrainz coder Gentlecat returned to participate in Google Summer of Code last year and developed a new tool that lets us create datasets and build new computational models from them. We’re really excited about how this can allow community members to help us increase the quality of the semantic information we provide in AcousticBrainz. We will make another blog post soon explaining how it works.
We presented an academic overview of AcousticBrainz (PDF) at the 16th International Society for Music Information Retrieval (ISMIR) conference in Malaga, Spain. The feedback from the academic community was very encouraging. Many people were interested in the data and wanted to know what they could do with it. We hope that there will be some new projects announced using the data at this year’s conference.
Integration with other data sources
MusicBrainz and AcousticBrainz don’t exist in a vacuum. One important thing that we need to do is interact with other researchers and products in the same field. To that end, we started AcousticBrainz Labs, a showcase of some of the experiments that we’re working on in AcousticBrainz. The first thing we have published is a mapping between AcousticBrainz and the Million Song Dataset, which we hope people will use to compare these two datasets.
Database upgrades and data format changes
We’ve just upgraded to PostgreSQL 9.5 (from 9.3), which allows us to use the new jsonb datatype introduced in PostgreSQL 9.4. This change lets us store feature data more efficiently. We also made some changes to the database schema to let us start creating new data from datasets and computational models.
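To give a feel for what jsonb makes possible, here is a minimal sketch in Python of how a feature document could be stored in a jsonb column and queried by path. It is not the actual AcousticBrainz schema or code: the table name `lowlevel_json`, the column name `data`, the helper functions and the `rhythm`/`bpm` keys are all assumptions made up for illustration. The helpers only build SQL strings and parameters in the `%s` placeholder style that a driver such as psycopg2 accepts, so they run without a database.

```python
import json

# Hypothetical table and column names, chosen only for this illustration;
# the real AcousticBrainz schema may look quite different.
TABLE = "lowlevel_json"
COLUMN = "data"


def build_insert(mbid, features):
    """Return (sql, params) that stores a feature document in a jsonb column."""
    sql = f"INSERT INTO {TABLE} (mbid, {COLUMN}) VALUES (%s, %s::jsonb)"
    # The document is sent as JSON text; PostgreSQL parses it once on insert
    # and keeps it in jsonb's decomposed binary form.
    return sql, (mbid, json.dumps(features, sort_keys=True))


def build_path_query(path):
    """Return (sql, params) that extracts one nested value as text.

    Uses the #>> operator, which takes a text[] path such as '{rhythm,bpm}'.
    """
    sql = f"SELECT mbid, {COLUMN} #>> %s::text[] AS value FROM {TABLE}"
    return sql, ("{" + ",".join(path) + "}",)


def build_containment_query(fragment):
    """Return (sql, params) that finds rows whose document contains `fragment`.

    Uses the jsonb containment operator @>.
    """
    sql = f"SELECT mbid FROM {TABLE} WHERE {COLUMN} @> %s::jsonb"
    return sql, (json.dumps(fragment, sort_keys=True),)


if __name__ == "__main__":
    # Made-up feature values and a placeholder MBID.
    doc = {"rhythm": {"bpm": 120.0}, "tonal": {"key_key": "C"}}
    print(build_insert("00000000-0000-0000-0000-000000000000", doc))
    print(build_path_query(["rhythm", "bpm"]))
    print(build_containment_query({"tonal": {"key_key": "C"}}))
```

With a real connection, each `(sql, params)` pair would be passed to the driver’s `cursor.execute(sql, params)`. Because jsonb stores the document already parsed, path and containment queries like these do not have to re-parse the JSON text on every access, and they can be backed by a GIN index.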
One result of this is that we are creating a new complete data dump, and stopping the old incremental dumps. We are also taking the opportunity to automate this incremental dump process, which is something that a number of people have asked for.
Another change is that the format of the high level JSON data is changing. This is to better reflect some of the complexities that exist in hosting such a large and varied dataset.
Contribute to AcousticBrainz development
We’re always interested in help from other people who want to contribute data, code, and ideas to AcousticBrainz. Once again, MetaBrainz is participating in Google’s Summer of Code, and AcousticBrainz is a possible project to work on. If you’re not a student, you’re still welcome to work with us.