Hot on the heels of our release of the first 650,000 feature files as part of the first release of AcousticBrainz, we are presenting some initial findings based on this dataset.
We thank Emilia Gómez (@emiliagogu), an Associate Professor and Senior Researcher at the Music Technology Group at Universitat Pompeu Fabra for doing this analysis and sharing her results with us. All of these results are based on data automatically computed by our Essentia audio analysis system. Nothing was decided by people. Isn’t that cool?
The MTG recently started the AcousticBrainz http://acousticbrainz.org/ project, in collaboration with MusicBrainz. Data collection started on September 10th, 2014, and since then a total of 656,471 tracks (488,658 unique ones) have been described with essentia. I have been working for a while with audio descriptors and I followed the porting some of my algorithms to essentia, especially chroma features and key estimation. For that reason, I was curious to get a look this data. I present here some basic statistics. I computed them with the SPSS statistical software.
WHICH KIND OF MUSICAL GENRES DO WE HAVE IN THE COLLECTION?
In order to characterize this dataset, I first thought about genre. In essentia, there are four different genre models: trained on the data by Tzanetakis (2001), another one compiled at the MTG (Rosamerica), Dortmund and a database of Electronic music. Far from providing information on the kind of musical genres, these models seem to be contradictory! For example, in the Tzanetakis dataset “jazz” seems to be the most estimated genre, while the proportion of jazz excerpts is very small in the other models.
So in conclusion, we have a lot of jazz (according to the Tzanetakis dataset), electronic music (according to the Dortmund dataset), ambient (according to electronic dataset) and an equal distribution of all generes Rosamerica dataset (which does not include a category for electronic music)….Not very clarifying then! This is definitely something that we will be looking at in more depth.
WHAT ABOUT MOOD THEN?
For Mood characterization, 5 different binary models were trained and computed on the dataset. We observe that there is a larger proportion of non-acoustic music, non-aggressive, and electronic. It is nice to see that most of the music is not happy and not sad! From this and previous study, I would then conclude that there is a tendency in the AcousticBrazinz dataset for electronic music.
The amount of electronic music (compare with the acoustic graph above)
If we check for genre vs mood interactions, there are some interesting findings. We find that Classical is the most acoustic genre and rock is the least acoustic genre (due to its inclusion of electronic instruments):
HOW IS KEY ESTIMATION WORKING?
From a global statistical analysis, we observe that major and minor modes are both represented, and that the most frequent key is F minor / Ab Major or F# minor / A Major. This seems a little strange; A major and E major are very frequent keys in rock music. Maybe there are some issues with this data that need to be looked at.
IS THERE A LINK BETWEEN FEATURES AND GENRE?
I wanted to do some plots on acoustic features vs genres. For example, we observe a small loudness level for classical (cla) music and jazz (jaz), and a high one for dance (dan), hip hop (hip), pop, and rock (roc).
Finally, it is nice to see the relation between equal-tempered deviation and musical genre. This descriptor measures the deviation of spectral peaks with respect to equal-tempered tuning. It’s a very low-level feature but it seems to be related to genre. It is lower for classical music than for other musical genres.
We also observe that for electronic music, equal tempered deviation is higher than for non-electronic music/acoustic music. What does this mean? In simple terms, it seems that electronic music tends to ignore the rules of what it means to be “in tune” more than what we might term “more traditional” music.
IS THERE A LINK BETWEEN FEATURES AND YEAR?
I was curious to check for historical evolution in some acoustic features. Here are some nice plots on the evolution of number of pieces per year, and some of the most relevant acoustic features. We first observe that most of the pieces belong to the period from 1990’s to nowadays. This may be an artifact of the people who have submitted data to AcousticBrainz, and also of the data that we find in MusicBrainz. We hope that this distribution will spread out as we get more and more tracks.
There does not seem to be a large change of acoustic features as year changes. This is definitely something to look into further to see if any of the changes are statistically significant.
We have many more ideas of ways to look at this data, and hope that it will show us some interesting things that we may not have guessed from just listening to it. If you would like to see any other statistics, please let us know! You can download the whole dataset to perform your own analysis at http://acousticbrainz.org/download
What is AcousticBrainz?
The AcousticBrainz project aims to crowd source acoustic information for all of the music in the world and make it available to the public. The goal of AcousticBrainz is to provide music technology researchers and open source hackers with a massive database of information about music.
AcousticBrainz uses a state of the art research project called Essentia (http://essentia.upf.edu/), developed over the last 10 years at the Music Technology Group.
Data generated from processing audio files with Essentia is collected by the AcousticBrainz project and made available to the public under the CC0 license (public domain). In 6 weeks since its inception, AcousticBrainz contributors have already submitted data for 650,000 audio tracks using pre-release software.
Today we are releasing client programs to submit data to the AcousticBrainz server and our first public release containing audio features for over 650,000 audio files.
What data does it have?
AcousticBrainz contains information called audio features. This acoustic information describes the acoustic characteristics of music and includes low-level spectral information such as tempo, and additional high level descriptors for genres, moods, keys, scales and much more. These features are explained in more detail at http://acousticbrainz.org/sample-data
What can I do with it?
We hope that this database will spur the development of new music technology research and allow music hackers to create new and interesting recommendation and music discovery engines. Here are some ideas of things we would like to see:
Improving the state of the art in genre recognition
Analytics on the musical structure of popular music
This is one of the largest datasets of this kind available for research, and the only one of this size that we know of which contains both freely available data as well as the reference source code used to compute the data.
How can I contribute? If you are a music researcher, you can help us by contributing to the essentia project. Go to the essentia homepage to see how you can do this. If you do something cool with the data let us know. We’d like to start a “made with AcousticBrainz” page where we will showcase interesting projects.
If you have any audio files, we would love for you to contribute audio features to our project. You can do this by downloading our submission clients from http://acousticbrainz.org/download. We provide clients for Windows, Mac, and Linux.
We’re back with the schema change release, as promised! We only have a small collection of tickets, but several big things:
Pre-gap tracks and data tracks for CDs (where neither contribute to the discid, and pre-gap tracks have position of 0)
Collections can now be marked with types such as “owned” and “wishlist”, plus some special new types mentioned below.
CDStub data is now replicated.
All entities (except URLs) should now support tagging, as areas, instruments, and series were made taggable.
Events! And, additionally, event collections. All (non-deleted) users should have had an “Attending” and a “Maybe Attending” collection created, with the corresponding collection types.
Upgrade instructions will come in another blog post, though they should be substantially unchanged from past releases. Specifically, we’d like to confirm everything’s working correctly with a specific git commit, and make a new tag, before we post a recommendation, since there’s already been some problems discovered. Some slacker must not have tested this carefully enough (author whistles in an innocent-sounding fashion).
The git commit for this release (sans small fixes that have happened since release earlier today and any others that may need fixing) is v-2014-11-17-schema-change.
[MBS-7638] – CreateIndexes for instruments wrongly looks at label tables
I have released a new version of libmusicbrainz. The main changes in this release are the removal of ‘non-free’ XML parsing code, replacing it with libxml2. N.B. Due to the ABI change, the soname of this library has been bumped. Existing applications will need to be recompiled against the new version. The following are the main changes in this release:
Fix LMB-33 – Handle ‘ended’ element in ‘relation’
Fix LMB-34 – Remove non-free XML parser and replace with libxml2
Add support for cross-compilation and building out of tree
Sadly, our backup database server suffered a hardware failure and we ran out of time to get a replicated database setup after the hardware was fixed. This means that we won’t be able to put the site into read-only mode and will require us to take a full-downtime.
It sucks and we’re not happy about it either, but there is only so much we can accomplish with our limited resources. 😦
As mentioned when the new style process was announced, at a similar time to every server release post we’ll be publishing a list of what’s changed in style during that period.
The first period of two weeks (or three, in this case) under the process has passed, and these are all the style-related issues that have been accepted and implemented during it. Most of them are very small (mostly adding sites to the whitelist for the Other Databases relationship) although a couple are new relationships or relationships being extended to more entities.
No changes to the guidelines themselves have happened during this period.
[STYLE-211] – Allow new allmusic.com release links
[STYLE-250] – Add finnmusic.net to the other databases whitelist
This release was pushed back a week due to scheduling around the GSoC summit and upcoming schema change release, but here it finally is. Editors can take note that more edit types are now auto-edits: adding recording-work relationships, adding/editing aliases, setting track durations, and editing cover artwork. We’ve also added a “Make all edits votable” checkbox to allow edits that are normally always applied automatically (like capitalization changes) to be left open for voting if there’s any dispute or uncertainty. Auto-editors should be aware that this checkbox replaces the one that previously toggled their auto-editor privileges (so it should be left unchecked wherever it was previously left checked).
As part of this release, we’ve deployed some changes that should hopefully prevent slow /ws/2 searches from tying up too many perl processes on our frontends. This may reduce the number of 502s we’ve been seeing lately, but it’s a bit early to pronounce any results. Thanks go to kepstin for suggesting the nginx trickery used here.
More thanks go to chirlu, nikki, and ianmcorvidae for their hard work on today’s release.
The git tag is v-2014-11-03 and the full changelog is below.