Record of the 7th MusicBrainz Summit

MusicBrainzSummit7
is over. It was fun, and it was a lot of work. Here is what we’ve worked out:
From the whole range of patchy fix to complete rewrite, we have chosen both
extremes:

  1. FeaturingArtistStyle will be patched very soon.
  2. Then work will start on MusicBrainz 2.0: a complete rewrite of the
    database schema, the moderation system and the user interfaces. This will
    take a long time.
  3. In the meantime small improvements and fixes to old nuisances will be
    made every 1-2 months.

Acoustic fingerprints: Is closed source OK?

Ever since my post about TRM hitting its limits, I’ve been in discussions with a reputable company that has offered to let the MusicBrainz community use their fingerprint server in exchange for a free license to the MusicBrainz live-data feed. While I am not ready to reveal who this company is, I do feel that I can trust these folks; this is not the first time we’ve chatted.

The straw-man deal that we’ve put together makes sense for MusicBrainz and this company. Unlike MusicBrainz’ relationship with Relatable, this relationship would be more balanced. Plus, we would not have to maintain the server ourselves. All around I feel good about this proposed deal. There is just one little snag.

They are uncomfortable with open sourcing their client.

While I am not an open source license Nazi, I have received tons of complaints about MusicBrainz using a technology that is not fully open. As a matter of fact, my most unpleasant dealings with the general public have been on this point (that the TRM server is closed source). And I’ve had unreasonable people shout unreasonable things at me over this point. Quite frankly I am not really interested in having to defend my position on this any further, but I fear that not having a working fingerprint solution may be more of a hassle than having to defend a closed source solution.

So, my question to you is this:

  1. Do you value having access to a fingerprint solution as part of MusicBrainz more than MusicBrainz being an end-to-end open solution?
  2. What arguments can we make for having this company open source their client, as Relatable did? I’ve argued the standard open source arguments and I think that there is still a small chance that we can persuade this company to open up. I need to construct a better argument and perhaps meet with them in person to hash this out further. What things should I argue?

Please keep your idealistic “everything needs to be open” arguments to yourself. I simply won’t bother reading them or responding to them. I really care to see if we can find a balance where we can maximize the value that MusicBrainz presents, even if it means compromising our values slightly. If you’re not ready to make a balanced argument, then please don’t.

NOTE: If we were to start using a closed source fingerprint solution, nothing else would change. None of the existing licenses for MusicBrainz would change. So, keep your pants on and stop frothing at the mouth.

Acoustic fingerprinting at MusicBrainz: The Future

My last post on the current state of the TRM fingerprinting solution got quite a bit of response; I was quite amazed by it, really. Personally, I think people still put too much emphasis on TRM and what role it plays within MusicBrainz, but without me providing a new tagging solution there aren’t any concrete points to discuss.

Given the feedback I’ve gotten, I’d like to state a reformulated vision with regard to acoustic fingerprinting and tagging here at MusicBrainz. The two points that have received the most feedback concern acoustic fingerprinting and downloading large index files in order to use the tagger.

Acoustic fingerprinting: Since so many people professed their love for TRM and acoustic fingerprinting in general, we will do the following things:

  1. Keep TRM alive.
  2. Work to create an open replacement for TRM. See the musicbrainz-devel mailing list for discussion on this topic and if you would like to help out. The founder of Tuneprint has recently volunteered to help build this new solution and I expect that his presence in this project should stir things up a bit.
  3. When #2 is operational, we will start a gradual migration to the new server. TRM is not going away tomorrow! Got it?

The obvious problem is if #2 does not come to fruition. If you care about TRM and acoustic fingerprinting here at MusicBrainz, you should go check out the discussion on the devel mailing list and lend a hand. If it doesn’t come about and the TRM server stops being useful, then we’ll eventually turn the TRM server off.

Picard & large indexes: The Picard tagger with Lucene support will progress as planned — the only change so far will be that I will provide one machine for use as a centralized lookup server that will not require you to download the massive text index. However, I expect that Picard with Lucene will be a popular tagging tool, and that the server will get overloaded and slow in the space of a few months. Given that, we’ll have complete indexes available for people to download.

I predict loads of people will opt to download the text index since a 250MB download will be a lot faster than trying to tag their 10,000 file collection on an overloaded server that performs 10 lookups per minute for them.
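As a rough sanity check on that prediction, the arithmetic can be sketched in a few lines of Python. The 10,000 files, 10 lookups per minute and 250MB index come from the paragraph above; the 2 Mbit/s link speed and the function names are assumptions made up purely for illustration.

```python
def tagging_hours(num_files: int, lookups_per_minute: float) -> float:
    """Hours needed to look up every file at a fixed server rate."""
    return num_files / lookups_per_minute / 60.0


def download_minutes(size_mb: float, link_mbit_per_s: float) -> float:
    """Minutes needed to download size_mb megabytes over the given link."""
    return size_mb * 8.0 / link_mbit_per_s / 60.0


# Figures from the post: 10,000 files at 10 lookups per minute.
server_hours = tagging_hours(10_000, 10)

# 250MB index from the post; the 2 Mbit/s link is an assumed value.
index_minutes = download_minutes(250, 2.0)

print(f"server lookups: {server_hours:.1f} hours")
print(f"index download: {index_minutes:.1f} minutes")
```

Under those assumptions the server route takes on the order of 17 hours, against roughly 17 minutes for the download, before counting the time for local lookups.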

Thanks for all the feedback!

UPDATE: PLEASE stop telling me how much the large index would cramp your style and how much the fingerprinting has saved you. I know!

General update: What's up with TRM??

This general update is way overdue. A lot of things have been happening behind the scenes and it’s time to let everyone know where things in the MusicBrainz world are headed. I’ll start off with TRM, since that is a hot discussion topic on the musicbrainz-users mailing list right now.

The TRM server (TRMs are acoustic fingerprints that MusicBrainz uses to identify music tracks) is constantly overloaded and can only handle a database size of about 2.2GB before it crashes. To prevent crashes, we prune the database by throwing out the least used TRMs, which implicitly discards work that our users have done. Not good. For the TRM server to perform at a reasonable level, the entire database needs to be kept in RAM. Thus our server has 5GB of RAM and it still can’t keep up. The fact that this problem hasn’t reared its ugly head to the public is a testament to Dave Evans’ skill in keeping the TRM server ticking.
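To make the pruning step concrete, here is a minimal sketch in Python. It is not the TRM server's actual code, which isn't public; the dictionary of usage counts, the entry cap and the function name are all hypothetical.

```python
def prune_least_used(usage, max_entries):
    """Keep the max_entries most-used fingerprints.

    usage maps a fingerprint ID to its lookup count (hypothetical layout).
    Returns (kept, dropped): the surviving entries and the discarded IDs.
    """
    # Rank fingerprints from most used to least used.
    ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:max_entries])
    dropped = [trm_id for trm_id, _ in ranked[max_entries:]]
    return kept, dropped


# Made-up usage counts, for illustration only.
usage = {"trm-a": 5, "trm-b": 1, "trm-c": 3}
kept, dropped = prune_least_used(usage, max_entries=2)
print(kept)     # the two most-used fingerprints survive
print(dropped)  # the least-used one is discarded, along with the work behind it
```

Whatever the real implementation looks like, the cost is the same: every dropped entry is a submission some user made.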

Furthermore, TRMs have shown themselves not to be as unique as we would’ve liked. For example, take a look at the TRMs with at least 5 tracks report: 4400 pages (!) of TRMs that I would consider to be sub-optimal. One example TRM (non silence on page 2) has 104 tracks associated with one single TRM. Given this, TRM is not some sort of magical solution that with great authority tells the tagger what metadata to apply to a track. Instead, it’s best to think of TRM as a system that lets you guess which few dozen tracks a file could be matched to; there is a lot of logic in the tagger that makes up for the shortcomings of TRM.
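The kind of report mentioned above can be sketched as a simple group-and-filter. The (TRM, track) pair layout and the function name here are assumptions for illustration, not the actual MusicBrainz schema or report code.

```python
from collections import defaultdict


def overloaded_trms(pairs, min_tracks=5):
    """Return {trm: sorted track list} for TRMs linked to >= min_tracks tracks.

    pairs is an iterable of (trm_id, track_id) tuples (hypothetical layout).
    """
    tracks_by_trm = defaultdict(set)
    for trm_id, track_id in pairs:
        tracks_by_trm[trm_id].add(track_id)
    return {
        trm_id: sorted(tracks)
        for trm_id, tracks in tracks_by_trm.items()
        if len(tracks) >= min_tracks
    }


# Made-up data: one fingerprint attached to six tracks, one attached to a single track.
pairs = [("trm-a", f"track-{i}") for i in range(6)] + [("trm-b", "track-0")]
report = overloaded_trms(pairs)
print(report)
```

A fingerprint that shows up in this report can only narrow a file down to a set of candidates, which is why the tagger needs extra logic on top.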

Thus, TRM has two major problems: it’s not accurate enough and it doesn’t scale well to the size that MusicBrainz has grown to. The system still functions but I expect it to start breaking down and becoming of less use over time. We have the following options:

  1. Find a replacement for TRM: Relatable doesn’t seem to be in business anymore, or at least they are in deep hibernation. No other companies that I have approached were interested in sharing their technology with MusicBrainz. (For the record, I’ve tried with 3 companies, including a couple of on-site visits in Europe).
  2. Create our own TRM solution: This is a very large endeavour, at least a year, if not two, of hard work. I’d rather work to improve MusicBrainz itself than hack on acoustic fingerprint software.
  3. Throw more resources at TRM: We’re still lacking the funds for more resources, and the same argument in #2 still applies.
  4. Do something else: Find some technology that can replace TRM.

Given my babbling about Lucene, I think it’s a foregone conclusion that #4 is the way to go. Sometime this fall, I will release a Picard tagger with a Lucene text indexing engine to replace the current MusicBrainz Tagger. The benefits of this new tagger will be:

  1. It will distribute the load on the server, since currently a large chunk of the server load goes to supporting tagger users. And a large chunk of tagger users never really contribute data to MusicBrainz or make cash donations to support the project. So, moving that traffic off the main server will allow people who want to edit/vote on the data to focus on their work.

    Given that most files in the wild nowadays have some metadata, a text index will work well. Lucene is great at taking crappy data input and coming up with something useful. If TRM gets us into the ballpark and then additional heuristics do the final leg work, Lucene will give us a much better guess to start with than TRM ever did. Thus, overall tagging quality will improve greatly.
  2. A Lucene tagger will work much faster than the TRM based tagger ever did. 2-5 seconds per track was not unusual given TRM; with Lucene we’ll see 2-5 tracks per second, if not much faster.
  3. Since we will no longer have to decode files to identify them, it will be easier for us to support new formats. It’s less work overall.
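To give a feel for what matching on messy metadata means, here is a minimal sketch in Python. It does not use Lucene at all: the standard library's `difflib` stands in for a real text index, and the catalog entries, function names and query string are invented for illustration.

```python
import difflib
import re


def normalize(text):
    """Lowercase and collapse punctuation so messy tags compare fairly."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def best_matches(query, catalog, limit=3):
    """Rank catalog entries by similarity to a messy 'artist - title' string."""
    wanted = normalize(query)
    scored = [
        (difflib.SequenceMatcher(None, wanted, normalize(entry)).ratio(), entry)
        for entry in catalog
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Made-up catalog and a query with a missing article and a typo.
catalog = [
    "The Beatles - Yesterday",
    "The Beatles - Help!",
    "Portishead - Roads",
]
top = best_matches("beatles - yesterdy", catalog, limit=1)
print(top[0][1])
```

A real text index scores and scales very differently, but the idea is the same: imperfect tags are usually enough to land on a short list of good candidates.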

This approach also has the following downsides:

  1. It will no longer support identifying completely anonymous files. Files that have no ID3 tags and are named test1.mp3, test2.mp3 will simply not stand a chance at identification. I realize that there is great romance associated with this concept, but in reality most people have files that have some metadata in them, and thus will stand a good chance of being identified.
  2. You will need to download a 250MB Lucene index to tag your collection. This is a pretty big hurdle, but if BitTorrent can routinely help people download 650MB movies off the net, it should help us distribute our search indexes. After the first release of a Lucene-enabled Picard, we will investigate P2P searching methods that will allow people who have no index to use some other people’s indexes (if they allow that).

So, the roadmap for this looks like this:

  1. Release Picard 0.5.0 in the next few weeks and start putting it on the main page as an alternative to the MB tagger.
  2. Release Picard 0.6.0 with full Lucene support and offer that as the main tagging solution for MB.
  3. When the TRM usage drops because of adoption of Picard 0.6.0, we will start phasing out TRM.

There you have it: that’s the current state of TRM and how we hope to solve the problems that it presents us with.

German mirror online!

After months of tinkering, I’m pleased to announce that MusicBrainz now has a mirror in Germany. The de.musicbrainz.org mirror is graciously being sponsored by HousePool Media International Group. Many thanks to Carsten Marmulla for working hard over a number of months to find hardware and bandwidth to support this mirror.

Our two mirrors (.de and .nl) are currently underutilized, but the upcoming release of Picard will have support for tagging off mirror servers. We’ll have to encourage users to use the mirrors for tagging, so that the main server can stay available for people wanting to make changes to the database or vote on pending changes.

Summer is over!

Well, almost. I’m back from OSCON, Foo Camp, Burning Man and the Future of Music Conference. Traveling was fun, but I’m ready to wait for the not-so-nice weather and cuddle up with a computer and get some serious MusicBrainz work done.

The good news is that the data licensing revenue should start rolling in within a few weeks, which means that I get to keep working on MusicBrainz full time! Full time and paid — at first it won’t be much of a paycheck, but it should pay the bills. Maybe next year we can work towards a full paycheck — we’ll see.

Here is my todo list for the near future:

  1. Whip mirror servers into shape
  2. Sign more license deals
  3. Get the menu server release out the door
  4. Fix AR bugs, improve related artists, hammer out a few new server features.
  5. Release Picard 0.5.0, libtunepimp and libmusicbrainz — all of these desperately need new releases.

Of course there are lots more things on my todo list, but these are the top 5 items. Stay tuned for more info!

Don't buy from compu-terra.com!

The week before last I started the process of purchasing a new computer for the new MetaBrainz office, so I ordered a motherboard combo, memory and a case from compu-terra.com. For the first 24 hours nothing happened and then they called me to upsell me better memory. Wherever Joe and David came from, I can tell you that those were not their original names — I couldn’t understand a word of the upsell, so I asked them to send me a link to the site that described the memory they were upselling.

6 hours later I got mail describing the memory, but no link to their site. The whole thing started to sound fishy, so I declined. Then they said they’d give me the better memory for free, as long as I gave them a good review. Foolishly I went for that, just to be done with it. I figured that I wouldn’t be getting any better RAM, since I couldn’t identify either the RAM I ordered or the better RAM.

Then 3 days passed and the parts still had not shipped. I called up and asked what was up, and they said the order was shipping that day. The next morning I still didn’t have a tracking number, so I decided that I was fed up. I told them to cancel the order and that I’d stay on the line while they credited my PayPal debit card. They proceeded to cancel the order and then hung up on me. I called back, once again demanding to have my card credited right then, and they told me it would take 24 – 48 hours to credit the card and then they hung up on me AGAIN.

So I waited 48 hours and no credit showed. No phones were being answered; all I got were full voicemail boxes. I told myself to wait until Monday (today) before dealing with this again, and this morning the credit showed up. Finally. I guess I got lucky this time…

So, the moral of the story is: DON’T BUY FROM COMPU-TERRA.COM!

Virtuosa/FunVibes sponsors MusicBrainz

I’m pleased to announce that FunVibes, the makers of the Virtuosa all-in-one jukebox program, have sponsored the MusicBrainz project. FunVibes made a $1500 sponsorship contribution to the MetaBrainz Foundation during our first fundraiser. In exchange for this sponsorship, we’re going to have a Virtuosa jukebox button on the MusicBrainz website for six months. MusicBrainz will also now receive affiliate fees for sales of the Virtuosa jukebox generated through the MusicBrainz web site.

Giacomo Bondi Morra, the CEO of FunVibes, will also join the MetaBrainz Foundation’s advisory board, which will be formally created later this year. Finally, FunVibes has issued a press release announcing the sponsorship of MusicBrainz. This extra press will help MusicBrainz gain more exposure for its ongoing fundraiser. (I’ll post a link to the release once I get one.)

We hope to bring on more sponsors like FunVibes in order to get the non-profit bootstrapped and on solid financial footing.

The MetaBrainz Foundation launches!

After many months of hard work, the MetaBrainz Foundation has been launched!
We have just issued a press release to announce the foundation.

I am excited to announce our all-star board of directors:

  1. Director Dan Brickley of W3C
  2. Director Cory Doctorow of Electronic Frontier Foundation
  3. Director Joichi Ito of Neoteny Co. Ltd.
  4. Director Lawrence Lessig …

In the past few weeks a number of people have gone to great lengths to help me launch MetaBrainz. I’d like to thank: Dave Evans, Matthias Friedrich, Alex Dupuy, Gavin Clarke, Don Redman, John Carter, Nikki and Tarragon Allen. I couldn’t have done it without you!

Read on for the full press release!


MusicBrainz in ETech Podcast

Last week at ETech I met Ewan Spence and his buddy Crow. Ewan is a crazy Scotsman who plays with hand-puppets and feeds chocolate lovers his imported Cadbury chocolate (the dodgy US Cadbury just won’t do!). When he is not foisting chocolate on unsuspecting attendees he’s often found recording podcasts. After lunch one day he cornered me to talk about MusicBrainz a bit. I wasn’t quite mentally prepared to be facing a microphone, so I sound like a complete dumbass at the beginning, but I manage to catch my stride further into the podcast.

Check it out at the Podcastnetwork’s “The Tech Conference Show”. The MusicBrainz segment starts at 17:05 into the recording.