Getting ready for our Next Generation Schema!

After many years of planning, anticipating and gathering resources, we’re finally tangibly close to our Next Generation Schema (NGS).

This blog post is intended as an official notification to everyone in our community and to our customers who are using MusicBrainz services right now. Our next generation schema is a drastic evolution for MusicBrainz: we’ve included many of the features that our customers and our community have asked for over the past few years. This means that if you’re using the MusicBrainz data right now, you will need to prepare your systems in order to be ready for the switchover when it comes in the fall. Please do not delay examining our new schema; it is a drastic change from our previous schema!

Our current plans are to enter beta testing of the new server on August 31. The exact release date is very much dependent on the results of our beta phase, but I hope to have the release within 60 days of entering beta.

The most important changes that you will need to consider/address:

  • No old (RDF) web service — the old RDF web service will no longer be supported as of NGS.
  • We will provide an XML v1 web service that is backward compatible to our current XML web service.
  • We will also provide an XML v2 web service that will expose new NGS concepts.
  • Postgres 8.3 will be required. Upgrading your old database will not be possible. You will be required to import the first post-NGS data dump in order to upgrade to NGS. Our provided upgrade script (see below) is very useful for testing purposes but is not suited for upgrading deployed servers.
  • MBID changes — The MBIDs will be stable and maintained for artists, release-groups and tracks. All of the MBIDs for our current releases will also be kept, but we are changing what we are calling releases. Essentially all release events (with label, date, country and barcode information) will become releases each with their own MBID. This means that we’re adding a whole slew of new MBIDs for the releases that will not be assigned a legacy MBID.
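To make the MBID change concrete, here is a minimal Python sketch of how one current release with several release events could turn into several NGS releases. The names ReleaseEvent and split_release are hypothetical, and the rule that the first release event inherits the legacy MBID is an assumption made purely for illustration; the real migration may choose differently.

```python
import uuid
from dataclasses import dataclass


@dataclass
class ReleaseEvent:
    """One release event of a current (pre-NGS) release."""
    label: str
    date: str
    country: str
    barcode: str


def split_release(legacy_mbid, events):
    """Return one (mbid, event) pair per NGS release.

    Assumption for illustration only: the first release event keeps
    the legacy MBID and every further event gets a brand new MBID.
    """
    releases = []
    for index, event in enumerate(events):
        if index == 0:
            mbid = legacy_mbid          # legacy MBID is preserved
        else:
            mbid = str(uuid.uuid4())    # newly minted MBID
        releases.append((mbid, event))
    return releases
```

If your system stores release MBIDs, the practical consequence is that one stored MBID may now correspond to only one of several releases that used to share it.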

NGS is documented on our wiki — please take some time to read up on our documentation! If you’d like to play with NGS now, follow these steps:

1. Download and install the 20090524 release according to these instructions. For going to NGS, a database-only install is the perfect approach. Download and import an existing data set.

2. Download the NGS codebase with subversion.

3. Follow the install instructions. Instructions are included for how to migrate the 20090524 data to NGS — please note that the upgrade script may run for quite some time!

There are a lot of changes to the database from the current release! Please note that we’re done with the overall database design — I am not anticipating major changes past this point. However, I do anticipate a few smaller changes as we get closer to our goal. We will keep the schema diagram and documentation up to date with our changes.

If you care to follow our progress getting to NGS, please see our roadmap.

RDF Web Service will cease to exist after the NGS release

The old skool RDF-based web service (at /cgi-bin/mq_2_1.pl, /cgi-bin/mq.pl, /cgi-bin/rdf_2_1.pl and /cgi-bin/rdf.pl) will cease to exist when we release the Next Generation Schema (NGS), which will go into beta on August 31. This web service has been deprecated for three years now; it’s finally time to put it out of its misery.
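If you are not sure whether your own software still talks to the RDF web service, a check along the following lines may help. This is only a sketch: the function name uses_deprecated_rdf_service is hypothetical, the host in the usage example is a placeholder, and only the four script paths come from this post.

```python
from urllib.parse import urlparse

# The four script paths named in this post.
DEPRECATED_RDF_PATHS = (
    "/cgi-bin/mq_2_1.pl",
    "/cgi-bin/mq.pl",
    "/cgi-bin/rdf_2_1.pl",
    "/cgi-bin/rdf.pl",
)


def uses_deprecated_rdf_service(url):
    """Return True if the URL path points at one of the old RDF scripts."""
    path = urlparse(url).path
    return any(
        path == script or path.startswith(script + "/")
        for script in DEPRECATED_RDF_PATHS
    )
```

For example, running uses_deprecated_rdf_service over the URLs in a client’s configuration (say, "http://musicbrainz.example/cgi-bin/mq_2_1.pl", where the host is a placeholder) flags the ones that will stop working.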

As of this release the Classic Tagger will completely stop functioning. RIP Classic Tagger!

Wiki Migration

Today’s the day – our wiki is being migrated to MediaWiki.  The old “moin” wiki is now read-only (and will remain so, at least for a few months), and is available on oldwiki.musicbrainz.org.  The new wiki, once all the data has been migrated across, will be at the usual address.

As soon as the migration is complete, I’ll switch wiki.musicbrainz.org over to point to MediaWiki.

Unfortunately it won’t be possible to also migrate the user accounts from moin to MediaWiki, so regrettably this means that once MediaWiki is up, you’ll have to re-create your accounts.  Sorry about that.

Update: the switch has been made – if you have any questions to ask or problems to report about this, please see the WikiMigration page.  Thanks!

New Picard builds for Mac available

Timothy Lee says:

After a month of trying universal builds through macports unsuccessfully, using a tool named ‘unify’ (h/t John) to lipo i386 and ppc arch bins together unsuccessfully, and bugging just about everyone I know to use their i386 tiger machine (I have an i386 leopard and a ppc tiger) unsuccessfully I’ve made the decision to halt my progress on trying to deliver a UB.

What I do have though is two builds. I have one PPC build that was built on a tiger machine and one i386 build that was built on a leopard machine. I have not had a chance to test my i386 leopard build on a tiger machine (read above failure to ‘borrow’ a tiger machine) but there is a chance it may work. If you feel like you have a better handle on the ‘lipo’ process, please, take my builds and smash them together and let me know! (It may be desirable NOT to deliver a UB as each separate build of Picard is fairly large).

Timothy is looking for feedback on these builds. If you’ve been waiting for a complete version of Picard that includes working PUID generation, then please try these builds:

Picard for OS X Intel i386 (md5)
Picard for OS X PPC (md5)
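Since each build comes with an md5 checksum, you may want to verify your download before installing it. The following is a minimal sketch using Python’s standard hashlib module; the function names and the idea of passing in the published checksum as a string are assumptions for illustration, not part of the Picard release process.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large DMGs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare a file's md5 with the checksum published next to the build."""
    return md5_of_file(path) == published_md5.strip().lower()
```

A mismatch usually means the download was truncated or corrupted, in which case downloading again is the first thing to try.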

If you have problems running these, please enter a bug report and use the component “Picard Tagger (Mac OS X Packaging)”. Thanks very much to Tim and everyone else who has helped along this somewhat frustrating process.

Testing PPC build of Picard

If you have a PPC Mac that runs 10.4/10.5 and have been waiting for a DMG of Picard, please try downloading and installing this version. Please let us know in the comments if it works.

Jon Hermansen and I have been working on building Picard with only MacPorts prerequisites — that is how this DMG has been built. If this install works then we can proceed to work on a Universal Binary that should work on 10.4/10.5. If we can reach that, we should be able to release Mac binaries at the same time as we release binaries for other platforms.

Thanks for all your hard work Jon!

UPDATE: We’ve found a problem with PUID generation and have fixed it — we hope. The above link now points to the updated dmg. May not work on Tiger yet — if you have a Tiger PPC box, please try it and let us know.

Search: Why is it so important?

After many days of tinkering, the new search server has passed its tests and is nearly ready for deployment next week. After my last post on the search services, there were lots of questions, so I’ll give some more history on why I’m working on this now:

  1. The old Lucene-based search services worked well, but installing them was a major pain. Installing compilers by hand, sacrificing chickens and hoping that things would work wasn’t my idea of fun.
  2. Lucene has a philosophy of working out of the box without significant tweaks. That’s great if you’re indexing a bunch of text, but indexing music metadata from an SQL database is a bit of a different beast. The usual Lucene tricks didn’t work so well for us, and we couldn’t tweak it to work better. Xapian requires a little more tuning out of the box, but our search results are much better now than they were before.
  3. Sending metadata lookup traffic to a service like Xapian is generally a good idea, as a single Xapian server can handle lookup traffic more elegantly than a Postgres database. And adding more search servers is easier than adding more database servers.
  4. Our traffic is growing: I expect us to handle twice as much traffic in July as we did the July before. A lot of this traffic growth is coming from people using our web-service to look up music. If the web-service slows down, the rest of the site slows down as well. So I’m trying to stay ahead of the curve and anticipate when we reach capacity, so that we can add more machines as necessary.

As of next week, MusicBrainz will have twice as much rack space (20U of space!) and we can finally rack the two new servers that were donated a few months ago. Fortunately, due to dropping bandwidth costs, this new space doesn’t really come at a greater expense to us: I expect our hosting costs to stay nearly the same as they are now (about $1000/mo, btw).

This will allow us to have 3 times the search capacity we have now, which should keep the site working for a while longer. In fall I hope to start moving our web-service to Amazon’s EC2 service, which should allow us to get as much capacity as we need.

As soon as I get the new search services deployed I’m putting my head down and coding the next server update. So, keep your fingers crossed that this process goes smoothly.

Mac OS X Developer for Picard releases wanted

It’s clear that I won’t find the time to package up Picard for OS X anytime soon. I’ve put out one Intel-based DMG, but haven’t found the time to create a Universal Binary package of Picard. 😦

If you have the following:

  • Knowledge of building Mac OS X Application Bundles
  • Python knowledge
  • Love for Picard
  • Access to Intel and PPC Macs

We would very much like to talk to you. The last item isn’t crucial — I suppose we can get people in the community to test your builds for platforms you have no access to. Please leave a comment if you’re interested in helping out.

Good news for Classic Tagger users

What started out as a joking suggestion has actually extended the life of the Classic Tagger! 🙂

One jokester at the recent summit suggested that we return random TRM values (as opposed to matched acoustic fingerprint ids) and just switch the TRM server off. Turns out, that suggestion was actually brilliant!

Doing this essentially makes every TRM lookup return “I don’t know this one”. But in that case the MusicBrainz server falls back to doing a metadata match (without the acoustic fingerprint). And it turns out that works pretty well all around! And I think some people may prefer this method, since you won’t have to clear up TRM collisions anymore.
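The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the actual server code: generate_trm, lookup, trm_index and metadata_match are all hypothetical names, and the TRM index is modelled as a plain dictionary.

```python
import uuid


def generate_trm(audio):
    """Stand-in for the switched-off TRM server: return a random id.

    The audio is deliberately ignored, so the id will not match
    anything already known to the server.
    """
    return str(uuid.uuid4())


def lookup(audio, metadata, trm_index, metadata_match):
    """Try a TRM lookup first, then fall back to a metadata match."""
    trm = generate_trm(audio)
    tracks = trm_index.get(trm)
    if tracks:
        return tracks
    # "I don't know this one": fall back to matching on metadata only.
    return metadata_match(metadata)
```

Because a random id is practically never found in the index, every lookup ends up in the metadata branch, which is exactly the behaviour that keeps the Classic Tagger usable.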

So, what does this mean for when we switch off the TRM server? The Classic Tagger lives on and may match fewer files than before, but life may actually be better once we shut it off! And I think that many people will still find it useful.

Huzzah!

Testing Picard on OS X (Intel)

If you have an Intel Mac and would like to try out the first test of a PUID-enabled Picard on OS X, please download this DMG and post a comment to let me know if it works.

This DMG:

  • Is for INTEL only
  • Only runs on OS X 10.5 (Leopard)
  • Probably does not have working CD Lookup
  • Might be buggy
  • Is based on the 0.9.0 Picard tarball
  • Is NOT a Universal Binary

If this DMG works for people then I will proceed to try to get 10.4/PPC/disc lookup support working.