Search: Why is it so important?

After many days of tinkering, the new search server has passed its tests and is nearly ready for deployment next week. After my last post on the search services, there were lots of questions, so I’ll give some more history on why I’m working on this now:

  1. The old Lucene-based search services worked well, but installing them was a major pain. Installing compilers by hand, sacrificing chickens and hoping that things would work wasn’t my idea of fun.
  2. Lucene has a philosophy of working out of the box without significant tweaks. That’s great if you’re indexing a bunch of text, but indexing music metadata from an SQL database is a bit of a different beast. The usual Lucene tricks didn’t work so well for us, and we couldn’t tweak it to work better. Xapian requires a little more tuning out of the box, but our search results are much better now than they were before.
  3. Sending metadata lookup traffic to a service like Xapian is generally a good idea, as a single Xapian server can handle lookup traffic more elegantly than a Postgres database. And adding more search servers is easier than adding more database servers.
  4. Our traffic is growing: I expect us to handle twice as much traffic in July as we did the July before. A lot of this traffic growth is coming from people using our web-service to look up music. If the web-service slows down, the rest of the site slows down as well. So I’m trying to stay ahead of the curve, anticipate when we will reach capacity, and be able to add more machines as necessary.

As of next week, MusicBrainz will have twice as much rack-space (20U’s of space!) and we can finally rack the two new servers that were donated a few months ago. Fortunately, due to dropping bandwidth costs, this new space doesn’t really come at a greater expense to us; I expect our hosting costs to stay nearly the same as they are now (about $1000/mo, btw).

This will allow us to have 3 times the search capacity we have now, which should keep the site working for a while longer. In fall I hope to start moving our web-service to Amazon’s EC2 service, which should allow us to get as much capacity as we need.

As soon as I get the new search services deployed I’m putting my head down and coding the next server update. So, keep your fingers crossed that this process goes smoothly.

Bug tracker in read-only mode for a while

Dave was working on upgrading software on our catch-all server and ran into some problems with plugins for Trac, our bug tracking system. Trac is currently up, but the plugin for logging in hasn’t been installed yet, so no one can log into Trac right now.

Dave will continue working on this in about 8-10 hours. Sorry for the inconvenience!

UPDATE: Everything is back to normal now. Thanks Dave!

Discographies database schema review

If you’re interested in Niklas’ Summer of Code project to implement Discography support in MusicBrainz, I would suggest that you follow his blog and read his latest entry: “Database design and a question to users”. Niklas and I have been working on the design of the database tables that will enable his SoC project. We think we’re collecting the right information, but I’m nearly always wrong. So, if you have database design experience, please take a look at this latest post and tell us just how wrong we are. 🙂

Mac OS X Developer for Picard releases wanted

It’s clear that I won’t find the time to package up Picard for OS X anytime soon. I’ve put out one Intel-based DMG, but haven’t found the time to create a Universal Binary package of Picard. 🙁

If you have the following:

  • Knowledge of building Mac OS X Application Bundles
  • Python knowledge
  • Love for Picard
  • Access to Intel and PPC Macs

We would very much like to talk to you. The last item isn’t crucial — I suppose we can get people in the community to test your builds for platforms you have no access to. Please leave a comment if you’re interested in helping out.

Blog moved to WordPress

I’ve finally moved our blog to the WordPress blogging system. This should alleviate all of the problems with blog comments that people were experiencing.

If you had a blog account on the old blog and would like to continue using it, please comment below and I will coordinate creating a password for you in this new blog.

If you have trouble using the new blog or find important links that are not redirecting, please create a new bug report.

Come play with a new search engine

Do you have a search bug that really annoys you? If so, please come help me test a new search engine!

I’ve ported our search services to a new text search engine called Xapian. While the indexes are bigger on disk, it is easier to install, much faster to index and probably also faster to search. It also has vastly fewer problems than Lucene. And you can perform stop word searches!

Come play with it on my dev server! (Never mind the connection being slow and indexes being a few weeks old) Report issues to the usual place please!

Good news for Classic Tagger users

What started out as a joking suggestion has actually extended the life of the Classic Tagger! 🙂

One jokester at the recent summit suggested that we return random TRM values (as opposed to matched acoustic fingerprint ids) and just switch the TRM server off. Turns out, that suggestion was actually brilliant!

Doing this essentially makes every TRM lookup return “I don’t know this one”. But in that case the MusicBrainz server falls back to doing a metadata match (without the acoustic fingerprint). And it turns out that works pretty well all around! And I think some people may prefer this method, since you won’t have to clear up TRM collisions anymore.
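The fallback described above can be sketched roughly as follows. This is only an illustration, not the actual MusicBrainz server code: the function names, the dictionary-based indexes, the UUID-shaped random id and the simple artist/title match are all assumptions made for the example.

```python
import random
import uuid
from typing import Dict, Optional, Tuple


def random_trm_id(rng: random.Random) -> str:
    """Return a random UUID-shaped id (hypothetical stand-in for a real TRM id)."""
    return str(uuid.UUID(int=rng.getrandbits(128)))


def lookup(
    trm_id: str,
    tags: Dict[str, str],
    trm_index: Dict[str, str],
    metadata_index: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Try the fingerprint id first, then fall back to a metadata match."""
    track = trm_index.get(trm_id)
    if track is not None:
        return track
    # A random id is practically never in the index ("I don't know this one"),
    # so we end up matching on the file's existing tags instead.
    key = (
        tags.get("artist", "").strip().lower(),
        tags.get("title", "").strip().lower(),
    )
    return metadata_index.get(key)
```

Because a random id almost never hits the fingerprint index, every lookup in this sketch ends up on the metadata path, which is the behaviour the suggestion relies on.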

So, what does this mean for when we switch off the TRM server? The Classic Tagger lives on. It may match fewer files than before, but life may actually be better once we shut it off, and I think that many people will still find it useful.

Huzzah!

Personal email troubles

The heat wave in California over this past weekend fried the disks in my community mail server and I lost all the email from over the weekend. If you sent me mail over the weekend, (either to rob [fat] eorbit [dork] net or rob [fat] musicbrainz [dork] org) please re-send it so I can respond to it.

Sorry for the hassle and thanks for your understanding.

UK Mirror downtime

The UK mirror will be down from now (8pm BST, Noon PDT, 13 May) until about Noon BST, 4am PDT, 14 May due to a scheduled power outage at last.fm, where the server is hosted. Sorry for the inconvenience!
