New search servers

I’m happy to say that we’ve now got extra search servers – a failover pair, in fact, so we’re no longer reliant on a single search server. In fact shortly we hope to bring a third server into the pool too. Not bad considering that, so far, we’ve only ever had one (or zero) search servers.

What this means for you is that searches should be faster. It also means that the future performance of the web site is now more assured than it was before – we’re in a substantially better position to handle extra traffic.

Most of the work was done by Robert Kaye; I only helped to polish off the edges 🙂

Squashing the rise of the sock puppets

We’ve recently seen a rise in sock puppets here at MusicBrainz. Some editors have been creating separate sock puppet accounts and using them to vote through their own edits in order to get changes into MusicBrainz faster. This practice obviously side-steps our peer-review system, and up until now we’ve had to have other editors follow the trails of naughty editors and clean up after them.

To stop this from happening, we’ve updated the main server with a minor patch that requires people to have more than 10 approved edits before they can vote on other people’s edits. This makes creating a sock puppet account much harder: each sock puppet account will need a lot of work invested in it before it can be useful. We’re hoping that this simple tweak will discourage sock puppeteers.
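The rule above boils down to a single threshold check. Here is a minimal sketch of it in Python, not the actual server patch: the function name `can_vote` and the constant `MIN_APPROVED_EDITS` are made up for illustration, and only the "more than 10 approved edits" rule comes from this post.

```python
# Hypothetical names; only the "more than 10 approved edits" rule is from the post.
MIN_APPROVED_EDITS = 10


def can_vote(approved_edits: int, min_approved: int = MIN_APPROVED_EDITS) -> bool:
    """Return True if an editor has strictly more than `min_approved` approved edits."""
    return approved_edits > min_approved


if __name__ == "__main__":
    # A brand-new sock puppet account has no approved edits and cannot vote.
    print(can_vote(0))
    # Exactly 10 approved edits is still not enough; the rule says "more than 10".
    print(can_vote(10))
    print(can_vote(11))
```

The comparison is strict, so an account needs at least 11 approved edits before its votes count.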

Call for search server testing

After I gave some history in the last post, I’d like to put out a call for testing for the new search server. In moving from Lucene to Xapian I’ve fixed a number of bugs, some of which have been lingering for a while. Also see the list of bugs we still have open and plan to fix before the release.

If you have a pet-peeve bug that’s been annoying you, please check to see how our new Xapian test server is handling things now. (Please be patient with our dev server; the box needs an upgrade soon!)

If you are a fluent speaker of Chinese, Japanese, Korean or Thai, please take a moment to look up some artists! We had some problems with searching Chinese text. I think I fixed them, but I am not proficient in any of these languages, so please help sanity check me!

Unless I find more bugs, this new search server will go into production sometime next week. If you find a bug, please report it to the usual place.

Search: Why is it so important?

After many days of tinkering, the new search server has passed its tests and is nearly ready for deployment next week. After my last post on the search services, there were lots of questions, so I’ll give some more history on why I’m working on this now:

  1. The old Lucene-based search services worked well, but installing them was a major pain. Installing compilers by hand, sacrificing chickens and hoping that things would work wasn’t my idea of fun.
  2. Lucene has a philosophy of working out of the box without significant tweaks. That’s great if you’re indexing a bunch of text, but indexing music metadata from an SQL database is a bit of a different beast. The usual Lucene tricks didn’t work so well for us, and we couldn’t tweak it to work better. Xapian requires a little more tuning out of the box, but our search results are much better now than they were before.
  3. Sending metadata lookup traffic to a service like Xapian is generally a good idea, as a single Xapian server can handle lookup traffic more elegantly than a Postgres database. And adding more search servers is easier than adding more database servers.
  4. Our traffic is growing: I expect us to handle twice as much traffic in July as we did the July before. A lot of this traffic growth is coming from people using our web-service to look up music. If the web-service slows down, the rest of the site slows down as well. So I’m trying to stay ahead of the curve, anticipate when we will reach capacity, and be able to add more machines as necessary.

As of next week, MusicBrainz will have twice as much rack-space (20U’s of space!) and we can finally rack the two new servers that were donated a few months ago. Fortunately due to dropping bandwidth costs, this new space doesn’t really come at a greater expense to us — I expect our hosting costs to stay nearly the same as they are now. (about $1000/mo, btw)

This will allow us to have 3 times the search capacity we have now, which should keep the site working for a while longer. In fall I hope to start moving our web-service to Amazon’s EC2 service, which should allow us to get as much capacity as we need.

As soon as I get the new search services deployed I’m putting my head down and coding the next server update. So, keep your fingers crossed that this process goes smoothly.

Discographies database schema review

If you’re interested in Niklas’ Summer of Code project to implement Discography support in MusicBrainz, I would suggest that you follow his blog and read his latest entry: “Database design and a question to users”. Niklas and I have been working on the design of the database tables that will enable his SoC project. We think we’re collecting the right information, but I’m nearly always wrong. So, if you have database design experience, please take a look at this latest post and tell us just how wrong we are. 🙂

Mac OS X Developer for Picard releases wanted

It’s clear that I won’t find the time to package up Picard for OS X anytime soon. I’ve put out one Intel-based DMG, but haven’t found the time to create a Universal Binary package of Picard. 🙁

If you have the following:

  • Knowledge of building Mac OS X Application Bundles
  • Python knowledge
  • Love for Picard
  • Access to Intel and PPC Macs

We would very much like to talk to you. The last item isn’t crucial — I suppose we can get people in the community to test your builds for platforms you have no access to. Please leave a comment if you’re interested in helping out.

Come play with a new search engine

Do you have a search bug that really annoys you? If so, please come help me test a new search engine!

I’ve ported our search services to a new text search engine called Xapian. While the indexes are bigger on disk, it is easier to install, much faster to index and probably also faster to search. It also has vastly fewer problems than Lucene. And you can perform stop word searches!

Come play with it on my dev server! (Never mind the connection being slow and indexes being a few weeks old) Report issues to the usual place please!

Amazon ASIN cleanup

Back in prehistoric times, MusicBrainz used to use an automated script for matching releases to Amazon’s ASINs. But quickly people interjected and demanded to be able to use ARs to associate ASINs to releases. So, we added support for using ARs, but we never got rid of the old system and that has caused a few bugs over time.

I’ve written a script that takes the first step in removing the old Amazon ASINs and converts them to AR links. I’ve run this script on my test server, musicbrainz.homeip.net. Please go to that server, pick your favorite ASIN screw-up, and let me know if it’s working.

If this script turns out to work well, I can run it on the main server later this week in order to remove these pesky ASIN issues.

Thanks!

UPDATE: This script has been run on the main server and all Amazon ASINs should now be user editable. No new links have been generated — we’ve only converted old style ASIN matches to ARs so that our editors can make changes.
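For the curious, the conversion described in this post can be pictured roughly like the sketch below. This is not the script that was run. The `Release` class, its `legacy_asin` field, the `legacy_asins_to_links` function and the Amazon URL template are all assumptions made for illustration. The only idea taken from the post is that each existing old-style ASIN match becomes an editable link and no new matches are generated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed URL shape for an Amazon product page; the real AR URLs may differ.
AMAZON_URL_TEMPLATE = "https://www.amazon.com/gp/product/{asin}"


@dataclass
class Release:
    """Hypothetical stand-in for a release row with an old-style ASIN match."""
    release_id: int
    legacy_asin: Optional[str]  # None when the old script never matched an ASIN


def legacy_asins_to_links(releases: List[Release]) -> List[Tuple[int, str]]:
    """Turn each existing legacy ASIN into a (release_id, url) pair.

    Releases without a legacy ASIN are skipped, so no new links are invented.
    """
    links = []
    for release in releases:
        if release.legacy_asin:
            asin = release.legacy_asin.strip().upper()
            links.append((release.release_id, AMAZON_URL_TEMPLATE.format(asin=asin)))
    return links


if __name__ == "__main__":
    sample = [Release(1, "B000002UAL"), Release(2, None)]
    for release_id, url in legacy_asins_to_links(sample):
        print(release_id, url)
```

Once a match lives in an ordinary link record like this, editors can fix or remove it the same way they would any other relationship.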

Testing a new libofa release

libofa, the Open Fingerprint Architecture library, had many issues in the last release. Quite a few of them have been cleaned up, including the notorious incorrect output bug on Intel Macs. I’ve put together a test release of 0.9.4 to see if people can build this tarball without major problems.

If you’ve had issues with previous builds of libofa, please build and install this tarball. Report any bugs/problems to the usual place, please.