Main server updated

We just finished pushing the latest changes out to the main servers! Rolling out the upgrade was a bit of a bumpy ride: we’re noticing quite a few problems with collections right now, and the Last Update feature brought our database server to its knees. As a result, we’ve disabled the Dashboard; we’ll re-enable it once we figure out what the problem is.

If you encounter a problem with the server, please file a bug report and select the 2008-11-23 version. Also, please check the open bug list to see if your problem has been reported before.

For a complete list of things that changed for this release, please see the release page on the wiki.

This massive release was brought to you by the tireless efforts of: Luks, Murdos, Djce, Jugdish, Acid2, Niklas and myself. Loads and loads of good testing came from Voiceinsideyou and Nikki. Thanks to everyone who helped with this release!

Also, if you’ve been using Jugdish’s enhanced voting GreaseMonkey script, please disable it: that functionality is now part of this server release, and the script may cause problems.

Discography, ratings, enhanced voting, dashboard, timeline and related tags now on test server

Murdos has been busy merging the various development branches into trunk; thanks for your work! I’ve updated the test server with the latest codebase. Come check out the latest new features: discography, ratings, enhanced voting, dashboard, timeline and related tags.

(To log in, use your normal login name and the password ‘mb’.)

We’ve got more bug fixes coming in the next couple of weeks. Also, the next server release has been scheduled for 2008-11-23. As usual, please report bugs to our bug tracker.

UPDATE: I had to clear everyone’s collections because of a bug. That’s fixed now — please start over again.
UPDATE2: Due to complaints by stodgy brits, Music Newz is now called Music News. 🙂

NGS: From here to there

[ Before reading this post, make sure to read the previous NGS related post ]

The question that is on my mind right now is how to build a coherent roadmap that gets us from the mb_server codebase that we’re running today to the NGS codebase, complete with new edit system. The factors that play into this are:

  1. mb_server codebase: This is the codebase that we’re running today. We’re updating it one more time this year and then early next year we hope to move to the Template Toolkit work.
  2. Template Toolkit: This is Oliver Charles’ work to clean up our codebase. Template Toolkit is available for Perl and looks like it will be available for Python soon. Our hope is to clean up the codebase so that we’re ready to take on more developers to help with the development, especially as we move closer to NGS.
  3. NGS playground: See the previous post for details on this.
  4. NGS proper: This is the finished NGS that we roll out onto the MusicBrainz servers.

Finally, the BBC has been keen on getting what they are calling Cultural Identifiers. This name is a bit of a misnomer: essentially it would be the release-related portions of NGS, namely release groupings that allow a more product-centric approach to managing releases. Right now we list and identify releases with different track layouts as totally separate releases, even though they ought to be properly related. The BBC would like this work to happen sooner rather than later and has indicated that it would be willing to sponsor this work.
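To make the release-grouping idea a little more concrete, here is a minimal Python sketch of how a release group might tie together releases that are listed separately today. The class names, fields and example data are hypothetical illustrations, not the actual NGS schema or the BBC’s Cultural Identifiers proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    # One concrete product, e.g. a particular pressing with its own track layout.
    title: str
    track_count: int
    medium: str


@dataclass
class ReleaseGroup:
    # Hypothetical grouping that relates releases of the "same" album,
    # even when their track layouts differ.
    title: str
    artist: str
    releases: list[Release] = field(default_factory=list)

    def add(self, release: Release) -> None:
        self.releases.append(release)

    def track_layouts(self) -> set[int]:
        # Distinct track counts across all releases in the group.
        return {r.track_count for r in self.releases}


album = ReleaseGroup(title="Example Album", artist="Example Artist")
album.add(Release("Example Album", 12, "CD"))
album.add(Release("Example Album (Deluxe)", 16, "CD"))
album.add(Release("Example Album", 12, "Vinyl"))
print(len(album.releases), sorted(album.track_layouts()))
```

Here three releases with two different track layouts stay related through one group, instead of appearing as three unrelated releases.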

That’s awesome, right??

Well, yes. But there is one problem. In the last post we concluded that we should move to NGS in one fell swoop, and now the BBC would like us to take an intermediate step. Even though we agreed on moving to NGS in one step, I think we must work with our most visible partner. Since we are severely resource-constrained (we have just enough money to hire a part-time university student right now), I feel compelled to find a way to get the BBC what they want as soon as possible while accepting money from them to boost our development funds. Taking money from the BBC may allow us to accelerate our development schedule towards NGS. But at the same time, it may slow our progress towards NGS.

I’m very much looking for feedback on how to best make this happen and how to best accomplish all these goals. Do you think that adding an intermediary step in exchange for funds from the BBC is an acceptable compromise?

Next generation schema: Where we are today

Sorry for the delay in continuing my MusicBrainz Server Roadmap updates — we’ve been wrestling with some server configuration nightmares…

I’ll start by giving some background on the Next Generation Schema (NGS). People have been calling for an improved schema that can intelligently handle classical music, proper artist attribution and support for packages of releases (among many other things). In 2005 we held a summit in Germany where we laid down some groundwork for the requirements for a new schema and created a first rough draft schema. Holes were quickly poked into that schema, but this let us find the weak points and do a better job of designing the schema on the next attempt. The next attempt was an invite-only summit held in London in 2007, where we created a schema that should be pretty close to what we’re actually going to put in place.

Since then we’ve been debating how to make NGS a reality. To say the least, it’s been a real pain so far. The change from the existing schema to NGS is a very large project and will be far from trivial. For instance, do we migrate to NGS in one step or take a few steps? Can we keep our existing edit system or do we need to rewrite it from scratch? Now that we have the schema done, what should the user interface look like? How can we make it simple for those users who want to do simple data changes? How can we allow more expert users to make all sorts of detailed changes while keeping the user interface simple? What would the best tools for building a new UI be?

To answer these questions, Lukáš has been working on the NGS playground in his spare time. So far, the playground has answered two of them:

  1. We should move to NGS in one step. Moving to NGS in multiple steps would cause too many headaches and too much lost work. Each step would require extensive work to glue the old portions to the new portions, and these glue layers would later be discarded. Overall, not a very efficient means of moving forward.
  2. We need to dump the edit system and start over. We’ve learned a lot about how to do an edit system right, and the existing system won’t cut it moving forward. It’s time to start over.

In the process, Lukáš is trying out a bunch of new tools to see how they fare. He’s also working in Python to see if that is the proper language to move forward with. While I haven’t run the NGS playground yet, Lukáš has the schema finished except for the Works concept, which is pretty straightforward. I also believe that the script to convert an existing database to the new format is done. However, Lukáš’ work is only a playground: it’s an attempt to see how we can pull off the user interface for NGS. That’s NGS in a nutshell today.

In my next post I will continue this thread with thoughts on how we go from today’s mb_server and the Template Toolkit work and move towards the NGS work that Lukáš is doing. For now, if you’re a computer geek who is interested in looking at NGS, please take a look at the NGS playground. There isn’t much documentation yet — so consider this a “some assembly required” project. We’re interested to hear your comments on the playground — please feel free to post them here.

Unplanned Downtime

It looks like the main web site has dropped off the ‘net – most likely the server has crashed. I’ve asked the good people at Digital West to reboot the server. Please bear with us… hopefully we’ll be back up soon!

Update: the server is back (so it was down for just over an hour).

Update 2: We have two new servers on order to bring much needed redundancy to the site. So, the next time our flaky web server crashes the site should keep running. We hope to have those machines in rotation early in September. (read: after Burning Man. 🙂 )

New search servers

I’m happy to say that we’ve now got extra search servers: a failover pair, in fact, so we’re no longer reliant on a single search server. Shortly, we hope to bring a third server into the pool too. Not bad considering that, so far, we’ve only ever had one (or zero) search servers.

What this means for you is that searches should be faster. It also means that the future performance of the web site is now more assured than it was before – we’re in a substantially better position to handle extra traffic.
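For illustration only, a client talking to a failover pair could try the servers in order and move on when one is unreachable. This is a minimal Python sketch under that assumption; the hostnames and the injected `send` function are hypothetical, and the real setup may handle failover at a load balancer rather than in client code.

```python
from typing import Callable, Sequence


class SearchUnavailable(Exception):
    """Raised when every server in the pool fails."""


def search_with_failover(
    servers: Sequence[str],
    query: str,
    send: Callable[[str, str], list[str]],
) -> list[str]:
    # Try each server in order; on a connection failure, fall through to the next.
    errors = []
    for server in servers:
        try:
            return send(server, query)
        except ConnectionError as exc:
            errors.append(f"{server}: {exc}")
    # Only reached if no server answered.
    raise SearchUnavailable("; ".join(errors))


# Simulated transport (hypothetical): the first server is down, the second answers.
def fake_send(server: str, query: str) -> list[str]:
    if server == "search1.example":
        raise ConnectionError("connection refused")
    return [f"{server} result for {query!r}"]


print(search_with_failover(["search1.example", "search2.example"], "Björk", fake_send))
```

Adding the hoped-for third server would just mean appending another hostname to the list.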

Most of the work was done by Robert Kaye; I only helped to polish off the edges 🙂

Squashing the rise of the sock puppets

We’ve recently seen a rise in sock puppets here at MusicBrainz. We’ve observed editors creating separate sock puppet accounts that vote through the editor’s own edits in order to get changes through MusicBrainz faster. This practice obviously side-steps our peer-review system, and up until now we’ve had to have other editors go through and follow the trails of naughty editors to clean up after them.

To prevent this from happening continually, we’ve updated the main server with a minor patch that requires people to have more than 10 approved edits in order to vote on other people’s edits. This makes creating a sock puppet account much harder: each sock puppet account will need a lot of work invested in it before it can be useful. We’re hoping that this simple tweak will discourage sock puppeteers.
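The rule can be sketched in a few lines of Python. The `Editor` type and `can_vote` function are hypothetical names, not the mb_server code, and the check that editors cannot vote on their own edits is an assumption added for illustration; only the "more than 10 approved edits" threshold comes from the post.

```python
from dataclasses import dataclass

# From the post: voting needs MORE than this many approved edits.
VOTE_THRESHOLD = 10


@dataclass
class Editor:
    name: str
    approved_edits: int


def can_vote(voter: Editor, edit_author: str) -> bool:
    """Return True if `voter` may vote on an edit made by `edit_author`."""
    # Assumption for illustration: nobody votes on their own edits.
    if voter.name == edit_author:
        return False
    # Strictly greater than: exactly 10 approved edits is not enough.
    return voter.approved_edits > VOTE_THRESHOLD


fresh_sock = Editor("fresh_sock", approved_edits=0)
veteran = Editor("veteran", approved_edits=250)
print(can_vote(fresh_sock, "puppeteer"), can_vote(veteran, "puppeteer"))
```

A brand-new sock puppet account fails the check until it has accumulated at least 11 approved edits of its own, which is the extra work the patch is meant to force.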

Call for search server testing

Following up on the history I gave in the last post, I’d like to put out a call for testing the new search server. In moving from Lucene to Xapian I’ve fixed a number of bugs, some of which have been lingering for a while. Also, see the list of bugs we still have open and plan to fix before the release.

If you have a pet-peeve bug that’s been annoying you, please check to see how our new Xapian test server handles it now. (Please be patient with the dev server; the box needs an upgrade soon!)

If you are a fluent speaker of Chinese, Japanese, Korean or Thai, please take a moment to look up some artists! We had some problems with searching Chinese text, and I think I fixed them, but I am not proficient in any of these languages, so please help sanity-check me!

Unless I find more bugs, this new search server will go into production sometime next week. If you find a bug, please report it to the usual place.

Search: Why is it so important?

After many days of tinkering, the new search server has passed its tests and is nearly ready for deployment next week. After my last post on the search services, there were lots of questions, so I’ll give some more history on why I’m working on this now:

  1. The old Lucene based search services worked well, but installing them was a major pain. Installing compilers by hand, sacrificing chickens and hoping that things would work wasn’t my idea of fun.
  2. Lucene has a philosophy of working out of the box without significant tweaks. That’s great if you’re indexing a bunch of text, but indexing music metadata from an SQL database is a bit of a different beast. The usual Lucene tricks didn’t work well for our data, and we couldn’t tweak it to do better. Xapian requires a little more tuning out of the box, but our search results are much better now than they were before.
  3. Sending metadata lookup traffic to a service like Xapian is generally a good idea, as a single Xapian server can handle lookup traffic more elegantly than a Postgres database. And adding more search servers is easier than adding more database servers.
  4. Our traffic is growing: I expect us to handle twice as much traffic this July as we did last July. A lot of this growth is coming from people using our web-service to look up music. If the web-service slows down, the rest of the site slows down as well. So I’m trying to stay ahead of the curve, anticipate when we’ll reach capacity, and be able to add more machines as necessary.

As of next week, MusicBrainz will have twice as much rack space (20U!) and we can finally rack the two new servers that were donated a few months ago. Fortunately, due to dropping bandwidth costs, this new space doesn’t really come at a greater expense to us: I expect our hosting costs to stay nearly the same as they are now (about $1000/mo, btw).

This will allow us to have 3 times the search capacity we have now, which should keep the site working for a while longer. In fall I hope to start moving our web-service to Amazon’s EC2 service, which should allow us to get as much capacity as we need.

As soon as I get the new search services deployed I’m putting my head down and coding the next server update. So, keep your fingers crossed that this process goes smoothly.