Server Troubles

Recently the server has been hit by patches of instability – large load spikes, running out of memory, and processes getting killed here and there. When the most recent out-of-memory condition occurred (last night) the SSH server was one of the processes which got killed, which is why the server had to be rebooted a little while ago.

I’m fairly sure I more or less know what’s been causing the problems, and have made a few changes to try to reduce the chance of it happening again.

One of the worst causes of the problem is looking up a TRM with a large number of tracks. The worst TRM by far for this is the “silence” TRM, with (currently) over 900 tracks. As a result I’ve had to, for now at least, disallow lookups on this TRM – doing so will now simply return an error. Sorry 😦 Maybe it can be made to do something more helpful in future.

The other change is that if you do a lookup on any TRM which has more than 100 tracks, only 100 of those tracks will be returned. So far there are no TRMs (except “silence”) with over 100 tracks, so this won’t affect anyone yet. As the data grows, though, it will.
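For the curious, the two rules above amount to something like the following. This is only a rough sketch in Python, not the actual server code; the function name, the exception, the dictionary of tracks and the placeholder ID for the “silence” TRM are all made up for illustration.

```python
MAX_TRACKS_PER_LOOKUP = 100  # the cap described in this post

# Hypothetical placeholder; the real "silence" TRM ID is not given here.
BLOCKED_TRMS = {"silence-trm-placeholder"}


class LookupRefused(Exception):
    """Raised when someone tries to look up a blocked TRM."""


def lookup_trm(trm_id, tracks_by_trm):
    """Return at most MAX_TRACKS_PER_LOOKUP tracks for the given TRM.

    tracks_by_trm is an assumed in-memory mapping of TRM ID -> list of
    tracks, standing in for whatever the database query really returns.
    """
    if trm_id in BLOCKED_TRMS:
        # Blocked TRMs simply return an error instead of a huge result.
        raise LookupRefused("lookups on this TRM are disabled for now")
    tracks = tracks_by_trm.get(trm_id, [])
    # Slicing is a no-op for TRMs with 100 tracks or fewer.
    return tracks[:MAX_TRACKS_PER_LOOKUP]
```

A TRM with 250 tracks would come back with only the first 100, while anything smaller is returned unchanged.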

Sorry for any inconvenience caused (hey, I’m apologising again. This is getting to be a habit). But I’m sure you’d rather have a server which doesn’t keep crashing and locking us all out. Hey ho.

Blog comments disabled

Sorry, but I’ve got better things to do with my time than continually delete spam comment runs from this blog (and the administrative UI doesn’t make it very easy to delete large numbers of comments). So, pending some effective method of protection against the spammers being found, I’ve disabled comments on this blog.

For now, if you want to discuss any of the entries here, you can use the mb-users mailing list. Sorry for any inconvenience.

Mopping Up the Pink Stuff

Over the last few days we’ve had something like 200 spam comments posted to this blog, so I’ve been forced to invest a few hours of my time deleting those comments, and trying to find a way to solve the problem in a more permanent, automated way in the future.

We’ve got some comment filtering in place now, which should mean that the spam comments at least only get seen by the blog admins (thus they don’t get spidered by Google et al, which is presumably the reason for spamming in the first place; and it also means that you, dear reader, aren’t bothered by them either).

The blog software does have provision for blocking by IP address, but comment spam, like e-mail spam, tends to arrive not from one or two sources but from a multitude of machines, so having a simple list of banned IPs is never likely to be practical. However, I took the list of IP addresses seen in the last couple of runs of comment spamming, and cross-checked them against some well-known DNS blacklisting services, traditionally used to protect against e-mail spam. About half of the blog-spamming IPs were listed in those blacklists, so if anyone happens to know of a Movable Type plugin which can do DNSBL lookups, please let me know!
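In case it helps anyone thinking about such a plugin, a DNSBL check works roughly like this: reverse the octets of the IPv4 address, append the blacklist's zone, and do an ordinary DNS lookup. If the name resolves, the address is listed; if it doesn't, it isn't. Here is a minimal sketch in Python, assuming a made-up zone name (`dnsbl.example.org`) rather than any of the services I actually checked against:

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL zone has an entry for this IP.

    `resolve` is injectable so the logic can be tried without
    touching the network; by default it does a real DNS lookup.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # The name did not resolve, so the address is not listed.
        return False
    return True
```

Calling `is_listed("192.0.2.1", "dnsbl.example.org")` would query `1.2.0.192.dnsbl.example.org`. A real plugin would also want timeouts and caching, since a slow blacklist would otherwise slow down every comment submission.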

Without going into any detail, I’ve taken a few other measures to protect against this problem; they shouldn’t have broken anything, but if they have, please let me know about that too 🙂

Server Updates

“Add Disc ID” moderations, and Annotations.

Changes mainly of interest to MusicBrainz Users

“Add Disc ID” Moderations

Whenever a disc ID is added to an existing album, it is now tracked via an “Add Disc ID” moderation. This applies both to disc IDs added via the “CD lookup” interface (in which case the moderation is credited to whoever performed the lookup), and also to those added as a result of a FreeDB lookup (which fall under the “FreeDB” moderator). “Add Disc ID” moderations are not used in the case where an album and a disc ID are added at the same time.
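The rules above boil down to a small decision table. The following Python sketch is purely illustrative and is not the server's actual code; the function name and the `source` values are invented for the example.

```python
def disc_id_moderator(source, album_already_exists, user=None):
    """Decide who, if anyone, is credited with an "Add Disc ID" moderation.

    Returns None when no moderation is created. The source strings
    "cd_lookup" and "freedb" are hypothetical labels for the two routes
    described in the text.
    """
    if not album_already_exists:
        # Album and disc ID added together: no "Add Disc ID" moderation.
        return None
    if source == "cd_lookup":
        # Credited to whoever performed the lookup.
        return user
    if source == "freedb":
        # Falls under the "FreeDB" moderator.
        return "FreeDB"
    raise ValueError(f"unknown disc ID source: {source!r}")
```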


Annotations

Annotations allow you to add notes to artists and albums. See How Annotations Work and the Annotations FAQ. Thanks to Matthias Friedrich for building the foundations of this feature.

Bugs and RFEs Closed

Dave Evans

Slowly updating the Documentation

Alex has been copying the documentation from the website into the Wiki. The idea behind this is that the documentation on the website is partly out of date and structured in a way that no longer reflects how new users come to MusicBrainz.

I have now joined him in his effort of RestructuringTheDocumentation. This is the second restructuring that this Wiki has been through (the first was the general RestructuringTheWiki).


Lucene based tagging update

I previously mentioned that Lucene rocks. Well, that is not giving it enough credit. I’m working on the guts of a Lucene-enabled Picard tagger, and in doing so I have created a simple script that chews through a given set of mp3 files and attempts to match them up with MusicBrainz.

My friend Vee once gave me a CD full of hip-hop music to give to my GF. I took one look at it and stared in shock! What a mess — not many id3 tags, mostly no album names at all. Lots of friends vs friendz problems — much slang used in inconsistent ways. Ick!

I ran this through the old tagger a while back and it matched roughly 30% of the tracks. I’ve been using this set of files to tune the new tagging engine and once things got cached into memory, it chewed through over 100 files in under 7 seconds:

60% matched: 64 files matched, 41 files with suggestions, 1 files not matched.

60% !! Check the results for yourself!

Of the 41 files that have suggestions, at least 80% have the correct match among the top 3 closest matches. I’m floored: it works so well, and there are a number of improvements still left to make. The downside? You need the 700MB Lucene index on your hard drive. That’s going to be more than 250MB to download. 😦 I’ll have to work out the right combination of BitTorrent, caching, and P2P solutions to tackle that minor issue.
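For anyone who wants to check the numbers, here is the arithmetic behind them as a few lines of Python. The counts come straight from the output above; the variable names are just mine, and the combined figure at the end is a lower bound derived from the "at least 80%" estimate, not something the tagger reports.

```python
matched, suggested, unmatched = 64, 41, 1
total = matched + suggested + unmatched  # all files in the test set

# Share of files matched outright, as a whole-number percentage.
matched_pct = round(100 * matched / total)

# "At least 80%" of the suggested files, rounded up to whole files
# using integer arithmetic (ceiling of 41 * 80 / 100).
good_suggestions = (suggested * 80 + 99) // 100

# Lower bound on files either matched or with the right answer in the top 3.
usable_pct = round(100 * (matched + good_suggestions) / total)

print(total, matched_pct, good_suggestions, usable_pct)
```

So the set is 106 files, and if the 80% estimate holds, the correct release is either matched or offered in the top 3 for roughly nine files out of ten.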

But this is really stunning!

Saturday Morning News

Wow, that’s a lame title.  I couldn’t think of anything better :-/  This is a general-purpose update as to what I’ve been doing of late, so no single title seemed appropriate, except “What Dave’s Been Doing Lately” (and I think the title I finally went with is more punchy).

Read on for general babble about what I’ve been doing lately, and a random thought about the best way of getting feedback about server development.


Lucene web service

In the last two weeks I managed to combine working on MusicBrainz, creating a new open source project and earning money to pay the bills! This is quite rare these days, so I am pleased all around.

As some of you may know, I have been doing contract work for CD Baby. When Derek, the owner and lead geek at CD Baby, asked me what MusicBrainz does for searching, I launched into a long cheerleading rant about Lucene. I managed to convince Derek that Lucene is the way to go, and to sponsor the open source development of the new Lucene Web Service. Luckily, Derek agreed to open source the work as long as the project was made available under the BSD license.

Triple cheers for Derek and CD Baby please!

So, the web service is now done and I’ve applied for a new project on SourceForge — once that is approved, I will release the source code for everyone to check out. I’ll post another message here when that is complete.

If you’d like to check out the working web service, try this link.