Lucene rocks!

I’ve been playing with the Lucene text indexing system (in particular, I’m playing with PyLucene, which is a GCJ compiled version of Lucene with Python bindings). Lucene does text searching really well and its fast!

Eventually I’d like to use Lucene to power the MusicBrainz searches as was as building a copy of it into Picard. Picard? Yes! Lucene is so good, that you can give it a track title and chances are its going to find the right track. My idea is this:

  1. Cluster new files and determine which artists these files cover.
  2. Download and cache the metadata for the artists locally, and build a lucene index of it.
  3. Throw each of the tracks at lucene to see what it can match.
  4. If nothing matches, maybe do a full DB search via the web service or do a TRM calculation.

I’m excited by this — the proof of concept looks fabulous. Executing it on the full scale where things are getting cached and locally indexed, is going to be a fair amount of work. Unfortunately.

But, this gives me hope that Picard will have some serious brainz under the hood. πŸ™‚

Two steps forward, one step back

I’ve been trying to set up various environments to install Picard (Mac OS X 10.3, Gentoo linux, and Windows XP via VirtualPC on the Mac). It’s been a rather trialing process…

I’ve been trying to set up various environments to install Picard (Mac OS X 10.3, Gentoo linux, and Windows XP via VirtualPC on the Mac). It’s been a rather trialing process…

Continue reading “Two steps forward, one step back”

First tax-exempt application filed

I’ll jump in right now and update you on my progress.

I just dropped the FTB3500 tax-exempt application to the State of California into the mail. This application is one of the two big ones that took many weeks of preparing and creating budget forecasts for the next two years. Budgets are not my strength, but our Treasurer helped me with this process and we got it done. Next up is the biggest and most dreaded form — the 1023 application to the IRS.

I’ve also got the first cut at the MetaBrainz web site created — this site will detail everything about the non-profit including all donations and finances, board of directors and other non-profit stuff. Of course the new web-site is not going to be public until we’re ready to announce every last detail of the new non-profit. Stay tuned!

Oh, yeah — I also created this blog this week. Maybe tomorrow I can start hacking on advanced Picard features.

Welcome to the MusicBrainz community weblog!

We recently started discussing setting up a blog for MusicBrainz contributors to post information about the work they are doing in order to keep the community up to date. Having gotten no negative feedback on the idea, I proceeded to make it happen.

I really like Movable Type — its a great piece of software, and SixApart agreed to donate license for Moveable Type — that’s a $249.95 value! We can now publish this blog and add up to 35 users to this blog. I think that should suffice for the immediate future. πŸ™‚

Thank you very much to Mena, Ben, Mie and Barak at SixApart! You guys rock and so does your software!

If you are a MusicBrainz contributor and would like to get an account to post this webblog, please send me some mail and I’ll set it up.

Server Updates

Better Disc ID support, UTF-8 support for FreeDB, several tweaks to the automoderator election system and the usual miscellaneous bunch of bug fixes and other changes.

Changes mainly of interest to MusicBrainz Users

Revamped Support for Disc IDs

Duplicate disc IDs are now allowed.  As a pleasant
side-effect you can now
search for albums based on FreeDB ID
as well as by disc ID
You can also inspect the disc ID details
much more closely than before.

FreeDB

FreeDB Moderations, (the mechanism whereby MusicBrainz automatically
imports data from FreeDB with no human intervention),
has been turned off – no more “FreeDB mods”. 
You can still do a manual FreeDB import
if you like.

Also, MusicBrainz now uses “FreeDB protocol 6”, which means much better
Unicode support when importing albums.

Automoderator Elections

All the e-mails which the system sends to the mb-automods mailing list
(when a candidate is nominated, when voting opens, etc) are now “Cc”d to
the candidate (assuming the candidate has entered an e-mail address). 
Previously it was possible, indeed probable, that the candidate had no idea
they were being nominated, and indeed accepted, right up until they got the
“Welcome to the mb-automods mailing list” e-mail.

At the end of an election, if the nominee was accepted, the system can now
make the successful candidate into an Automod without needing manual
intervention from the server administrators.  It can’t yet subscribe
the new auto-moderator to the “mb-automods” mailing list, however.

While voting is open, the tally of votes cast so far is now hidden to all
apart from the proposer, the seconders and the candidate.

If you are not logged in, the automod voting page now displays a more
helpful message than before.

Special “system” users (ModBot etc) can no longer be nominated for
auto-moderator status.

A small typo was fixed in the e-mails sent by The Returning Officer.

TRM Statistics

Ever since TRMs were introduced to MusicBrainz, we’ve kept a count of
how many times each TRM has been looked up by the Tagger (or similar apps). 
The problem was, we had no idea how often each TRM was then used to tag
each associated track.  So when looking at a TRM joined to several
tracks, working out which was the “most used” track was a matter of guesswork.

This release introduces the ability to count uses of TRMs (i.e.
against a specific track), as well as lookups.  You can see this
on the track detail page,
although all the “use” counts will all start out as blank (i.e. zero). 
In fact we’ll be keeping month-by-month lookup counts and use counts for
each TRM, so in theory you’ll be able to see how the tagging “popularity”
for each song rises and falls over time (although all you can see on the web
is the running total).

Other Changes

A bug was causing the artist search index to become
corrupted.  The bug has been fixed and the index has been rebuilt.

When albums are imported from FreeDB, and the ModBot adds a note
giving the URL of the original FreeDB data, that note is no longer
mailed to the original moderator.

When adding an album, both album attributes now default
to “not known” instead of “Album, Official”.

You can now make a case-change edit on artist aliases
(previously it erroneously complained that there was a conflicting
alias).  The code which adds and renames aliases has been made more
robust.

The edit artist page now includes a “copy” button (copies the name
into the sortname field).

“Guess Case” has been tweaked again: it no longer adds a space
after “.” if the next character is “.” (because we don’t want “…” to
become “. ..”).  It doesn’t check for a sub-title split when
inside parentheses (fixes “Album Title (Disc 1: Disc Name) bug). 
It converts “reprise” to lower case when within parentheses. 
It strips spaces after “(” or “[“, and before “]” or “)”.

The MusicBrainz data dumps now include Amazon cover art URLs.

The tagger search page is no longer fooled by extra whitespace
around your search query.

New and Changed Documentation

Changes mainly of interest to MusicBrainz Server Programmers

The INSTALL file has been further updated,
describing the installation process more fully and more helpfully than ever
before πŸ™‚

InitDb.pl now does a much better job
of creating the replication function.  You can use
--with-pending=FILE to tell it where to find pending.so.

Various scripts (MBImport.pl, ImportReplicationChanges,
LoadReplicationChanges) now check the status of completed sub-processes more
carefully.

FixLength.pl now runs every
night.  It has been made more robust too – it reports any errors it
encounters and keeps running, and it also shows the IDs of any albums it
can’t fix.

Only one instance of LoadReplicationChanges
is now allowed at a time.

ProcessReplicationChanges
now deletes “pending” data as it is processed, which means that if the
script is stopped (or crashes) we can safely restart where we left
off.  Several options have also been added to help with
troubleshooting.

Bugs and RFEs Closed

Dave Evans