I’ve been playing with the Lucene text indexing system (in particular, I’m playing with PyLucene, which is a GCJ-compiled version of Lucene with Python bindings). Lucene does text searching really well and it’s fast!
Eventually I’d like to use Lucene to power the MusicBrainz searches as well as building a copy of it into Picard. Picard? Yes! Lucene is so good that you can give it a track title and chances are it’s going to find the right track. My idea is this:
- Cluster new files and determine which artists these files cover.
- Download and cache the metadata for the artists locally, and build a Lucene index of it.
- Throw each of the tracks at Lucene to see what it can match.
- If nothing matches, maybe do a full DB search via the web service or do a TRM calculation.
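To make the flow above a bit more concrete, here is a rough sketch of the match-locally-then-fall-back idea. It is not Picard or PyLucene code: Python’s standard `difflib` stands in for the Lucene index, and the names `build_index`, `best_match`, `match_files`, the `fallback` callback and the 0.8 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(title):
    # Lowercase and collapse whitespace so trivial differences don't matter.
    return " ".join(title.lower().split())


def build_index(cached_tracks):
    # Stand-in for the local Lucene index: normalized title -> track id.
    # cached_tracks is a list of (track_id, title) pairs from the cached
    # artist metadata (hypothetical shape).
    return {normalize(title): track_id for track_id, title in cached_tracks}


def best_match(index, query, threshold=0.8):
    # Score the query against every cached title and keep the best one.
    # A real Lucene index would do this far more efficiently.
    q = normalize(query)
    best_id, best_score = None, 0.0
    for title, track_id in index.items():
        score = SequenceMatcher(None, q, title).ratio()
        if score > best_score:
            best_id, best_score = track_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


def match_files(index, file_titles, fallback):
    # Try the local index first; only call the (expensive) fallback,
    # e.g. a web service search, when nothing local is good enough.
    results = {}
    for title in file_titles:
        track_id, _ = best_match(index, title)
        results[title] = track_id if track_id is not None else fallback(title)
    return results
```

The point of the design is that the fallback only runs for the tracks the local index cannot place, so most lookups never leave the machine.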
I’m excited by this: the proof of concept looks fabulous. Unfortunately, executing it at full scale, where things are getting cached and locally indexed, is going to be a fair amount of work.
But, this gives me hope that Picard will have some serious brainz under the hood. 🙂
I’ll jump in right now and update you on my progress.
I just dropped the FTB3500 tax-exempt application to the State of California into the mail. This application is one of the two big ones, and it took many weeks of preparation, including creating budget forecasts for the next two years. Budgets are not my strength, but our Treasurer helped me with this process and we got it done. Next up is the biggest and most dreaded form: the 1023 application to the IRS.
I’ve also got the first cut at the MetaBrainz web site created. This site will detail everything about the non-profit, including all donations and finances, the board of directors and other non-profit stuff. Of course the new web site is not going to be public until we’re ready to announce every last detail of the new non-profit. Stay tuned!
Oh, yeah — I also created this blog this week. Maybe tomorrow I can start hacking on advanced Picard features.
We recently started discussing setting up a blog for MusicBrainz contributors to post information about the work they are doing in order to keep the community up to date. Having gotten no negative feedback on the idea, I proceeded to make it happen.
I really like Movable Type. It’s a great piece of software, and SixApart agreed to donate a license for Movable Type, which is a $249.95 value! We can now publish this blog and add up to 35 users to it. I think that should suffice for the immediate future. 🙂
Thank you very much to Mena, Ben, Mie and Barak at SixApart! You guys rock and so does your software!
If you are a MusicBrainz contributor and would like to get an account to post to this weblog, please send me some mail and I’ll set it up.