MusicIP open sources a small server extension

I’m pleased to announce that MusicIP has just open sourced a small extension to the MusicBrainz server!

MusicIP contracted me to write a set of SQL scripts that would take their mirror of the MB database and create an extra table that stores the first release date for each track. As you may know we have this for albums, but we haven’t had (or needed) this for the track level.
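For the curious, the core idea boils down to a grouped minimum. Here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`track_release`, `track_first_release`, `first_release_date`) and the sample rows are made up for illustration; this is not the real MusicBrainz schema, nor the actual scripts in the extension.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema: one row per (track, release date).
    CREATE TABLE track_release (track_id INTEGER, release_date TEXT);
    INSERT INTO track_release VALUES
        (1, '1999-05-10'),
        (1, '1997-03-01'),
        (2, '2004-11-22');

    -- The extra table: the earliest release date seen for each track.
    -- ISO-formatted dates sort correctly as plain text, so MIN() works.
    CREATE TABLE track_first_release AS
        SELECT track_id, MIN(release_date) AS first_release_date
        FROM track_release
        GROUP BY track_id;
""")

first_dates = dict(
    conn.execute("SELECT track_id, first_release_date FROM track_first_release")
)
```

In this toy data, track 1 appears on two releases and ends up with the earlier of the two dates.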

If you’d like to check out this extension, you can find it here. Take a look at the README file to see how this should be used.

Please note that this code is checked into the RELEASE_20060712 branch — once we’re finished with dev work on this branch we will merge it back into the trunk.

Big thanks to Wendell Hicken and Matthew Dunn of MusicIP!

To search aliases or not?

In ticket #1731 we’re currently discussing the merits of having the artist aliases searched by default. Compare these two searches:

  1. Search without aliases for “Jennifer”
  2. Search with aliases for “Jennifer”

The perfect match for Jennifer is half-way down the page of results if aliases are included. Jennifer is all the way at the top (where it should be) when the aliases are not searched.

So, why does this happen?

Take, for instance, the top hit of the with-aliases search: Jennifer Paige. She has an alias, “Jennifer Page”, so when Lucene ranks the search results, the word Jennifer appears twice, which is a better match than when the word appears only once. This disturbed our users and it plain feels wrong to me.

Then I tried to play with Lucene’s term boosting functions. Take this query:

artist:jennifer^10 sortname:jennifer^10 alias:jennifer^0.0000001

In English, it says to search for Jennifer in artist names and sortnames and to make these hits 10 times more “important” than normal hits. It also says to search aliases and to make hits from them vastly less important than normal hits. The result: Jennifer is at the top, as we want. But what happens when we search for Bjork (not Björk)?

We get this mess, where Björk is the last search hit with a score of 0. (This is not the best example, since the next version of search will automatically find Björk when searching for Bjork, but it still illustrates the problem.)
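To make the trade-off concrete, here is a toy scorer in Python. It is only a sketch: the formula (term count divided by the square root of the field length, times a per-field boost) is a simplified stand-in I am assuming for illustration, not Lucene's actual scoring, and the three artist records are hand-written examples rather than real index contents.

```python
import math
import re

def tokens(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def field_score(term, text):
    """Toy relevance: term count, damped by field length (shorter fields win)."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return toks.count(term) / math.sqrt(len(toks))

def score(term, doc, boosts):
    """Sum the boosted per-field scores over every searched field."""
    total = 0.0
    for field, boost in boosts.items():
        for value in doc.get(field, []):
            total += boost * field_score(term, value)
    return total

# Hand-written example records (assumed, not pulled from the database).
ARTISTS = [
    {"artist": ["Jennifer"], "sortname": ["Jennifer"], "alias": []},
    {"artist": ["Jennifer Paige"], "sortname": ["Paige, Jennifer"],
     "alias": ["Jennifer Page"]},
    {"artist": ["Björk"], "sortname": ["Björk"], "alias": ["Bjork"]},
]

def rank(term, boosts):
    """Return (score, artist name) pairs for all hits, best first."""
    scored = [(score(term, doc, boosts), doc["artist"][0]) for doc in ARTISTS]
    return sorted([hit for hit in scored if hit[0] > 0], reverse=True)

EQUAL = {"artist": 1.0, "sortname": 1.0, "alias": 1.0}
BOOSTED = {"artist": 10.0, "sortname": 10.0, "alias": 0.0000001}

by_equal = rank("jennifer", EQUAL)      # the extra alias hit lifts Jennifer Paige
by_boosted = rank("jennifer", BOOSTED)  # the plain Jennifer is back on top
bjork_hits = rank("bjork", BOOSTED)     # alias-only match, score is nearly zero
```

With equal weights, the extra alias hit pushes Jennifer Paige above the exact match. With the boosts from the query above, the exact match wins, but an artist who can only be found through an alias (Björk, when you type Bjork) ends up with a score that is effectively zero.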

As you can see, tweaking the searches to make things better one way will make other searches worse. Do you think it is more important to search aliases by default or to have better search results by default?

Call for bug-fix testing

We’ve fixed a number of bugs (321 of them as of right now!) and we should be ready to go for another bug-fix update of the main server this weekend. To avoid the mess we had last time, please take some time in the next few days to check the test server to see how things are looking. Please test now and not after the release!

I would also like to remind people: if you find a bug, please report it via our bug tracker. Please do not mail bug reports or mention them in IRC in hopes of having them fixed. Also, if you don’t report a bug before the release, please don’t scream your head off if the issue hits the main servers when we do the release.

Go test on the staging server and report bugs or look at the list of closed bugs.

Thanks to Stefan for fixing all these bugs!

Usability fixes live on server

A number of you had reported various usability issues with the new server update — we just updated the server with some more fixes. For the run-down of what we changed, please see the list of recently closed bugs. Thanks to all those who reported bugs and helped us sort out these issues!

Main server update updated

We’ve gotten a slew of bug reports overnight as to what was wrong with the latest release. Keschte and I worked hard to address a number of these issues and I’ve done a mini update on the server. Check out this list of recently closed bugs to see what we did. Thanks to all those who reported bugs!

The broken tagger issues should also be resolved now.

Main server updated

We just updated the main server with the long awaited new release:

Not really new, or not so obvious features

  • The MusicBrainz web pages have been polished and cleaned to be web standards compliant. We proudly serve the pages in XHTML 1.1, CSS 2.0 compliant mode.
  • The user interface has been given a face lift and a cleaner look.
  • We have made great strides on the terminology changes, which had hung in limbo for quite some time. From now on, the term “album” is reserved for release attributes. The albums are now called “releases”. The same is true for the term “moderation”: we replaced this with the more appropriate term “edit”.
  • Lots of static pages have been replaced with WikiDocs pages. This means that most of the documentation pages displayed on this very server are no longer static html pages, but are instead content transcluded from the MusicBrainz Wiki. For each of these pages, a given revision is labeled as “official content”, and is displayed inside the MusicBrainz site. (For example, compare http://musicbrainz.org/doc/ContactUs to http://wiki.musicbrainz.org/ContactUs). If the content is not labelled as official, it can still be browsed in the same way, but will be labelled as unofficial content. WikiDocs is a huge project, and some parts will always be in the works. Please talk to the Wikizens if you’d like to contribute!

New features

  • The “Release Editor” (nicknamed “The edit page to rule them all”) replaces the FreeDB Import, the CD-Lookup as well as the “Edit all” workflows. It supports all the edit types usually applied to core entities (artist/release/tracks), including single artist conversions, various artist conversions, but also explicit changes to one track artist of a single artist release. This will allow us to more flexibly edit releases to conform to the new ReleaseArtistStyle, all in one go.
  • Timestamps on edit notes.
  • Diff display of changed titles on the voting (previously: moderation) pages.
  • Edits are linked to their respective documentation.
  • Edits show all the parent objects that are relevant for the current object (example: on a track title edit, the artist and the release are shown in the header).
  • Pending edits are shown and explained in more detail on the edit pages. Links are provided to review the pending edits.
  • The help blurbs on the editing pages were extended, and now include links to official WikiDocs documentation relevant to the current EditType.
  • GuessCase classical mode.
  • The Indexed Search was extended to include additional fields for the Artist Search. Users can choose to include artist names, sort names and aliases for more precise search results.
  • The Indexed Search results have an additional row which lets you add the entity to the Create Relationships list. This should make entering relationships much faster.

Improved features

  • Lots of fixes for the GuessCase function.
  • Numerous others: please see the closed tickets for the current Milestone and Version.

Many thanks to Keschte and everyone else who contributed and tested for your hard work on this release!

New Server Release

It has been rumoured for quite some time now, and I think that the new server release is ready for beta-testing. Please jump in and help find the remaining bugs. If you find any, file them under the XHTML 1.1 Milestone and set the owner to yours truly.

This is a significant update to the look and feel of MusicBrainz: many pages and workflows have changed and there are bound to be a number of bugs. We’ll need people to jump in and help test if we want to get this release out soon.

See what has changed: Release Notes
Test Server (as usual): test.musicbrainz.org
Bug Tracker (as usual): bugs.musicbrainz.org

For right now, we’re not specifying a release date — we need to get more eyes looking at this new release before we can nail down a date. So, please jump in and help test!!

What's up with those pesky 502 errors?

There are two web servers running on the main web server machine. The first web server is light and handles all the content that is simple, such as static pages, images and the like. Anything that requires more intelligence, such as talking to the DB, gets passed to the second web server, which is designated for these heavy requests.

The light server will wait for a specified time for the heavy server to finish its job (currently 120 seconds). If the heavy server hasn’t finished the job in that time, the light server gives up and returns the dreaded 502 error. Unfortunately, the DB server will continue to chug on the query and finish executing it as requested. Cancelling an existing query is hard to do, and oftentimes it’s better to let the server just run its course.

The gut reaction might be to say: “Why not stick around longer and wait for the results, if the DB is going to crank them out anyway?” The problem is that if we do this, the light web server sits idle, doing nothing, while it waits for the DB/heavy server to finish its job. The light server can instead give up and spend its time on things it can accomplish in a reasonable amount of time, like serving smaller requests for others. With this setup, the overall system favors the less intensive requests and thereby increases the overall number of queries that are handled successfully. If we stopped and waited for the DB/heavy server to finish its stuff, we would pretty quickly clog up the web server with requests that are sitting idle, doing nothing. That clog would then prevent any further connections to the web server, and the whole site would come to a halt.
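The wait-then-give-up behaviour can be sketched in a few lines of Python. This is a toy model, not our server configuration: `heavy_request` and `light_handler` are made-up names, a thread pool stands in for the heavy server, and the delays are fractions of a second instead of 120 seconds.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Stand-in for the heavy server (hypothetical; a small worker pool).
HEAVY_POOL = ThreadPoolExecutor(max_workers=4)

def heavy_request(seconds):
    """Pretend to run an expensive DB query that takes `seconds` to finish."""
    time.sleep(seconds)
    return "result after %.2fs" % seconds

def light_handler(seconds, timeout):
    """Hand the job to the heavy pool and wait at most `timeout` seconds."""
    future = HEAVY_POOL.submit(heavy_request, seconds)
    try:
        return 200, future.result(timeout=timeout)
    except TimeoutError:
        # The light side gives up, but the heavy job keeps running to
        # completion in the background, just like the DB query does.
        return 502, "Bad Gateway"

fast = light_handler(0.01, timeout=1.0)   # finishes well within the limit
slow = light_handler(0.5, timeout=0.05)   # the handler gives up first
```

Note that the slow job is not cancelled when the handler returns its 502. It still occupies a worker until it finishes, which is the same wasted effort described above.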

If you want a visual representation of what is going on, check the load graphs for dexter, our DB server. Any load greater than 4.0 and the DB server is no longer running optimally. We’re fine right this second, but in 10 minutes time?

So, what are we doing about this?

  1. Optimize the server code so that the user cannot make these intensive requests
  2. Spread the DB load across multiple replicated slave servers.
  3. Partition the database so that we can have multiple master servers. For instance, we could have one DB server that handles all the edits and one that handles the data. Maybe one that handles TRMs and PUIDs. This way each machine does less work, but this is a lot of work to code for mb-server devels.
  4. Find someone to give us a beefy database server with 12GB – 16GB of RAM

So, next time you’re aching for new mb-server features, please keep in mind that we’re spending a lot of time just keeping the service running smoothly. Our income isn’t great enough yet that we can hire people to maintain the site AND hire people to hack on new features. In the meantime, Dave Evans and I will focus on keeping things running, and hard-working folks like Keschte are working on new features for the server. Overall we’re still moving forward, just a lot slower than we care for.

What can you do?

  1. Help us solve our DB issues if you’re a DB person.
  2. Help us write more mb-server code.
  3. Most important of all, make a donation!!
  4. Bug your rich friends to donate to MusicBrainz so we can buy a beefy database server. 🙂

Guess case for classical music

Keschte (g0llum) says:

This concerns mostly the classical editors. I’ve finally taken my time to develop the requested guess case mode for the classical style guidelines. These are mostly regular expressions which cover most of the cases that require tedious manual editing. You’ll find some of the examples I’ve worked with in the header of, please go to the sandbox and try out your titles. Feel free to enter any issues you find into the bug tracker.

Cheers, and have fun testing!

–keschte

New fingerprinting technology available now!

I’m pleased to announce that as of right now, we have a new acoustic fingerprint provider! MusicBrainz has teamed up with MusicIP (formerly Predixis) and has integrated their MusicDNS service into MusicBrainz’ Picard Tagger. Version 0.7.0-beta1 is available for download right now!

This partnership with MusicIP promises to be beneficial for both MusicBrainz and MusicIP. The rough overview of our new relationship looks like this:

MusicIP provides:

  • Free fingerprint lookup services for official MusicBrainz projects. The fingerprint services are entirely hosted by MusicIP, which removes the burden of hosting a service that is only tangential to our mission.
  • A GPL/APL licensed fingerprinting client library (Open Fingerprint Architecture Library aka libofa) that is ready to integrate into new applications today!
  • Other projects that wish to integrate fingerprinting services into their applications will need to sign up with the MusicDNS service. This service is free for non-profit projects (musicdns.org), and price-tiered for commercial projects such that even small startups have access. For more details, please visit MusicDNS.org.
  • $10,000 held in escrow for MusicBrainz, plus contractual commitment to supply hardware resources should MusicIP exit the fingerprinting business. This is designed to allow us to continue the service should they decide to stop providing the service.
  • A 10% cut on all income earned from fingerprinting queries where MusicBrainz metadata is provided via the MusicDNS service.
  • 800K acoustic fingerprint ids (PUIDs) that are already loaded into our DB.
  • Travel, lodging and registration costs for myself to attend the SXSW conference, where all of this is being announced and released.
  • Space for us to exhibit in the MusicIP booth at SXSW this week, including displaying our logo!

MusicBrainz provides:

  • One free live data feed for MusicIP’s use.
  • The right for MusicIP to sub-license the MusicBrainz data to their customers as part of their product offering — at full list price. MusicIP takes no cut.
  • Community support of the Open Fingerprint Architecture library. Many of the exact details on where the source code will live still need to be worked out over the next few weeks.

As you can see, the deck is stacked much in our favor. MusicIP has gone above and beyond the call of duty to set up this relationship. We’re all very excited by this new partnership, since it extends our reach into the commercial realm and welcomes MusicIP into the open source world. We are also announcing a partnership with the Creative Commons where MusicBrainz will now be able to track Creative Commons licenses.

Due to all of this, there has been a lot of frantic development at MusicBrainz over the last 8 weeks. Moving to a new colocation facility, more bandwidth, more servers, a new web service and a new text search were all in preparation for today. As you may have noticed, the MusicBrainz service has been a little bit more spotty as we’ve worked hard to push out new features and move to the new colo. The good news is that we’ve brought more database servers online to help spread the load to more machines as people come to investigate our new version of the Picard Tagger. Hopefully the web site should still respond well even if the replicated servers are working hard to handle the new web service traffic for Picard 0.7.0. Once I return from SXSW, I’ll be focusing on getting the service stable, better documented and generally ready for the future.

Last, but certainly not least, I would like to thank Relatable for the use of their TRM fingerprint technology. Without Relatable, MusicBrainz would never have been able to grow as fast as it has. I appreciate everything that Relatable has done for MusicBrainz, but MusicBrainz has simply outgrown TRM and it is time to move on. We will continue to provide the TRM service for another 6 months from today. If you have an application that uses TRM, please visit MusicDNS.org today to find out how you can migrate to this new fingerprinting service.

I bet there will be tons and tons of questions. I will batch up the questions and then post follow-up messages to try and respond to them.
