User agent based throttling is now live

Yesterday we talked about rolling out our throttling based on User-Agent strings. A few minutes ago we pushed this feature live on our servers, so the updated rules are now in effect. python-musicbrainz/0.7.3 users are now allowed 500 requests every 10 seconds, and every single one of these requests is constantly being used. No surprise here. 🙂

For the exact details on what is throttled and how to get around your application being throttled, see our rate limiting documentation.
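To make the "500 requests every 10 seconds" rule concrete, here is a minimal sketch of a per-User-Agent fixed-window counter. It is only an illustration: the class name, its parameters, and the fixed-window approach are assumptions, not a description of how our servers actually implement throttling.

```python
import time
from collections import defaultdict


class FixedWindowThrottle:
    """Hypothetical throttle: at most `limit` requests per `window` seconds per User-Agent."""

    def __init__(self, limit=500, window=10.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # Maps User-Agent -> [window start time, requests counted in that window]
        self._state = defaultdict(lambda: [None, 0])

    def allow(self, user_agent):
        """Return True if this request fits in the current window, else False."""
        now = self.clock()
        state = self._state[user_agent]
        # Start a fresh window on the first request or once the old window has expired.
        if state[0] is None or now - state[0] >= self.window:
            state[0] = now
            state[1] = 0
        if state[1] < self.limit:
            state[1] += 1
            return True
        return False


throttle = FixedWindowThrottle(limit=500, window=10.0)
allowed = throttle.allow("python-musicbrainz/0.7.3")
```

Each User-Agent string gets its own counter, so a busy client only uses up its own allowance.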

Current web service rate limiting documentation

We’ve just added a page that documents what we’re currently blocking on our Web Service. We hope to lift the block on python-musicbrainz/0.7.3 tomorrow and instead throttle the number of requests it can make in a given period of time.

I’ll post another entry once we’re done with making those changes.

Dear python-musicbrainz/0.7.3 application, we need to talk!

An application that uses our python-musicbrainz/0.7.3 client library has been putting undue load on our servers all at once. This application looks something up at MusicBrainz at 03:00 UTC, causing our servers to be overloaded at that time each day.

To protect our servers from being overloaded, we’re going to block this application from 03:00 UTC to 04:00 UTC. We’re hoping that this will allow us to identify the application and start a dialog with the application authors. Once we have established communication with the authors and worked out a plan to fix this, we’re going to lift the block.

We really dislike blocking applications, but if applications are being inconsiderate of our resources, we’re left with few options. We hope to hear from the application authors soon so we can resolve this issue. Also, we’re moving forward with our plans to require User-Agent strings that properly identify applications using our service to fix this problem going forward.

If you are the author of said application, please leave a comment with information on how we can get in touch with you.

Release editor service interruption

In an effort to mitigate/fix MBS-3379 we need to restart the service that keeps the session information for our release editors. We’re going to do that tomorrow, Saturday October 28, at noon PDT, 3 PM EDT, 8 PM London, 9 PM Amsterdam. If you have a release editor open at that time, submitting your edits will fail and you will need to start your edits over again.

Sorry for the inconvenience.

Web service user-agent string blocking reminder

I would like to remind Web Service users that on 16 November we’re going to block generic User-Agent strings from accessing our web service. Earlier we said:

The User-Agent string needs to identify the application and the version of the application that is making the request; having a generic User-Agent string like “Java/1.6.0_24” or “PHP/5.3.4” does not allow us to properly identify the application making the requests.

IMPORTANT: 6 months after we release NGS (Nov 16th) we’re going to start blocking common generic User-Agent strings, so please make sure that you send us a proper User-Agent header as part of your request.

You have been warned. 🙂
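As a rough illustration of what a proper header could look like, the sketch below builds a request with Python's standard urllib and a User-Agent that names the application and its version. The application name, version, and URL are hypothetical placeholders, and no request is actually sent.

```python
import urllib.request

# Hypothetical application name and version; substitute your own.
APP_NAME = "ExampleTagger"
APP_VERSION = "1.2.0"


def build_request(url):
    """Build a request whose User-Agent identifies the application and its version."""
    user_agent = f"{APP_NAME}/{APP_VERSION}"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL for illustration only; the request is built, not sent.
request = build_request("http://example.org/ws/2/recording")
print(request.get_header("User-agent"))
```

Without an explicit header, urllib sends its own generic "Python-urllib/x.y" string, which says nothing about your application.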

The FreeDB gateway has been updated to NGS!

I’m pleased to announce that the FreeDB gateway, which lets you fetch MusicBrainz data via FreeDB-enabled applications, has been updated to use the NGS database. As of now, it’s updating with new data from the main MusicBrainz server and you should be able to look up new CDs.

To use the FreeDB gateway, set the FreeDB server in your application to freedb.musicbrainz.org on port 80. For more information, please take a look at our wiki page for the gateway.

Thanks to Lukas for porting this code to NGS!

Record traffic, new server in NGS rotation

We’re currently seeing record levels of traffic. Since the beginning of August our traffic has gone from 9M hits per day to 14M hits per day, which is a significant increase. Fortunately our servers were able to handle the extra load, but we were getting near capacity.

Yesterday I started working on taking one of the servers from the classic site and moving it into the NGS cluster. We finished this a couple of hours ago and we now have a maximum limit of 227 queries per second (qps).
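For a rough sense of scale, the daily figures above can be converted to per-second averages. This is only back-of-the-envelope arithmetic: it assumes traffic is spread evenly over the day, which it is not, and it does not assume that one hit equals one query, so the result is not directly comparable to the qps limit.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(hits_per_day):
    """Average hits per second, assuming traffic is spread evenly over the day."""
    return hits_per_day / SECONDS_PER_DAY


august = average_per_second(9_000_000)
today = average_per_second(14_000_000)
print(f"{august:.0f} -> {today:.0f} hits per second on average")
```

That works out to roughly 104 and 162 hits per second on average; peak load will be higher than the average.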

If you run into any unexpected trouble with the site in the next day or so, please open a ticket.

Google uses MusicBrainz data in some of its searches!

Earlier this week I met with Shawn Simister, who works on Google’s Freebase project (formerly of MetaWeb), to touch base about how MusicBrainz is being utilized inside of Google. MusicBrainz represents a large chunk of the music data in Freebase, and in turn the Freebase data is used as one of the sources of data for Google’s search.

Shawn explains this in more detail:

You can actually see a couple areas where we’re using the Freebase music data publicly. First, in the structured refinements in search. If you search for lady gaga albums and scroll to the bottom to see “Album searches for Lady Gaga”. Also you can see videos clustered by topic in YouTube Topics and many of the topics are music-related.

It’s important to keep in mind that Musicbrainz is just part of the solution. It’s a pretty big part of Freebase music data and therefore it’s likely to be a pretty big component in these results but as you know the search results team at Google is pretty secretive about what all goes into the results page so even I can’t tell for certain when they’re using Freebase/Musicbrainz data for any given result.

I think it’s important that people don’t mistake this as a one-to-one relationship between Musicbrainz data and Google results because there are quite a few steps in between but there’s definitely a strong connection there and we really appreciate everything that the Musicbrainz community is doing and hope that Musicbrainz community continues to grow.

I find this tremendously exciting to hear, since I proposed a very similar thing to Google many years ago. While this idea was rejected back in the day, I’m excited to see that Google is now using our data for its searches. Every person who has ever contributed to MusicBrainz should be proud!

Thank you to everyone and thank you Shawn for shedding some light on this!

The EchoNest releases Echoprints: The open source fingerprint era has begun

Amplifind, the company that operates the MusicDNS/PUID service, recently sold its intellectual property and the PUID service will be going away eventually. This is exactly why we’ve been uncomfortable relying on closed source fingerprinting software to make MusicBrainz tick.

Fortunately, I’m pleased to announce that the open source fingerprint era has begun!

Lukáš Lalinský has been working on acoustid for months now and today the EchoNest, in conjunction with 7Digital and MusicBrainz, has issued a press release that announces Echoprint, their fully open source fingerprinting solution.

The source code has already been released on github: echoprint-server and echoprint-codegen.

We’re pleased to announce preliminary support for Echoprint in MusicBrainz on our echoprint test server. The Echoprint system works similarly to how PUIDs work in MusicBrainz right now: on the echoprint test server, you can use echoprints anywhere you can use PUIDs. Version 2 of our XML web service (on the echoprint test server) now supports submitting and fetching Echoprints. To submit an Echoprint, refer to our web service documentation and example page and use echoprint wherever you’d use puid. For instance, to submit an Echoprint to the test server, POST an XML document like the one below to the /ws/2/recording resource:

<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
    <recording-list>
        <recording id="e97f805a-ab48-4c52-855e-07049142113d">
            <echoprint-list>
                <echoprint id="TRN5NGX1187AB4F786"/>
            </echoprint-list>
        </recording>
    </recording-list>
</metadata>
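
If you would rather generate this document than write it by hand, something like the following sketch could work. It uses Python's standard xml.etree.ElementTree module; the function name is hypothetical, and the sketch only builds the XML string. It does not send it, since the test server's address and any authentication details are not covered here.

```python
import xml.etree.ElementTree as ET

MMD_NS = "http://musicbrainz.org/ns/mmd-2.0#"


def build_echoprint_submission(recording_id, echoprint_id):
    """Hypothetical helper: return the submission XML for one recording/Echoprint pair."""
    # Register the namespace as the default so the output uses xmlns="..." without a prefix.
    ET.register_namespace("", MMD_NS)
    metadata = ET.Element(f"{{{MMD_NS}}}metadata")
    recording_list = ET.SubElement(metadata, f"{{{MMD_NS}}}recording-list")
    recording = ET.SubElement(recording_list, f"{{{MMD_NS}}}recording", {"id": recording_id})
    echoprint_list = ET.SubElement(recording, f"{{{MMD_NS}}}echoprint-list")
    ET.SubElement(echoprint_list, f"{{{MMD_NS}}}echoprint", {"id": echoprint_id})
    return ET.tostring(metadata, encoding="unicode")


document = build_echoprint_submission(
    "e97f805a-ab48-4c52-855e-07049142113d",
    "TRN5NGX1187AB4F786",
)
print(document)
```

Building the document with an XML library rather than string concatenation means every element is closed properly.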

Although it remains to be seen when the Echoprint system will be mature enough for inclusion on the live MusicBrainz servers, going forward MusicBrainz will only support fully open source fingerprint solutions, starting with Echoprint and acoustid. We are saying no to more closed source solutions, which have never worked out well for us. MusicDNS/PUID is now officially end of life and should not be used anymore in new development.

We look forward to working with the EchoNest closely to finish up the development of the Echoprint system and to fully integrate it into MusicBrainz when it matures.