An application that uses our python-musicbrainz/0.7.3 client library has been putting undue load on our servers by sending its requests all at once. It looks something up at MusicBrainz at 03:00 UTC, which overloads our servers at that time each day.
To protect our servers from being overloaded, we’re going to block this application from 3:00 UTC – 4:00 UTC. We’re hoping that this will allow us to identify the application and start a dialog with its authors. Once we have established communication with the authors and worked out a plan to fix this, we’re going to lift the block.
We really dislike blocking applications, but if applications are being inconsiderate of our resources, we’re left with few options. We hope to hear from the application authors soon so we can resolve this issue. Also, we’re moving forward with our plans to require User-Agent strings that properly identify applications using our service to fix this problem going forward.
If you are the author of said application, please leave a comment with information on how we can get in touch with you.
How do you want to block it if the application is unknown? Is the IP address always the same?
hrglgrmpf: We’re blocking that User-Agent string, which is all we know of the application. Well, we know that they use our library, but that is it.
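For illustration, a time-windowed block keyed on the User-Agent string could be sketched roughly as follows. This is only a guess at the shape of such a check: the function name `is_blocked`, the prefix match, and the constants are assumptions, and the post does not say how the block is actually implemented on the servers.

```python
from datetime import datetime, timezone

# Values taken from the post; the names themselves are hypothetical.
BLOCKED_USER_AGENT = "python-musicbrainz/0.7.3"
BLOCK_START_HOUR_UTC = 3  # block begins at 03:00 UTC
BLOCK_END_HOUR_UTC = 4    # block ends at 04:00 UTC


def is_blocked(user_agent, now):
    """Return True if a request with this User-Agent should be refused at `now`.

    `now` must be a timezone-aware datetime; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    in_window = BLOCK_START_HOUR_UTC <= now_utc.hour < BLOCK_END_HOUR_UTC
    # Prefix match is an assumption; an exact match would work as well.
    return in_window and user_agent.startswith(BLOCKED_USER_AGENT)
```

Because the check sees nothing but the User-Agent string, every application that sends the library's default string falls inside the block during that hour.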
A wild guess: someone is running a complete collection update via cron on Ubuntu 11.10 using albumidentify (https://github.com/albumidentify/albumidentify), which in turn uses python-musicbrainz2 version 0.7.3.
That could make sense. I suspected Ubuntu 11.10, since the pattern of our traffic growth makes sense in that context.
Ugh, python-musicbrainz2 doesn’t seem to allow setting a custom user agent string (would probably work with monkey-patching though). So essentially, you’d be blocking everything that uses the current version of python-musicbrainz2. However a repeating time pattern hints at something cronnable and it’s using Python, in which case albumidentify is most certainly one of very few candidates (apart from some custom software), so I guess you’ll find the evil one soon enough.
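For reference, this is roughly what sending an identifying User-Agent looks like with Python's standard library alone. It is a sketch that does not touch python-musicbrainz2 at all. The application name, version, contact URL, and request URL are placeholders, and `build_request` is a hypothetical helper.

```python
import urllib.request

# Hypothetical application name, version and contact URL; replace with your own.
USER_AGENT = "ExampleTagger/1.2 (https://example.org/contact)"


def build_request(url):
    """Build a request object that carries an identifying User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Placeholder URL; no network traffic happens until the request is opened.
request = build_request("http://example.org/some/lookup")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

A string like this lets a service operator tell applications apart even when they share the same client library.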
Yep, to every single one of those points. 😦
Why are you even talking about blocking it? Just throttle its speed such that the load isn’t a problem anymore. For example you could add a sleep(3) to each of its requests. As long as it’s not making an infinite number of parallel requests, that should slow it down enough.
intgr, as far as I know, they want to block it so the person will actually notice, come and say “hey, it’s me, what should I do instead”. Will it work? No idea.
intgr: If we throttle them, then we’re quietly providing bad service. We’d rather identify the problem, have a chat and go back to providing good service asap.
This is probably obvious, but do you realize that you are blocking every single application that uses python-musicbrainz2?
Luks: Yes, I realize that.
It’s not me.
Just hit this as I happened to try to check a release during the special time period. Can this really not be stopped by blocking this bad user’s IP?
At some point you might have to move to a Developer ID-like mechanism, as Google, MS and others are doing. Although I do not really like that, it is still the only way to properly identify which application (or application author) is abusing / overloading your service and effectively block it…
Luke: or just block the default user agents of the various MusicBrainz client libraries.
Unless there are also problems with people spoofing user agent strings, then requiring client registrations sounds a bit over the top.
Have a look at:
https://github.com/rembo10/headphones/issues/386
I just wasted 30 mins to find out that the block on python-musicbrainz is now applicable all day long.
Do you still think it’s albumidentify at fault?
Craig: We rolled out throttling support a few minutes ago. See this doc for more information: http://musicbrainz.org/doc/XML_Web_Service%2FRate_Limiting
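On the client side, an application can avoid hitting a rate limit by spacing out its own requests. The sketch below shows one way to do that. The `Throttle` class and its parameters are hypothetical, and the interval to use is an assumption, so check the linked document for the limits that actually apply.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before each lookup keeps a batch job, such as a nightly collection update, from sending all of its requests at once.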
Did you ever identify the application that was causing all of the requests?