MetaBrainz Blog – Page 51 – MetaBrainz Foundation Community Blog

Sophie Goossens joins the MetaBrainz board of directors (and more!)

I’m pleased to announce that Sophie Goossens, an attorney in London, has joined the board of directors of the MetaBrainz Foundation. Sophie specializes in intellectual property law and has ties to the European Commission, which makes her a great addition to our board of directors.

Welcome to our board of directors, Sophie!

Sophie replaces Carol Smith, who decided to move on from the board after leaving her position as the head of Google’s Summer of Code program. Carol joined us in late 2009 and has held the position of treasurer & secretary since then. She became a full director in early 2011.

Thank you for everything you’ve done for MetaBrainz in the past 6+ years, Carol!

Last, but not least, we needed to fill the Secretary/Treasurer slots that were vacated by Carol. Luckily for us, our business development manager Christina Smith stepped up to those duties and was voted onto the board back in February. (Now that all of these changes are complete, we can publicly speak about them.)

Thank you for taking on these two positions, Christina. I’m also quite happy that we’ve preserved the balance of people with the last name Smith on our board. 🙂

Thanks Sophie, Carol and Christina!

Help! Is there a Lucene doctor in the house?

UPDATE: Thanks to user selckin in the #lucene IRC channel for quickly solving this for us! Hopefully we can put this fix into production later today!

As our regular readers may know, we’ve been having lots of trouble with our Lucene-based search servers. Over the past few days we’ve spent a fair amount of time tuning, debugging and otherwise troubleshooting our setup. We’ve identified and fixed a number of problems, but most importantly we feel that we’ve identified the core issue: Our servers are simply overloaded.

Under normal conditions we find our servers loaded to about 25% – 35% CPU — things look good and we don’t think we have a capacity problem with our servers. Then a slow query comes in that starts to slow things down. Much like a traffic jam that evolves out of thin air, one slow query can make a giant mess for everyone.

We’ve started timing our queries and most of the time, they can be measured in milliseconds. However, when things get bad, they may take up to 7-8 seconds. Our upstream web servers time out on the search request after about 5 seconds in order to prevent traffic from getting backed up. What we need to do next is to limit the duration that a Lucene query can run and terminate it after the timeout.

I’ve started looking at this and quickly realized that this is much more of a job than adding a simple timeout parameter to the search call. We’re currently using this search function from IndexSearcher:

public TopDocs search(Query query, int n);

Ideally I would like to add a way to time out queries after 3 seconds. So far, I’ve discovered that we could use

public void search(Query query, Collector results)

with a TimeLimitingCollector. The old call returns TopDocs and our code assumes that we have a TopDocs object from which to cull our search results. Having stared at the Lucene docs for a while, I haven’t found a way to convert the data in the TimeLimitingCollector to TopDocs. It doesn’t make sense to me. 🙁

How does one do this? Sadly, we have no Java programmers on our team, so we’re quite a bit out of our league here. Is there an easier way to do this? Would someone be willing to write this code for us and submit a PR? We’d find some really good chocolate and send it to you if you do!
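For readers wondering what an answer to this can look like, here is a minimal, language-neutral sketch of the usual pattern, written in Python rather than Java. It is not the fix that was contributed and it does not use Lucene's API: every class and function name below is a hypothetical stand-in. The idea is that the time-limiting collector does not hold any results itself. It only wraps a second collector that gathers the top hits, and after the search stops (normally or on timeout) the results are read from that inner collector. In Lucene 4.x the matching pieces appear to be TopScoreDocCollector, which exposes topDocs(), wrapped in a TimeLimitingCollector.

```python
import time


class TimeExceeded(Exception):
    """Raised when the time budget for a search runs out."""


class TopHitsCollector:
    """Hypothetical inner collector: keeps the n best (score, doc_id) pairs."""

    def __init__(self, n):
        self.n = n
        self.hits = []

    def collect(self, doc_id, score):
        self.hits.append((score, doc_id))
        self.hits.sort(reverse=True)   # best score first
        del self.hits[self.n:]         # keep only the top n

    def top_docs(self):
        return [doc_id for _score, doc_id in self.hits]


class TimeLimitingWrapper:
    """Hypothetical wrapper: forwards hits until the deadline has passed."""

    def __init__(self, inner, budget_seconds, clock=time.monotonic):
        self.inner = inner
        self.clock = clock
        self.deadline = clock() + budget_seconds

    def collect(self, doc_id, score):
        if self.clock() > self.deadline:
            raise TimeExceeded()
        self.inner.collect(doc_id, score)


def search(scored_docs, n, budget_seconds, clock=time.monotonic):
    """Return (top doc ids, timed_out); partial results survive a timeout."""
    top = TopHitsCollector(n)
    limited = TimeLimitingWrapper(top, budget_seconds, clock)
    timed_out = False
    try:
        for doc_id, score in scored_docs:
            limited.collect(doc_id, score)
    except TimeExceeded:
        timed_out = True
    # The results always come from the inner collector, not the wrapper.
    return top.top_docs(), timed_out
```

The caller keeps working with the same kind of result object as before and additionally learns whether the result is partial, so it can decide whether to show it or report an error.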

More info on our project:

This project provides critical search functions for MusicBrainz.
The source lives here
My attempt at converting this code to use the TimeLimitingCollector is here: https://bitbucket.org/metabrainz/search-server/commits/ce00b13b799c1e69e24fa87299342144ec481674

We are using Lucene 4.10.4 on a custom codebase that pre-dates SOLR — we have a new SOLR project to replace this one, but it isn’t quite done yet. (Again, not having Java programmers is a bit of a problem for us).

Any tips, explanations or pull requests would be deeply appreciated! Chocolate reward offered!

Thank you!

State of the Onion: MetaBrainz

In the past few weeks we’ve been hit with several traffic increases to MusicBrainz, which are putting considerably more strain on our aging infrastructure than we’re happy with. If it seems that we’re not doing anything about it, that is because we’ve been busy behind the scenes trying to keep things moving forward. This sometimes doesn’t leave us a lot of time to keep the public informed on our work. Hopefully this blog post will fix this in the short term:

In 2011 we started to make plans to move MusicBrainz hosting into the cloud, but then out of the blue we were donated a pile of machines. There were so many machines that I postponed the cloud plans and prepared the donated machines for service. That has carried us for 4+ years with almost no hardware cost, which was really great. The plan was to move to the cloud sometime around 2015, but then I spent most of 2014/2015 dealing with conflicts in the team, putting us seriously behind schedule while our hardware decayed.

On top of that, we’ve recently had some “bad luck”. We have had some disrespectful commercial customers hit us really hard and we had to find and block them. We have had unexpected traffic spikes and when trying to address these unexpected traffic spikes, we had two more machines fail on us. These were the donated machines that we kept in reserve for just this moment. The loss of two machines caught us short on capacity to handle the increased demands on our servers.

So, now we face the tough question: Do we buy expensive hardware that we might use for 6 months (~$5000) or do we try and save the money and tough it out? I’d rather not spend so much money on such short term use if we can avoid it. We’re going to try and move to a new hosting facility somewhere in the EU, since that is where most of our users are.

Moving to a new hosting facility has an incredible number of dependencies that Christina (our Biz Dev manager), Zas and I have been working through. It may not seem like we have a plan, but we do, and we’re incredibly busy trying to make the plan happen. To give you a taste of what we’re up against:

We want to move our hosting to Europe and have a business presence in Europe in order to reduce the costs and inefficiencies of being a solely US based business. A lot of our traffic, customers and contractors are in the EU and it simply makes sense to have a presence here.
To establish a presence in the EU I needed local help to help with the business matters as well as researching and establishing an EU organization. So I needed to find a Biz Dev manager and that person is Christina.
Once Christina was on board she researched which options were suited to us. Getting that process moving involved getting certified documents from California, board approval for spending funds to establish the organization, EU labor law research (and we needed to swap a board member, too!), hiring help to establish the org. and generally navigating the Spanish bureaucracy. (See this only slightly exaggerated short film for some clues of our ordeal.)
Once the org. had been established we needed to convince the bank to open a bank account for us. The draconian US banking laws extend worldwide and the local bank had to ensure that they were not opening themselves up to thousands of $$$ in accounting hassles just to allow a tiny non-profit to open a bank account. We finally have a bank account and have started paying our contractors with it!
At the same time we’re also working to set up an office for the growing team here in Barcelona. That requires a byzantine process that has barely started when you sign the lease. Getting power, internet and water set up has taken a frustratingly long time. Had I known how long, I would have stayed at my co-working space for a while longer while addressing hosting issues.
While Christina has been focused on the hardcore paperwork, Zas is keeping the site running, which itself requires many heroics. Zas and I have started planning the move to the EU hosting provider. We’ve got a 5-page document that collects some of the open questions and requirements around this process: https://docs.google.com/document/d/16KNm4KksNwz29Opk1aILOMtCmPIeXFuxxUjMoPT3th0/edit#heading=h.dpfvoz1idcro. Right now Zas and Bitmap are here in Barcelona and we’re going to work on establishing a formal plan for moving to the new hosting company. We’re currently comparing hosting company offerings – see what we’ve collected so far if you care to follow along. The amount of work required to make this happen is making my head hurt. (A special shoutout to KodeStar, lead developer of FanArt.tv, for providing a lot of useful feedback about our various options.)
While Christina, Zas and I have our hands full, Bitmap and Gentlecat continue to release new features and work on the schema change. Not to mention all the contributions from Freso and Reosarevok to keep the community happy and polite while we deal with less than optimal site conditions. That said, I am really happy and proud of my team, trying to keep things running in sub-optimal conditions.

This is just a snapshot of everything that is happening behind the scenes that will culminate with the goal of moving to a new hosting company and being set up in the EU. And mind you, we’re doing this with a minuscule budget, trying to be careful about how we spend our money.

Important: Schema change delayed to May 23

With our ongoing hosting issues due to massive traffic increases and failing hardware we’ve been too distracted trying to manage those issues to finish all of the testing for the schema change release that was scheduled for today.

We deeply regret having to do this, but we’re going to delay the schema change release by a week. It is now scheduled for May 23, 2016. This week long delay will give us a chance to further tweak our server configuration (more on this in the next blog post) and to test the schema change release in much more detail.

We are, however, going to upgrade our database server to Postgres 9.5 either later today or tomorrow. During this upgrade we are going to employ a back-up database server and keep MusicBrainz running in read-only mode with a slightly reduced overall capacity (I’m sure everyone knows what that means by now). This upgrade should have no other effects on our downstream data users.

We will give people plenty of notice before we start the Postgres upgrade via our site banner and via our Twitter account (@musicbrainz).

Sorry for the continued drama affecting our services — we’re working hard to keep things together!

Important information about the May 16 schema change release

In the past few weeks we’ve been hit with massive increases in traffic and a couple of hardware failures. Trying to maintain a decent service quality in light of both of these events has taken a lot of our team’s time, and we don’t feel 100% confident about the schema change release tomorrow.

Fortunately, the entire team will be together in one place tomorrow. The first thing we’re going to do is review the current state of affairs and decide how to tackle the Postgres upgrade and the release. As soon as we have our plan put together, we will post an updated blog entry with all of the needed details. But, we may very well delay the release by 24 hours.

However, we found that we ran out of time on one feature: MBS-6024: Support more than one barcode on same release. This one ticket will not be included in the upcoming release. We’re really sorry for letting that one issue slip — sorry for any inconvenience this may cause you.

Temporary autoeditor election procedure

Hello people,

In former days, new autoeditor elections were announced on the autoeditor mailing list that all autoeditors were automatically subscribed to. However, all our mailing lists, including the autoeditor one, died in September 2015 when the server they were hosted on took its last breath. This effectively halted the election of new autoeditors. It was always the plan that our new forums should be able to handle this, but our recent issues have meant development on the features necessary to completely replace the autoeditor mailing list has been slow. Thus Nicolás (reosarevok) and I had a talk about how to handle elections going forward, and we came up with this procedure:

The proposer nominates the editor normally (starting an election), and then adds a post in the Autoeditors category of the forum linking to the election and asking for seconders. Like with the mailing list, this should also contain the proposer’s reasoning for proposing the candidate.
Nicolás (reosarevok) will then mail out to all autoeditors with a link to the election topic (and possibly a link to this blog post).
The proposer, Nicolás or I (or another autoeditor, if they’re faster) will update the thread once a 2nd seconder is found and the voting has started, and again when the voting has ended and the results are in.

We have added most autoeditors who are already signed up on the new forums to the @MB_Autoeditors group, but not all autoeditors have signed in to Discourse yet, and some have spaces or other “weird” characters in their username that make Discourse not able to parse them. If you find that you’re not in the @MB_Autoeditors group and you think you should be, please write a message to me (Freso) or Nicolás (reosarevok) via MusicBrainz and ask us to add you to the group. (Sending your message via MusicBrainz will let us know that it is indeed you/your account, so please don’t poke us on IRC or elsewhere about it.)

This is all obviously intended to be temporary, just until we’re able to get the process fully automated again. If you have any Ruby experience/know-how and would like to help out, please check out OTHER-248 (and possibly OTHER-254). There’s also MBS-8836 on the MusicBrainz server side for the Perl-istas.

Let us know if you have any concerns or questions about this (reminder: temporary) approach, either in the comments or on the forums. I personally hope it’ll work well enough to carry us through for a while longer until everything is ready.

GSoC 2016 students and projects

Google announced the final list of Google Summer of Code 2016 students and their projects yesterday. The list of MetaBrainz’ projects can be seen at our page on the GSoC site, but just for good measure, here’s the rundown:

MusicBrainz: Jeff Weeks (weeksio) returns to finish up the SOLR search server. We’re really hoping that this will be the end of our current search server woes. He will be mentored by the German duo of Ulrich Klauer (chirlu) and Rob Kaye (ruaok).
MusicBrainz Picard: Rahul Raturi (rahulr) will be working on improving searching MusicBrainz from within Picard, mentored by MusicBrainz’ senior developer Michael Wiencek (bitmap).
BookBrainz: Max Prettyjohns (QuoraUK) is going to try and take on adding gamification to our fledgling book/literature database. He will be supervised by the BookBrainz project leads and lead developers Ben Ockmore (LordSputnik) and Sean Burke (Leftmost).
ListenBrainz: Pinkesh Badjatiya (armalcolite) has pledged to tackle adding a much requested feature for our youngest project: implementing a Last.FM compatible submission API. Robert Kaye (ruaok) will be the one guiding him along.
AcousticBrainz: Daniele Scarano (hellska) will be spending the summer writing a toolkit for creating datasets, which should help researchers using AcousticBrainz. He will be mentored by MetaBrainz software engineer Roman Tsukanov (Gentlecat). Kartik Gupta (kartikgupta0909) has set out to create an offline client for computing AcousticBrainz dataset evaluations. Alastair Porter (alastairp), the AcousticBrainz project lead, will be their mentor. Goran Cetusic (cetko), our final student of this year, will be exploring how AcousticBrainz data can be utilised within Google’s BigQuery storage under the guidance of Alastair Porter.

Congratulations and good luck to all our students! We’re looking forward to following your progress over the summer and seeing what you end up with. 🙂

For all the students that applied but did not get accepted: we appreciate your applications, and even if you did not make the cut this year, we hope that you will stick around and apply with us again next year when we know you better – and you know us better.

For now, let the community bonding… begin! 🙌

Announcing python-musicbrainzngs, release 0.6

From the better late than never department…

After more than 2 years we’ve finally released version 0.6 of python-musicbrainzngs, a library for accessing the MusicBrainz web service from Python.

After such a long time we have perhaps too many new changes to describe. Some major changes include:

Better handling of authentication for private user collections
Support for loading all types of user collections (artist, event, place, recording, release, work)
Work attributes
Support for the Cover Art Archive
Support for Events, Instruments, Places, and Series

And numerous other bug fixes and small changes. See the CHANGES file for more information.

This release contains contributions by Alastair Porter, Corey Farwell, Ian McEwen, Jérémie Detrey, Johannes Dewender, Pavan Chander, Rui Gonçalves, Ryan Helinski, Shadab Zafar, and Wieland Hoffmann. Thank you everyone!

The new version can be downloaded from GitHub or PyPI, or installed with pip:

https://github.com/alastair/python-musicbrainzngs/releases/tag/v0.6
https://pypi.python.org/pypi/musicbrainzngs
pip install musicbrainzngs
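As a rough illustration of the Cover Art Archive support, the sketch below looks up a release and lists the URLs of its cover images. It uses the library's `set_useragent`, `get_release_by_id` and `get_image_list` calls. The helper name `release_cover_summary`, its `client` parameter and the user agent values are placeholders of my own, and the dictionary keys assume the response layout of the MusicBrainz web service and the Cover Art Archive JSON listing.

```python
def release_cover_summary(release_mbid, client=None):
    """Return a release's title and the URLs of its Cover Art Archive images."""
    if client is None:
        # Imported lazily so a stand-in client can be passed in for testing.
        import musicbrainzngs as client
        # MusicBrainz asks clients to identify themselves; placeholder values.
        client.set_useragent("example-app", "0.1", "https://example.org")

    release = client.get_release_by_id(release_mbid)["release"]
    images = client.get_image_list(release_mbid)["images"]
    return {
        "title": release["title"],
        "image_urls": [image["image"] for image in images],
    }
```

Calling `release_cover_summary(mbid)` with the MBID of a release that has cover art should return its title and image URLs. Expect the library to raise an error for a release that has no artwork.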

Server update, 2016-04-04

This is a small bug-fix release while we work on finishing the May schema change update. Thanks go to reosarevok and ethus3h for their patches this time around. The git tag is v-2016-04-04 and you can find the complete changelog below.

Bug

[MBS-8850] – No events tab for tags
[MBS-8861] – Vertical spacing off on editor profile if “last login” is missing (account admins only)
[MBS-8874] – Editing an entity sometimes shows it as a possible duplicate of itself
[MBS-8886] – Header menus should work without JavaScript

Improvement

[MBS-8591] – Increase pagination item count

Deprecating MBIDs

This post is an April Fools joke. Rest assured, we have no intention of changing the MBID system that MusicBrainz currently uses.

But, like all good parody news items, there is an element of truth behind this post. The announcement of the Echo Nest API shutdown is real, and with this change you will no longer be able to use the Echo Nest IDs to look up information. This particularly hurts users of the Million Song Dataset, which maps each track to the Echo Nest ID. The new Spotify API isn’t even providing any compatibility API or ID mapping, leaving users to look up 1 million Spotify IDs in the remaining months that the old Echo Nest API will remain available.

At MusicBrainz, we understand the importance of a stable identifier system. That’s why, 16 years ago, we picked these unwieldy-looking UUID identifiers – they have since stood the test of time, with room to continue growing. You can look up an MBID made 16 years ago, today – and it will still work another 16 years in the future.
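Because an MBID is a plain UUID, it can be checked with nothing but a standard library. The sketch below is only an illustration: `is_valid_mbid` is a hypothetical helper, and the sample value in the usage note is made up to be well-formed rather than taken from the database.

```python
import uuid


def is_valid_mbid(value):
    """Return True if value is a UUID in the 36-character hyphenated form."""
    try:
        # uuid.UUID also accepts braces, a urn:uuid: prefix and unhyphenated
        # hex, so compare against the canonical form to reject those.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False
```

For example, `is_valid_mbid("01234567-89ab-4cde-8f01-23456789abcd")` is true, while the same digits without hyphens, or any non-UUID string, are rejected.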

Hello all,

Following Echo Nest’s bold announcement that Echo Nest IDs are being replaced by Spotify IDs, we figured it was time to make our own ID change public as well – MBIDs were a fantastic idea 16 years ago, but let’s face it, they’re not the most beautiful thing around, so our MBIDs will now also be replaced by Spotify IDs to help with a proper mapping across tools. Anything without a Spotify mapping will simply get purged. This should greatly simplify the data we have and remove any doubt for some releases whether they exist or not – if they’re on Spotify, they clearly exist!

We would like to commend Echo Nest on their brave leadership in this, giving us the courage to move on from our ancient heritage and try new things. With the speed technologies evolve in this digital age, it can be hard to keep up with things and keep things fresh, but Echo Nest is showing the way forward, and we’re delighted to be able to follow so quickly in their path.

I hope you all will welcome this bold move by our team. We hope to have it ready by next schema change. We know we’re excited! 😀

PS. No, we will not provide a mapping between MBIDs and the new Spotify IDs. We trust our data users to be capable of setting things up on their own. Happy hacking! 🙂