MusicBrainz Search Overhaul

Hello people o/, samj1912 here.

I am extremely glad to announce that we are finally launching our Solr search on the MusicBrainz beta server!

Just a little history before I announce the new features and toys you get to play with:

Solr started as something that could replace our existing search infrastructure. If you have been a MusicBrainz user for a while, you might know that our search has quite an indexing latency: it can take as much as 3 hours for new edits to show up in the search results, in part because updating the search index involves re-indexing the entire database. With the high latency and the resources it took, the current search server left much to be desired.

Another area where our current search fell short was ranking and surfacing popular results. Searching for a famous artist or place returned a lot of noise and, more often than not, results that weren’t relevant to what the user had in mind when they searched for it.

These were the two major problems that motivated us to shift to a better infrastructure for our search needs.

Thus, MB-Solr was born.

It has been in development for quite some time now. The coding for the project started with Mineo back in 2014 and was carried forward by Jeff Weeksio in GSoC 2015. But due to a lack of development resources and other, more pressing needs, the project was put on hold for a while, until Roman started working on it. However, he left MetaBrainz before he could finish this work, so when I joined the MetaBrainz team, the first and foremost task assigned to me was getting Solr working and ready for production.

After struggling with multiple moving parts and services, tons of issues with maintaining compatibility with our existing web-service API, rowing up and down multi-threading/processing hell, learning just enough about information retrieval to get our search relevance on point and countless hours sifting through Solr documentation to get our Solr cluster fine-tuned and running fast enough to keep up with our web traffic… we are finally here.

I am pretty sure I would’ve rage-quit dozens of times during this last year if I was doing this all alone.

As such, we have our trusty sysadmin Zas to thank for taking care of all the deployment needs and making sure Solr was well-tested (believe me, we toyed with Solr like little kids in a sandbox) and wasn’t going to fail and wake him up at 3 AM with red alerts all over. Mineo, Bitmap and Yvanzo were there with much-needed code reviews and help with all things Solr and MusicBrainz. Our style leader Reosarevok and CatQuest helped us test our new search relevance configuration. And of course, we had our BDFL, Rob, overseeing things and whipping them into shape (with chocolate and mismatched socks, of course).

Anyway, here’s what you are here for:

New features/improvements

  • (Almost) Instantaneous search-index updates – Edit something and immediately see it in the search results. Say goodbye to that note you used to see below the search telling you that you have to wait. Who likes waiting anymore – seriously, it’s 2018.
  • Better search results – We wanted to make sure you were getting the right Queen and London as the top result. You can finally link your favorite artist to London, UK as opposed to London, Arkansas. Don’t believe me? Go try it out.
  • Less load on our servers – Meaning we can serve more of your requests, faster. Getting tired of waiting for tagging your bajillion songs in Picard? Well, you still gotta wait, but less so, now that we are better equipped to handle your requests.

What has stayed the same

  • WS/2 Search API – We know you devs hate doing that extra work to maintain your applications’ compatibility with that one site that changes its API on a whim. Well, we wouldn’t want you to spend those hours following that one int to float change that broke everything ever. As such we have worked hard to make sure that Solr doesn’t change any of our WS/2 search schema.
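
If you want to poke at the unchanged WS/2 search from a script, here is a minimal sketch of what a query and its response handling could look like. The beta host name, the helper names and the sample response are assumptions made for illustration (the sample is hand-written in the shape of a WS/2 JSON artist search result, not real server output), and the naive "entity + s" plural only works for entities such as artist, release and recording.

```python
import json
from urllib.parse import urlencode

# Assumption: the beta server exposes WS/2 under the same path as the main site.
BETA_WS2 = "https://beta.musicbrainz.org/ws/2"


def build_search_url(entity, query, limit=5):
    """Build a WS/2 search URL that asks for a JSON response."""
    params = urlencode({"query": query, "limit": limit, "fmt": "json"})
    return f"{BETA_WS2}/{entity}?{params}"


def top_result(response_text, entity="artist"):
    """Return (name, score) of the highest-scoring hit, or None if there are no hits."""
    # Simplification: the result list is assumed to live under the plural entity name.
    hits = json.loads(response_text).get(entity + "s", [])
    if not hits:
        return None
    best = max(hits, key=lambda hit: int(hit.get("score", 0)))
    return best["name"], int(best.get("score", 0))


# Hand-written sample, not real server output.
SAMPLE = (
    '{"count": 2, "offset": 0, "artists": ['
    '{"name": "Queen", "score": 100}, '
    '{"name": "Queen Latifah", "score": 62}]}'
)

print(build_search_url("artist", "queen"))
print(top_result(SAMPLE))
```

The sketch never touches the network: fetch the built URL with your HTTP client of choice (and a descriptive User-Agent), then hand the body to `top_result`.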

What’s gone

  • WS/1 Search API – We deprecated WS/1 back in 2011. With the new search servers in place, there are only 3 words for those still using it 7 years after its deprecation – ‘poof, it’s gone’. The service still works on our main website, but its search functionality will be phased out soon, and the entire service will be discontinued in August 2018 as announced earlier.

Now, you must be thinking there is some catch, some slip. Well, so do I, which is why we are releasing this beta for you to test the heck out of our new search over at the MusicBrainz beta site. If you haven’t used it before, worry not – it has all your personalizations and all our cool music metadata from our main site. You should feel at home. (Note: The MusicBrainz beta site works on the live data. Any edits you make on the MusicBrainz beta site will also be reflected on the main site.)

So please! Go check it out!

If you feel you aren’t getting what we promised you or you want more of those shiny new features or that this blog was too long or like a TV commercial, feel free to complain at our Ticket Tracker for Solr. You get your promised features bug-free and our devs get to earn their living. It’s a win-win.

Happy testing!

Picard 2.0 beta2 announcement

Hello people,

Thank you so much for reporting bugs in our Picard 2.0.0beta1 release. We fixed most of the critical bugs that you guys and gals reported. You can find the beta2 release with the fixes here – Picard 2.0.0.beta2

If you have been following our Picard related blogs, you will know that we decided to release a new stable version of Picard before the beginning of the summer.

To help us, advanced users, translators and developers are encouraged to:

Note – If any of you are seasoned Windows/macOS devs and have experience with PyInstaller, we need some help with PICARD-1216 and PICARD-1217. We also need some help with code signing Picard for OSX. Hit us up on #metabrainz on freenode for more information. We will be very grateful for any help that you may offer!

A simplified list of changes made since 1.4 can be read here.

Be aware that downgrading from 2.0 to 1.4 may lead to configuration compatibility issues – ensure that you have saved your Picard configuration before using 2.0 if you intend to go back to 1.4.

ListenBrainz winter 2018 beta testing

After many more months of hacking on core infrastructure and improving our codebase, we’re finally ready to have more people come and help us test the latest beta version of ListenBrainz. Also, we’ve recently reached the milestone of the 100 millionth listen in our database!

We’ve made some internal changes to the project (that took quite a bit of effort):

  • Improved hosting setup that allows us to run both the production and beta versions of the site at the same time. This means that any data submitted to the beta site will be submitted to the master listens database and will be available in the BigQuery data set as well. We are mimicking the setup that MusicBrainz has: the beta site uses a live database so that testing the service can work with live data.
  • Improved internal container setup to allow for dumping both the listen data and private data for complete backups.
  • Improved the speed with which we process incoming listens.

These internal changes will allow us to move to more frequent updates of ListenBrainz in the future! More important are the changes to the site that are user visible:

  • Statistics infrastructure: We’ve created an infrastructure for creating graphs of users’ listening behaviour. So far we’ve only got an all-time top-artists graph to illustrate our setup, but soon we will work to create more graphs. Currently graphs will be generated every Monday starting at 0:00 UTC, if you have logged in to your LB account during the last 30 days. If you haven’t logged in recently, you can request the calculation of your stats from your profile page.
  • Automatic data dumps: The ListenBrainz data will now be dumped and synced to our FTP site twice a month. Currently this is scheduled for the 1st and the 15th of every month. Dump generation starts at 04:00 UTC; the dumps are then copied to our FTP site, so it will take a number of hours for them to appear there. Our documentation details how this data dump can be consumed.
  • Documentation improvements: Quite a few documentation bits have been improved since our last release, including better documentation on the Last.fm compatible API that ListenBrainz exposes.
  • Static page improvements: We’ve done some rearranging of our static pages and navigation bar to reflect the latest changes, including updating the data page and our roadmap page.
  • Listen count on home page: The home page now shows the current listen count.
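
If you plan to fetch the dumps from a script, the schedule above (1st and 15th of every month, starting at 04:00 UTC) is easy to compute. This is only a sketch: the function and constant names are made up for illustration, and it gives the time generation starts, not the time the files land on the FTP site, which is some hours later.

```python
from datetime import datetime, timezone

DUMP_DAYS = (1, 15)   # days of the month named in the post
DUMP_HOUR_UTC = 4     # dump generation starts at 04:00 UTC


def next_dump_start(now):
    """Return the first scheduled dump start strictly after `now` (a UTC-aware datetime)."""
    year, month = now.year, now.month
    while True:
        for day in DUMP_DAYS:
            candidate = datetime(year, month, day, DUMP_HOUR_UTC, tzinfo=timezone.utc)
            if candidate > now:
                return candidate
        # Nothing left this month: roll over to the next one.
        month += 1
        if month > 12:
            year, month = year + 1, 1


print(next_dump_start(datetime.now(timezone.utc)))
```

Since the schedule is described as "currently" in effect, treat the two constants as configuration rather than as something to rely on forever.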

If you’re interested in helping us test, please use the beta site and test everything you can see. See if anything misbehaves and if you do spot any problems, please report them to our bug tracker! Hopefully we can push this live next week.

NB: The beta site is connected to the live database, so any listens you submit to it will be part of your official ListenBrainz listen history!

Picard 2.0 beta announcement

Hello people,

We saw a flurry of updates to Picard these last few months and I am happy to announce that Picard 2.0 is finally in beta. You can find it here – Picard 2.0.0beta1

If you have been following our Picard related blogs, you will know that we switched up our dependencies a bit. What this means is that Picard should look better and in general feel more responsive.

We also decided to release a new stable version of Picard before the beginning of the summer.

To help us, advanced users, translators and developers are encouraged to:

A simplified list of changes made since 1.4 can be read here.

Be aware that downgrading from 2.0 to 1.4 may lead to configuration compatibility issues – ensure that you have saved your Picard configuration before using 2.0 if you intend to go back to 1.4.

ListenBrainz enters Beta stage

I’m pleased to announce that we released our first official beta version of ListenBrainz yesterday! As you may know, ListenBrainz is our project to collect, preserve and make available user listening data, similar to what Last.fm has been doing, but with open data.

In 2015 a small group of hackers gathered in London to hack on the first version of ListenBrainz alpha. We threw together a pile of new technologies and released the first version of ListenBrainz at the end of the weekend. In the end, we didn’t really like the new technologies (Cassandra, Kafka) as both ended up giving us a lot of problems that never seemed to end.

In 2016 we embarked on a journey to pick new technologies that we liked better and ended up settling on InfluxDB and RabbitMQ as the backbone of our data ingestion pipeline. These tools were a good match for us, since we were already using them in production! Sadly, MetaBrainz’ move to our new hosting provider ended up sucking up any available time we had to devote to the projects, so progress was made in fits and starts.

Earlier this year Param Singh expressed interest in helping with the project in hopes of joining us for a Google Summer of Code project. He started submitting a never-ending stream of pull requests; slowly the project started moving forwards. Together we brought the codebase up to our current standards and integrated it into the workflow that we use for all of the MetaBrainz projects.

We proceeded to prepare the next version to be released at MetaBrainz’s new hosting facility and started a never-ending series of tests. We kept pounding on the data ingestion pipeline, trying to find all of the relevant bugs and ways in which the data flow could get snagged. Finally the number of reported bugs relating to data ingestion dropped to zero and we managed to import 10M listens (a listen is a record of one song being played)!

That was our cue for promoting our pre-beta test to a full beta and unleashing it onto our production servers at our new hosting facility. Today we cleaned up the last bits of the release and we are ready for business!

What does this new release bring for you, the end users? Sadly, only a few new things, since most of the work has gone into building a stable and scalable system. We do have a few new things in this release:

  • Incremental imports from Last.fm – now you don’t have to do a full import any time you wish to import your latest listens from Last.fm. The importer knows when you last did an import and will work accordingly.
  • Last.fm compatible submission interface – with some system configuration changes you can submit your listens directly to ListenBrainz from any application with Last.fm support. (more info here)
  • Last.fm file import – if you have an old skool Last.fm zip file with your listening history backed up, you can now import it.
  • User data export – you can now download your own listens straight from the site, no waiting required.
  • Adaptive rate limiting on the API – our server now uses a modern rate limiting system. For details, see our API docs.
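
For client authors wondering what playing nicely with the rate limiter might look like, here is a small sketch. The header names `X-RateLimit-Remaining` and `X-RateLimit-Reset-In` are an assumption based on my reading of the ListenBrainz API docs, so check the docs before relying on them; the function name is made up for illustration.

```python
def seconds_to_wait(headers):
    """Return how long to pause before the next request, based on rate-limit headers.

    `headers` is a mapping of response header names to values. The header names
    used here are assumptions; verify them against the API docs.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_in = int(headers.get("X-RateLimit-Reset-In", 0))
    # Only wait when the current window is exhausted.
    return reset_in if remaining <= 0 else 0


# Example: the window is used up and resets in 7 seconds.
print(seconds_to_wait({"X-RateLimit-Remaining": "0", "X-RateLimit-Reset-In": "7"}))
```

A client would call this after every response and sleep for the returned number of seconds. Note that a plain dict lookup is case-sensitive, so pass in a case-insensitive headers object if your HTTP library provides one.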

The good news is that Param is now working on his Summer of Code project that will add a lot of graphs and other critical elements for making use of this new data set. We hope to release new features on an ongoing basis from here on out.

Most importantly, we want to publicly state that ListenBrainz is now ready for business! We don’t plan to reset the database from here on out: this is the real deal, and we plan to safeguard this database and make it available as soon as we can. If you have hesitated to send your listen histories to ListenBrainz in the past, you should now feel free to send your listen information to us! If you are an author of a music player, we ask that you consider adding support for ListenBrainz in your player!

In a follow-up blog post I am going to write about how to start using ListenBrainz now. At the very least, use it to back up your Last.fm listening history!

If you find bugs with our latest release, please report them to our issue tracker. If you’re interested in this project and have questions for us, why not come and pop into our IRC channel or ask a question on our community forum?

P.S. The alpha version of ListenBrainz is still around.

P.P.S. We’ll have another cool announcement very shortly! Stay tuned!