GSoC 2018: A way to associate listens with MBIDs

Hi, I’m Kartikeya Sharma, a postgrad student at the National Institute of Technology, Hamirpur. I worked on MessyBrainz as a student developer for GSoC 2018, mentored by Robert Kaye. The goal of my project was to associate MBIDs with MSIDs and to cluster together the MSIDs that represent the same MBID. An MBID is a MusicBrainz Identifier: a Universally Unique Identifier that is permanently assigned to each entity in the MusicBrainz database. An MSID is a MessyBrainz Identifier, which is associated with each unique recording, artist_credit and release in the MessyBrainz database. In simple words, MSIDs represent unclean metadata whereas MBIDs represent clean metadata.

This blog post summarizes the work that I did in my project, which was divided into three parts.

Processing the data already in MessyBrainz database

The first part involves creating clusters using the MBIDs already present in the MessyBrainz database. This involves creating clusters for recordings, artists, and releases. To implement this part I created three PRs: #37, #41, and #44.
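As a rough illustration of the idea (not the code from the PRs), clustering can be thought of as grouping every MSID that was submitted with the same MBID. The sketch below assumes each submission is an (msid, mbid) pair held in memory; the function name and the shortened identifiers are made up for the example.

```python
from collections import defaultdict


def cluster_by_mbid(submissions):
    """Group MSIDs that share an MBID.

    `submissions` is an iterable of (msid, mbid) pairs. Entries whose
    MBID is missing or empty are skipped, because there is nothing to
    cluster them on.
    """
    clusters = defaultdict(set)
    for msid, mbid in submissions:
        if mbid:
            clusters[mbid].add(msid)
    return dict(clusters)


# Hypothetical identifiers, shortened for readability.
submissions = [
    ("msid-1", "mbid-a"),
    ("msid-2", "mbid-a"),
    ("msid-3", "mbid-b"),
    ("msid-4", ""),  # no MBID was submitted with this recording
]
clusters = cluster_by_mbid(submissions)
print(clusters)
```

The same grouping applies to recordings, artists and releases; only the kind of MBID used as the key changes.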

After that, I began work on the second part of this, which involves creating clusters using the artist MBIDs, release MBIDs and release names fetched from the MusicBrainz database. To access the MusicBrainz database, I first had to add methods to BrainzUtils that fetch artist MBIDs for a recording MBID, and release names and release MBIDs for a recording MBID. Fetching artist MBIDs was done during the community bonding period in PR #14 at BrainzUtils, and fetching releases was added in PR #18 at BrainzUtils during the GSoC coding period. I then created PR #47 to create clusters using the fetched artist MBIDs and PR #49 to create clusters using the fetched releases.

I wrote around 60 tests, which proved vital in making sure that the code does what it’s supposed to do.

Processing the data as it is inserted into the MessyBrainz database

Creating clusters for data that is already inside the database requires a lot of resources, so it is better to create clusters as recordings are inserted. But clustering during the insert itself is not efficient either. Instead, incoming recordings are first sent to a RabbitMQ server, and from there they are passed to a clustering script that runs continuously in a separate container and clusters the recordings as they arrive. That way clustering does not slow down the process of submitting recordings to the database. For this I created PR #50.
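The decoupling described above can be sketched with nothing but the Python standard library. This is only an illustration of the producer/consumer shape: `queue.Queue` stands in for RabbitMQ, a thread stands in for the separate container, and all names here are invented for the example rather than taken from PR #50.

```python
import queue
import threading

incoming = queue.Queue()  # stand-in for the RabbitMQ queue
clustered = []            # stand-in for the cluster tables
STOP = object()           # sentinel that tells the worker to exit


def submit_recording(recording):
    """Insert path: hand the recording off and return immediately."""
    incoming.put(recording)


def clustering_worker():
    """Consumer: runs separately and handles recordings as they arrive."""
    while True:
        recording = incoming.get()
        if recording is STOP:
            break
        # Placeholder for the real clustering work.
        clustered.append(recording["msid"])


worker = threading.Thread(target=clustering_worker)
worker.start()
for n in range(3):
    submit_recording({"msid": f"msid-{n}"})
incoming.put(STOP)
worker.join()
print(clustered)
```

The point of the design is that `submit_recording` never waits for clustering to finish, so a slow clustering step cannot hold up submissions.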

Create endpoints to access MSIDs and MBIDs

I created two API endpoints in PR #51. One endpoint fetches the MBIDs and MSIDs associated with an MSID. The other fetches MSIDs using an MBID. This way end users can access MBIDs and MSIDs, which may be used for calculating different stats.
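A minimal sketch of the two lookups, assuming the clusters are available as simple (msid, mbid) rows. The function names, the response shape, and the reading of the first endpoint as "the MBIDs for an MSID plus the MSIDs that share them" are assumptions made for illustration, not the actual API from PR #51.

```python
# Hypothetical redirect rows: (msid, mbid)
REDIRECTS = [
    ("msid-1", "mbid-a"),
    ("msid-2", "mbid-a"),
    ("msid-3", "mbid-b"),
]


def lookup_by_msid(msid):
    """MBIDs the MSID points to, plus every MSID sharing those MBIDs."""
    mbids = sorted({b for m, b in REDIRECTS if m == msid})
    msids = sorted({m for m, b in REDIRECTS if b in mbids})
    return {"mbids": mbids, "msids": msids}


def msids_for_mbid(mbid):
    """Every MSID clustered under the given MBID."""
    return sorted(m for m, b in REDIRECTS if b == mbid)


print(lookup_by_msid("msid-1"))
print(msids_for_mbid("mbid-b"))
```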

Apart from that, with the help of my mentor, I set up a VM to test the above code on the MsB data dump. This task had some challenges. First, I had to create indexes on various fields to speed up clustering. Without indexes it would have taken approximately 37 days, but with them it took just 3 hours. I found out that PostgreSQL also allows creating indexes on functions, which came in handy while creating artist_credit clusters, for which I had written a custom function. The indexes were created in PR #53.

When I ran the clustering code on the VM holding the whole MessyBrainz data dump, I found fields in the recording_json table that are supposed to store MBIDs but contained empty strings. This should not have been possible, as ListenBrainz is currently the only source of data for MessyBrainz: users cannot submit to MessyBrainz directly, and ListenBrainz validates listens for this. So those recordings must have been inserted before that validation was in place. To solve the problem I created PR #52.
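To show what an index on a function looks like, here is a small self-contained sketch. It uses SQLite (through Python's built-in sqlite3 module) instead of PostgreSQL so that it runs without a server, and it indexes the built-in `lower()` rather than the custom function from the project. The table layout and index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist_credit (msid TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO artist_credit VALUES (?, ?)",
    [("msid-1", "Radiohead"), ("msid-2", "RADIOHEAD"), ("msid-3", "Portishead")],
)

# Index the result of an expression rather than the raw column.
conn.execute(
    "CREATE INDEX artist_credit_lower_name ON artist_credit (lower(name))"
)

# A query that repeats the indexed expression is a candidate for the index.
query = "SELECT msid FROM artist_credit WHERE lower(name) = ? ORDER BY msid"
rows = [row[0] for row in conn.execute(query, ("radiohead",))]
print(rows)

# Ask the planner how it intends to run the query.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, ("radiohead",)):
    print(step)
```

In PostgreSQL the statement has the same general shape, `CREATE INDEX ... ON table (some_function(column))`, and the planner can use the index when the query's WHERE clause contains the same expression.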

The summer was a great learning experience for me. I started slowly, as things were messy at the start. Not everything was crystal clear to me, and I wasn’t sure exactly how to write scripts that manipulate the database, so I wrote them in the most trivial way possible. I was running a query for every single MBID to first check whether it was present in the recording_cluster table and, if not, cluster the recording. That is conceptually correct but not efficient by any means. The same result can be had by executing a single query on the recording_json table that fetches only those recording MBIDs that are not present in the recording_redirect table, as those are the unclustered ones. That way we don’t process recording MBIDs that have already been processed, which makes clustering efficient.
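The single-query approach can be sketched as an anti-join. The example below runs against an in-memory SQLite database so that it is self-contained; the table names come from the text above, but the columns are heavily simplified assumptions (the real recording_json table stores JSON documents, not a bare MBID column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE recording_json (id INTEGER PRIMARY KEY, recording_mbid TEXT);
    CREATE TABLE recording_redirect (recording_mbid TEXT);
    INSERT INTO recording_json (recording_mbid) VALUES
        ('mbid-a'), ('mbid-b'), ('mbid-b'), ('mbid-c');
    INSERT INTO recording_redirect VALUES ('mbid-a');
    """
)

# One query returns every MBID that has no redirect row yet, instead of
# issuing one existence check per MBID.
unclustered = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT rj.recording_mbid
          FROM recording_json rj
         WHERE NOT EXISTS (
                   SELECT 1
                     FROM recording_redirect rr
                    WHERE rr.recording_mbid = rj.recording_mbid)
         ORDER BY rj.recording_mbid
        """
    )
]
print(unclustered)
```

Only the MBIDs in `unclustered` then need to be handed to the clustering step.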

With time I got an understanding of how clusters are created and how to handle anomalies, such as James Morrison. In the end, the definition of an anomaly can be put as follows: an MSID represents an anomaly if it points to different MBIDs in the entity_redirect table (where entity can be artist_credit, recording, or release).
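That definition translates almost directly into code. The sketch below assumes the redirect table can be read as (msid, mbid) rows; the function name and the identifiers are hypothetical.

```python
from collections import defaultdict


def find_anomalies(redirect_rows):
    """Return {msid: sorted MBIDs} for MSIDs that map to more than one MBID."""
    mbids_by_msid = defaultdict(set)
    for msid, mbid in redirect_rows:
        mbids_by_msid[msid].add(mbid)
    return {
        msid: sorted(mbids)
        for msid, mbids in mbids_by_msid.items()
        if len(mbids) > 1
    }


# Hypothetical rows: one artist name (one MSID) linked to two different MBIDs.
rows = [
    ("msid-james-morrison", "mbid-artist-1"),
    ("msid-james-morrison", "mbid-artist-2"),
    ("msid-other", "mbid-artist-3"),
]
anomalies = find_anomalies(rows)
print(anomalies)
```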

Work to be done ahead

The project is still in its initial stages and requires a lot of work to be done before moving it into production. We still need to write integration tests for ClusterWriter and API endpoints. After that, we can work on the Additional Ideas that I proposed in my proposal. We need to figure out some way to associate MBIDs to MSIDs for the artists, recordings, and releases where no MBIDs are present. This does not seem like a trivial task with so many anomalies to take care of.

The last three months have been a great experience for me. I would like to thank Robert Kaye, Param Singh, and Alastair Porter, who helped me solve a lot of the problems that I encountered during the entire period. Working on their suggestions and reviews, I was able to write good-quality code that was efficient as well. The work culture at MetaBrainz inspired me a lot. At MetaBrainz we have weekly IRC meetings where we get to know what others are doing at the organization and also get a place to tell what we did in the past week. I would like to thank MetaBrainz and Google for giving me this chance to get involved in open source on such a cool project. The association of MSIDs to MBIDs can be used by ListenBrainz: stats are calculated on MSIDs, which can then be mapped onto MBIDs, which represent clean metadata. I would like to work on the project further because of the learning opportunities that it offers.

Delhi Mini-Summit 2018

Rob, Suyash, Param and I met in the bustling city of Delhi where “horns are applied very liberally” (it is a very noisy city!) for a mini summit. Some may even call it elaborate break-out sessions on ListenBrainz and CritiqueBrainz. We had discussions over a span of two days over laptops and notebooks, riding on bumpy roads in tuk-tuks and over spicy chicken biryanis. Here is a summary of all that we discussed:

ListenBrainz
Data Visualizations
We started Day 1 with graphs for ListenBrainz. After a long marathon of heavy development weightlifting by Param and Rob (how do we work with BigQuery correctly?), we are finally at a stage where we can get some really cool visualizations out of our dataset. What will they be? Where will they be? How will we implement them? Can our community pitch in with requests and maybe even play around with code?

After scrounging through a lot of other websites which do music-y data visualizations, and the few responses on our user survey, we started listing various ideas, and went through ideas on our community forum. We ended up dividing the data visualizations (from now on, called graphs) into two categories:

User specific graphs: showcasing a user’s listening history and taste
Site-wide graphs: showcasing the overall listening patterns on ListenBrainz

We had to make some tricky calls based on technical constraints, but overall, for starters, we decided on some cool user graphs. We detailed 6 of them over the summit:

  1. Listening history of a user: how much you have listened, what you have listened to, listen counts, etc.
  2. Your top artists
  3. Your tracklist (listen history)
  4. How much music did you explore
  5. Which artists are trending in what parts of the world
  6. Listener count across the world

All these graphs will be available over different time durations (last week, month, year) and will also have handles to manipulate them. They will also have tools to easily share them on social media networks. We think our community will really enjoy tracking their listening history with these. We also discussed a few ideas of how we can create a sandbox so our community can pitch in with ideas, vote on ideas and send pull requests for new graphs. More on that later, as we get there!

Rating System
If you are listening to a tracklist while working on something, how likely is it that you will rate a track saying “This is 3.5? This is 4.2? That is 5 stars!” So you see, ratings on ListenBrainz are tricky. Listening is very dynamic and interactive in real time, unlike in other dear *Brainz projects, so we think that a Last.fm-like rating, i.e. like and dislike, makes sense for ListenBrainz. There was also some discussion about where the ratings should reside: is CritiqueBrainz the correct place?

Home Page
We worked on redesigning the “My Listens” page as well as the home page. We now plan to include, apart from the graphs, an infographic explaining how ListenBrainz works and things you can do with it! I will detail the mockup further later this week.

Potential Roadmap
After almost two days of discussions, we could chalk up a rough roadmap for ListenBrainz, which includes data visualizations, the ability to rate/like tracks, create collections, follow users, and more. This also includes encouraging cross brainz pollination!

CritiqueBrainz
With Suyash around (he worked on CritiqueBrainz as part of GSoC last year, and has been actively involved since), there were obviously a lot of discussions on reinvigorating the project. We discussed quite a few ideas, which included innovative ways of writing and sharing reviews, sharing them on social media, cross *brainz interactions, a few UI changes, etc. We’re considering allowing Quick Reviews that, like Twitter, are limited to 280 characters. What do you think? Suyash has written down his ideas for the same and would love some feedback from the community!

MessyBrainz
With all these talks, a critical need to build some matching and clustering infrastructure was highlighted. Rob has written a possible roadmap for the project trying to compose his thoughts!

And of course, we couldn’t let Rob’s first visit to India be all about work. After sunset, we went exploring the city of Delhi. That included rides in tuk-tuks, spicy chicken biryanis, shopping for some colorful clothes and, definitely, the Indian chaat 🙂

All in all, it was a very productive mini summit and it definitely made us all more excited to start working on the ideas we discussed. We will keep you updated and post more soon!

Photo: A lot of Indian food!
Photo: The troupe at India Gate
Photo: Param is really into (a lot of) selfies.

ListenBrainz release 18 March 2018

We received so few bug reports on the beta release of the ListenBrainz web site that we decided to push those changes live and start working on new features. This release is substantially unchanged from our beta release.

The user facing changes that were released include:

  • Statistic infrastructure: We’ve created an infrastructure for creating graphs of users’ listening behaviour. So far we’ve only got an all-time top-artists graph to illustrate our setup, but soon we will work to create more graphs. Currently graphs are generated every Monday starting at 0:00 UTC if you have logged in to your LB account during the last 30 days. If you haven’t logged in recently, you can request the calculation of your stats from your profile page.
  • Automatic data dumps: The ListenBrainz data will now be dumped and synced to our FTP site twice a month. Currently this is scheduled for the 1st and the 15th of every month. Dump generation starts at 04:00 UTC; the dumps are then copied to our FTP site, so it will take a number of hours for them to appear there. Our documentation details how this data dump can be consumed.
  • Documentation improvements: Quite a few documentation bits have been improved since our last release, including better documentation on the Last.fm compatible API that ListenBrainz exposes.
  • Static page improvements: We’ve done some rearranging of our static pages and navigation bar to reflect the latest changes, including updating the data page and our roadmap page.
  • Listen count on home page: The home page now shows the current listen count.
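The stats schedule from the first item above boils down to two checks: is it the weekly slot, and has the user logged in within the last 30 days? The sketch below is one way to express those rules; the function names are invented here and this is not the scheduler ListenBrainz actually runs.

```python
from datetime import datetime, timedelta, timezone

ACTIVE_WINDOW = timedelta(days=30)  # "logged in during the last 30 days"


def is_weekly_slot(now):
    """True on Mondays during the 00:00 UTC hour, when the run starts."""
    now = now.astimezone(timezone.utc)
    return now.weekday() == 0 and now.hour == 0


def gets_automatic_stats(last_login, now):
    """True if the user's last login falls inside the 30-day window."""
    return now - last_login <= ACTIVE_WINDOW


monday = datetime(2018, 3, 19, 0, 0, tzinfo=timezone.utc)
print(is_weekly_slot(monday))
print(gets_automatic_stats(monday - timedelta(days=10), monday))
print(gets_automatic_stats(monday - timedelta(days=45), monday))
```

Users who fall outside the window are not lost: as noted above, they can still request a calculation from their profile page.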

We also made some internal/hosting changes that you can read about in our beta release blog post. The release from Friday has been tagged with v-2018-03-18.

Thanks to all those people who helped us put the beta site through its paces.

ListenBrainz winter 2018 beta testing

After many more months of hacking on core infrastructure and improving our codebase, we’re finally ready to have more people come and help us test the latest beta version of ListenBrainz. Also, we’ve recently reached the milestone of the 100 millionth listen in our database!

We’ve made some internal changes to the project (that took quite a bit of effort):

  • Improved hosting setup that allows us to run both the production and beta versions of the site at the same time. This means that any data submitted to the beta site will be submitted to the master listens database and will be available in the BigQuery data set as well. We are mimicking the setup that MusicBrainz has: the beta site uses the live database so that testing the service can work with live data.
  • Improved internal container setup to allow for dumping both the listen data and private data for complete backups.
  • Improved speed with which we process incoming listens.

These internal changes will allow us to move to more frequent updates of ListenBrainz in the future! More important are the changes to the site that are user visible:

  • Statistic infrastructure: We’ve created an infrastructure for creating graphs of users’ listening behaviour. So far we’ve only got an all-time top-artists graph to illustrate our setup, but soon we will work to create more graphs. Currently graphs are generated every Monday starting at 0:00 UTC if you have logged in to your LB account during the last 30 days. If you haven’t logged in recently, you can request the calculation of your stats from your profile page.
  • Automatic data dumps: The ListenBrainz data will now be dumped and synced to our FTP site twice a month. Currently this is scheduled for the 1st and the 15th of every month. Dump generation starts at 04:00 UTC; the dumps are then copied to our FTP site, so it will take a number of hours for them to appear there. Our documentation details how this data dump can be consumed.
  • Documentation improvements: Quite a few documentation bits have been improved since our last release, including better documentation on the Last.fm compatible API that ListenBrainz exposes.
  • Static page improvements: We’ve done some rearranging of our static pages and navigation bar to reflect the latest changes, including updating the data page and our roadmap page.
  • Listen count on home page: The home page now shows the current listen count.
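If you want to plan around the dump schedule from the list above (1st and 15th of each month, starting at 04:00 UTC), the next start time can be computed as in the sketch below. It only models the announced schedule; the function name is made up, and remember that the files show up on the FTP site some hours after the start time.

```python
from datetime import datetime, timezone

DUMP_DAYS = (1, 15)   # days of the month on which a dump starts
DUMP_HOUR_UTC = 4     # 04:00 UTC


def next_dump_start(now):
    """Return the first scheduled dump start strictly after `now`.

    `now` must be a timezone-aware datetime.
    """
    year, month = now.year, now.month
    while True:
        for day in DUMP_DAYS:
            candidate = datetime(
                year, month, day, DUMP_HOUR_UTC, tzinfo=timezone.utc
            )
            if candidate > now:
                return candidate
        # Nothing left this month: roll over to the next one.
        month += 1
        if month > 12:
            year, month = year + 1, 1


print(next_dump_start(datetime(2018, 3, 18, 12, 0, tzinfo=timezone.utc)))
```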

If you’re interested in helping us test, please use the beta site and test everything you can see. See if anything misbehaves and if you do spot any problems, please report them to our bug tracker! Hopefully we can push this live next week.

NB: The beta site is connected to the live database, so any listens you submit to it, will be part of your official ListenBrainz listen history!

ListenBrainz Alpha disappearing in 30 days

Since we released the beta of ListenBrainz six weeks ago, people have moved over and imported their listen histories onto the beta site, which is great. While we think that everyone who needs to migrate listens off the old server has already done so, we’re going to give people another 30 days in case anyone hasn’t gotten around to it yet.

If you’ve never submitted original listens to the alpha server, this does not concern you! In fact, if this blog post is confusing to you, it probably means that you’re not affected by us turning off the alpha server on 18 October, 2017.

Thanks!

P.S. We’ve collected 50M listens on the beta site!

 

Expanding our team

As the world comes back to life after the summer break, we’re making some changes and expanding our team. First, Roman Tsukanov has decided to not renew his contract with us. During his tenure with MetaBrainz, Roman adopted and released CritiqueBrainz and also wrote our new MetaBrainz web page, which is helping us bring in new supporters. His contributions have been far from trivial — thank you for your efforts, Roman!

Due in part to the new MetaBrainz web site, we’ve got more financial support than ever, and this allows us to replace Roman with two engineers! I’m pleased to announce that we’re hiring two of our Summer of Code students who just completed the program:

Sambhav Kothari AKA samj1912: Sambhav started hacking on Picard earlier this year and knocked Picard out of dormancy, working towards a new release and then making Picard his Summer of Code project. He completed his project with flying colors and is working towards a major upgrade of Picard. On the MetaBrainz team he is going to look after the new search infrastructure and the maintenance and bug fixing of our Web Service in addition to hacking on Picard. A full plate, for sure!

Param Singh AKA iliekcomputers: About the same time that samj1912 arrived, Param arrived. He expressed interest in working on ListenBrainz — he too dove right in and started making improvements. ListenBrainz had quite a ways to go before he could aim to make a Summer of Code project out of it. Param and I embarked on a journey to revamp and improve the stability of ListenBrainz, which culminated in us releasing the new ListenBrainz beta a few weeks ago. Since then he’s been focusing on his Summer of Code project, which is also now complete. On the MetaBrainz team Param will be looking after ListenBrainz and also the new MetaBrainz web site.

Both Param and Sambhav will officially start working on the MetaBrainz team starting October 1, but I strongly suspect we’ll see them around and hacking on the projects as has become the norm this year.

Welcome aboard Sambhav and Param!

 

GSoC 2017: Hacking on ListenBrainz

Namaste!

I am Param Singh, an undergraduate at the National Institute of Technology, Hamirpur, India, and I worked on ListenBrainz over the summer as part of the Google Summer of Code program. I started contributing code to ListenBrainz in January 2017 and have been working on new features and bug fixes since. I’ll be writing about the work I did and my experience working on LB in this blog post.

After a few of my patches had made it in and I was comfortable with the ListenBrainz codebase (which was a really nice example of software architecture for me), I talked with the LB team about what possible contributions I could make over the summer, and we decided that a Google BigQuery based statistics system is something that would be useful to have in ListenBrainz after we release a beta and have listen data that is permanently archived. I made a proposal for adding statistics to ListenBrainz which got accepted! During the community bonding period, we decided to try to get a solid and stable beta of ListenBrainz released before starting with the relatively large code additions that would be required by my project proposal. We tracked issues that we wanted fixed before a release in the MetaBrainz ticket tracker here. This work of fixing release blocking issues went into the coding period and we decided to continue working on a solid beta instead of adding new features for the time being.

I started with fixing bugs and adding new features to get a beta released as soon as possible. Some cool stuff I worked on during this time was dockerizing MessyBrainz (see PR here), migrating the codebases of MessyBrainz and ListenBrainz to Python 3 (PRs here and here) and improving the startup resilience of various parts of ListenBrainz to make sure that the server is able to self-heal (partially) if some part of it like RabbitMQ goes down (ticket here).

Later on, I did a big refactor of the LB code so that adding new modules would be easier in the future (PR here). I also spent a lot of time fixing bugs in our listen deduplication. Relevant pull requests for this are here and here.

Another feature I added to ListenBrainz while working on the beta was incremental imports. Earlier, LB didn’t keep track of previous imports of a user and did a full Last.FM import every time. However, now we keep track of the last time each user imported listens and only import new data since then. The PR adding incremental imports is here.
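The idea behind incremental imports fits in a few lines. This is a simplified sketch, not the code from the PR: it keeps the "last import" marker in a dictionary rather than in the database, and it assumes scrobbles arrive as (timestamp, track) tuples.

```python
last_import = {}  # user name -> timestamp of the newest listen imported so far


def import_new_listens(user, scrobbles):
    """Import only scrobbles newer than the user's previous import.

    `scrobbles` is a list of (timestamp, track) tuples; the return value
    is the list of listens that were actually imported this time.
    """
    since = last_import.get(user, 0)
    new = [s for s in scrobbles if s[0] > since]
    if new:
        last_import[user] = max(ts for ts, _ in new)
    return new


history = [(100, "Track A"), (200, "Track B")]
first = import_new_listens("alice", history)
second = import_new_listens("alice", history + [(300, "Track C")])
print(first)
print(second)
```

On the second call only the listen made after the first import is processed, which is what saves the full re-import.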

My mentor, Robert Kaye (ruaok) set up a test instance of the ListenBrainz server that was used by the community and as the community kept throwing their data at us, bugs kept popping up. A particularly weird bug caused LB to lose data for users with special characters in their usernames. The PR to fix this took a lot of time to create.

We kept on fixing bugs for a long time and the biggest thing I took away from this period of GSoC was the Ninety-ninety rule: «The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.» This summer has drilled this into my mind.

As soon as the beta was released, I started with writing code for statistics, making schema changes (PR here) and adding some user stats (PRs here and here). I’ll be continuing on the stats work after Summer of Code. The basic foundation of stats is mostly done and soon I’ll start with showing statistics to the users.

By the end of the official GSoC coding period, I have made 266 commits in the ListenBrainz codebase and have opened a total of 111 pull requests. The current production ListenBrainz running on https://listenbrainz.org has 253 commits by me, most of which were made during the GSoC period.

Over the summer, I have fallen in love with the MetaBrainz community and have learned a lot of stuff. I’m really looking forward to adding more features to ListenBrainz soon, so that the data that the community is contributing becomes useful to everyone. I loved working on a really cool open-source project like ListenBrainz this summer and am very thankful to Google for providing me this opportunity. I would encourage everyone reading this to give the ListenBrainz beta a try and contribute to ListenBrainz if possible.

ListenBrainz data is live on BigQuery!

We’re pleased to announce that in cooperation with Google, we are live streaming our ListenBrainz data to Google’s BigQuery service!

ListenBrainz is a project that has the potential to gather a lot of data quickly, which would require us to have a Big Data infrastructure, which can be expensive. In an effort to use our available cash wisely, we began to look around for ways to take advantage of other infrastructures with lower costs.

Two years ago at the Google Summer of Code mentor summit I met with a representative from the BigQuery team who said that Google was happy to host any public data set for free! I immediately took them up on this offer and started a conversation. A good while later, we finally managed to get the data set live!

If you wish to play with the data, please do!

https://bigquery.cloud.google.com/table/listenbrainz:listenbrainz.listen

You’ll need a Google account to log in with. Once you’re logged in, every user gets 5TB of query traffic free per month. That is quite a lot for how large this dataset is currently. The schema for this table is defined here, and what the data elements mean is defined in our API docs. To get you started, I’ve written a few sample queries:

BigQuery uses an SQL-like syntax, so if you know some SQL then diving right in should be easy. The queries above should give you an idea of what you can do with this data. Please note that we currently have nearly 30M listens, so the dataset is still quite small. We’re very much interested to see what sort of things people can come up with in the near future.

Finally, some notes about openness and proprietary software: Given that we have limited resources, we aim to make the most things happen with the services that are at our disposal. Google has been extremely generous to us over the years and we’re very pleased to have access to BigQuery now.

That is not to say that we’re putting all of our eggs in one basket or forcing people to use BigQuery. Our InfluxDB database, hosted on our own servers, keeps the master archival copy of our listen data. Soon we hope to make dumps of this data available for anyone to download and play with using whatever tools they would like. With this setup we are not fully reliant on Google for keeping this project alive. We’re glad to have their support, but should circumstances change, we can find another Big Data solution and load our master archival copy there.

Now, go play with this very promising data and post some of your favorite queries in the comments!

 

 

ListenBrainz enters Beta stage

I’m pleased to announce that we released our first official beta version of ListenBrainz yesterday! As you may know, ListenBrainz is our project to collect, preserve and make available, user listening data similar to what Last.fm has been doing, but with open data.

In 2015 a small group of hackers gathered in London to hack on the first version of ListenBrainz alpha. We threw together a pile of new technologies and released the first version of ListenBrainz at the end of the weekend. In the end, we didn’t really like the new technologies (Cassandra, Kafka), as both ended up giving us a lot of problems that never seemed to end.

In 2016 we embarked on a journey to pick new technologies that we liked better and ended up settling on InfluxDB and RabbitMQ as the backbones of our data ingestion pipeline. These tools were a good match for us, since we were already using them in production! Sadly, MetaBrainz’ move to our new hosting provider ended up sucking up any available time we had to devote to the projects, so progress was made in fits and starts.

Earlier this year Param Singh expressed interest in helping with the project in hopes of joining us for a Google Summer of Code project. He started submitting a never-ending stream of pull requests; slowly the project started moving forwards. Together we brought the codebase up to our current standards and integrated it into the workflow that we use for all of the MetaBrainz projects.

We proceeded to prepare the next version to be released at MetaBrainz’s new hosting facility and started a never ending series of tests. We kept pounding on the data ingestion pipeline, trying to find all of the relevant bugs and ways in which the data flow could get snagged. Finally the number of reported bugs relating to data ingestion dropped to zero and we managed to import 10M listens (a listen is a record of one song being played)!

That was our cue for promoting our pre-beta test to a full beta and unleashing it onto our production servers at our new hosting facility. Today we cleaned up the last bits of the release and we are ready for business!

What does this new release bring for you, the end users? Sadly, only a few new things, since most of the work has gone into building a stable and scalable system. We do have a few new things in this release:

  • Incremental imports from Last.fm: now you don’t have to do a full import any time you wish to import your latest listens from Last.fm. The importer knows when you last did an import and will work accordingly.
  • Last.fm compatible submission interface: with some system configuration changes you can submit your listens directly to ListenBrainz from any application with Last.fm support. (more info here)
  • Last.fm file import: if you have an old skool Last.fm zip file with your listening history backed up, you can now import it.
  • User data export: you can now download your own listens straight from the site, no waiting required.
  • Adaptive rate limiting on the API: our server now uses a modern rate limiting system. For details, see our API docs.

The good news is that Param is now working on his Summer of Code project that will add a lot of graphs and other critical elements for making use of this new data set. We hope to release new features on an ongoing basis from here on out.

Most importantly, we want to publicly state that ListenBrainz is now ready for business! We don’t plan to reset the database from here on out. This is the real deal, and we plan to safeguard this database and make it available as soon as we can. If you have hesitated to send your listen histories to ListenBrainz in the past, you should now feel free to send your listen information to us! If you are an author of a music player, we ask that you consider adding support for ListenBrainz in your player!

In a follow-up blog post I am going to write about how to start using ListenBrainz now — at the very least use it to back-up your Last.fm listening history!

If you find bugs with our latest release, please report them to our issue tracker. If you’re interested in this project and have questions for us, why not come and pop into our IRC channel or ask a question on our community forum?

P.S. The alpha version of ListenBrainz is still around.

P.P.S. We’ll have another cool announcement very shortly! Stay tuned!

GSoC ’16 + ListenBrainz = fun :)

Hello,

I am Pinkesh Badjatiya and I have been working on ListenBrainz as part of GSoC ’16. I was largely involved in implementing the most requested features in ListenBrainz.
I began my journey with MetaBrainz not long before the Final Organization list was out. I started with MusicBrainz but moved quickly to ListenBrainz, and have been working on it since then.

About the project

The project consisted of creating a proxy scrobbling API similar to last.fm’s which could be used by existing desktop clients to submit listens to listenbrainz.org. I submitted my initial idea, which involved creating a new API along with a few other optional features that were very much required (import, export, etc.).
The project made its way through the approval process, and I worked with ruaok (my mentor) & alastairp to get important things done. Yey!

Here are some snapshots of my journey with ListenBrainz.

API_compat

ListenBrainz already had its own API, which can be used to fetch/submit listens, but all the existing clients that support scrobbling to last.fm use the ws.audioscrobbler.com API. To add support for these clients, I ended up creating a proxy API, api_compat (as in “compatible API”), that translates every request sent to “api.listenbrainz.org/2.0/” into the native format. This is an additional API which can be used alongside ListenBrainz’s existing native API.
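To give a feel for what "translating into the native format" means, here is a small sketch of the core mapping for a single scrobble. The field names follow my understanding of the public Last.fm scrobble parameters and of the ListenBrainz listen JSON; the function name is made up, and the real api_compat code also has to deal with sessions, authentication and response formats, none of which is shown here.

```python
def lastfm_scrobble_to_listen(params):
    """Translate Last.fm-style scrobble parameters into a native listen.

    `params` is a dict of request parameters; `artist`, `track` and
    `timestamp` are required, `album` is optional.
    """
    track_metadata = {
        "artist_name": params["artist"],
        "track_name": params["track"],
    }
    if params.get("album"):
        track_metadata["release_name"] = params["album"]
    return {
        "listen_type": "single",
        "payload": [
            {
                # Request parameters arrive as strings; store an integer.
                "listened_at": int(params["timestamp"]),
                "track_metadata": track_metadata,
            }
        ],
    }


listen = lastfm_scrobble_to_listen(
    {
        "artist": "Portishead",
        "track": "Roads",
        "album": "Dummy",
        "timestamp": "1471000000",
    }
)
print(listen)
```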

This was largely the main goal of my project proposal. The instructions for scrobbling using Audacious are attached along with the source code.

Import lastfm-backup

The import page now allows users to import listens from the last.fm scrobbles or from the backup file which was downloaded from the older version of the last.fm website.
On a successful import of listens from a backup, you’ll get a notification confirming it.

Export listens

This allows users to export the listens from the listenbrainz.org website. This is useful for users who want to keep track of their listen history offline as well.
The export feature can be accessed from the drop-down menu.


Playing Now

With support for API-Compat in place, support for the currently playing song was needed. This keeps the currently playing song on the website in sync with your favourite player.

Import scraper uses audioscrobbler API

I also worked on updating the import scraper, which now uses the ws.audioscrobbler.com API, allowing users to import without opening their last.fm profiles. This also provides other useful track information to ListenBrainz.

Migrate to PostgreSQL

Another important change to ListenBrainz was how it stores listens. We moved from Cassandra to PostgreSQL. Cassandra was fast and effective, but getting information other than the user’s listens (e.g. generating statistics) was not possible. So we switched to Postgres + Redis. This opened up more possibilities for the future.

Experience

After 3.5 months, I ended up with 15 merged and 3 closed PRs and a bunch of features for ListenBrainz that improved its look and feel.

My pull requests: https://github.com/metabrainz/listenbrainz-server/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Apinkeshbadjatiya%20

I have worked on quite a lot of varied things in the past 4 months. A lot of them were actually not part of the GSoC proposal, but they were done largely in the same timeline or were optional targets, so I suppose they would count significantly towards GSoC.
I worked largely with alastairp, ruaok and Gentlecat. Gentlecat helped improve my coding style by providing feedback on my PRs. I worked with alastairp and ruaok on ideas and suggestions for how to address a problem and its possible solutions. It was an interesting experience working with the community and getting to know MetaBrainz. Now that my understanding of the project and the community has increased, I look forward to making some great contributions!

Conclusion

In short, ListenBrainz went through a hell of a lot of changes in the past 4 months. If you were waiting for it to improve before using it, now is the time to try it. I bet you’ll love its new look and you won’t be disappointed. 😀