GSoC’22: Personal Recommendation of a track

Hi Everyone!

I am Shatabarto Bhattacharya (also known as riksucks on IRC and hrik2001 on Github). I am an undergraduate student pursuing Information Technology from UIET, Panjab University, Chandigarh, India. This year I participated in Google Summer of Code under MetaBrainz and worked on implementing a feature to send personal recommendation of a track to multiple followers, along with a write up. My mentors for this project were Kartik Ohri (lucifer on IRC) and Nicolas Pelletier (monkey on IRC)

Proposal

For readers, who are not acquainted with ListenBrainz, it is a website where you can integrate multiple listening services and view and manage all your listens at a unified platform. One can not only peruse their own listening habits, but also find other interesting listeners and interact with them. Moreover, one can even interact with their followers (by sending recommendations or pinning recordings). My GSoC proposal pertained to creating one more such interaction with your followers, sending personalized recommendations to your followers.

Community Bonding Period

During the community bonding period, I was finishing up working on implementing feature to hide events in the feed of the user and correcting the response of missing MusicBrainz data API. I had also worked on ListenBrainz earlier (before GSoC), and had worked on small tickets and also had implemented deletion of events from feed and displaying missing MusicBrainz data in ListenBrainz.

Coding Period (Before midterm)

While coding, I realized that the schema and the paradigm of storing data in the user_timeline_event that I had suggested in the proposal won’t be feasible. Hence after discussion with lucifer in the IRC, we decided to store recommendations as JSONB metadata with an array of user IDs representing the recommendees. I had to scratch my brain a bit to polish my SQL skills to craft queries, with help and supervision from lucifer. There was also a time where the codebase for the backend that accepted requests from client had a very poorly written validation system, and pydantic wouldn’t work properly when it came to assignment after a pydantic data-structure had been declared. But after planning the whole thing out, the backend and the SQL code came out eloquent and efficient. The PR for that is here.

{
     "metadata": {
        "track_name": "Natkhat",
        "artist_name": "Seedhe Maut",
        "release_name": "न",
        "recording_mbid": "681a8006-d912-4867-9077-ca29921f5f7f",
        "recording_msid": "124a8006-d904-4823-9355-ca235235537e",
        "users": ["lilriksucks", "lilhrik2001", "hrik2001"],
        "blurb_content": "Try out these new people in Indian Hip-Hop! Yay!"
    }
 }

Example POST request to the server, for personal recommendation.

Coding Period (After midterm)

After the midterm, I started working on creating the modal. At first my aim was to create a dropdown menu for search bar using Bootstrap (as most of the code had bootstrap rather than being coded from scratch). But after a while I consulted with lucifer and monkey and went for coding it from scratch. I had also planned to implement traversing search results using arrow keys, but the feature couldn’t be implemented in due time. Here are screenshots of what was created in this PR.

Accessing menu button for personal recommendation

A modal will appear with a dropdown of usernames to choose for personal recommendation

Modal with usernames chosen and a small note written for recommendation

If one has grokked my proposal, they might already notice that the UI/UX of the coded modal is different from the one proposed. This is because while coding it, I realized that the modal needs to not only look pretty but also go well with the design system. Hence the pills were made blue in color (proposed color was green). While I was finishing up coding the view for seeing recommendations in the feed, I realized that the recommender might want to see the people they had recommended. So, I asked lucifer and monkey, if they would like such feature, and both agreed, hence this UI/UX was born:

Peek into the feed page of recommender

What a recommendee sees

Special thanks to CatQuest and aerozol for their feedbacks over the IRC regarding the UI/UX!

Experience

I am really glad that I got mentorship from lucifer and monkey. Both are people whom I look up to, as they both are people who are not only good at their crafts but are also very good mentors. I really enjoy contributing to ListenBrainz because almost every discussion is a round table discussion. Anyone can chime in and suggest and have very interesting discussions on the IRC. I am very glad that my open source journey started with MetaBrainz and its wholesome community. It is almost every programmer’s dream to work on projects that actually matter. I am so glad that I had the good luck to work on a project that is actually being used by a lot of people and also had the opportunity to work on a large codebase where a lot of people have already contributed. It really made me push my boundaries and made me more confident about being good at open source!

GSoC 2021: Pin Recordings and CritiqueBrainz Integration in ListenBrainz

Hi! I am Jason Dao, aka jasondk on IRC. I’m a third year undergrad at University of California, Davis. This past summer, I’ve been working with the MetaBrainz team to add some neat features to the project ListenBrainz.

Continue reading “GSoC 2021: Pin Recordings and CritiqueBrainz Integration in ListenBrainz”

Kartik Ohri joins the MetaBrainz team!

I’m pleased to announce that Kartik Ohri, AKA Lucifer, a very active contributor since his Code-in days in 2018, has become the latest staff member of the MetaBrainz Foundation!

Kartik has been instrumental in rewriting our Android app and more recently has been helping us with a number of tasks, including new features for ListenBrainz, AcousticBrainz as well as breathing some much needed life into the CritiqueBrainz project.

These three projects (CritiqueBrainz, ListenBrainz and AcousticBrainz) will be his main focus while working for MetaBrainz. Each of these projects has not had enough engineering time recently to sufficiently move new features forward. We hope that with Kartik’s efforts we can deliver more features faster.

Welcome to the team, Kartik!

Playlists and personalised recommendations in ListenBrainz

Just in time for Christmas we are pleased to announce a new feature in our most recent release of ListenBrainz, the ability to create and share your own playlists! We created two playlists for each user who used ListenBrainz containing music that you listened to in 2020. Check out your lists at https://listenbrainz.org/my/recommendations. Read on for more details…

With our continuing work on using data in ListenBrainz to generate recommendations, we realised that we needed a place to store lists of music. That sounded like playlists to us, so we added them to ListenBrainz. As always, we did this work in the public ListenBrainz repository. You can now create your own playlists with the web interface or by using the API. Recordings in playlists map to MusicBrainz identifiers. If you’re trying to add something and can’t find it, make sure that it’s in MusicBrainz first.

Once you have a playlist, you can listen to it using our built-in BrainzPlayer, or export it to Spotify if you have an account there. If you have already linked your Spotify account to ListenBrainz you may have to re-authenticate and give us permission to create playlists on your behalf. Playlists can also be exported in the open JSPF format using the ListenBrainz API.

Over the last year we’ve started thinking about how to use data in MetaBrainz projects to generate recommendations of new music for people to listen to. For this reason, we started the Troi recommendation framework. This python package allows developers to build pipelines that take data from different sources and combine it in order to generate recommendations of music to listen to. We have already developed data sources using MusicBrainz, ListenBrainz, and AcousticBrainz. If you are a developer interested in working on recommendations in the context of ListenBrainz we encourage you to check it out.

Now that we can store playlists we needed some content to fill them with. Luckily we have some great projects worked on by students over the last few years as part of MetaBrainz’ participation in the Google Summer of Code project, including this year’s work on statistics and summary information by Ishaan. Using Troi and ListenBrainz statistics, we got to work. Every user who has been contributing data to ListenBrainz recently now has two brand new 2020 playlists based on the top recordings that you listened to in 2020 and the recordings that you first listened to in 2020. If you’re interested in the code behind these playlists, you can see the code for each (top tracks, first tracks) in the troi repository.

If you’re a long-time user of ListenBrainz you may be familiar with the problem of matching your listens to content in MusicBrainz to be able to do things with it. We’ve been working hard on a solution to this problem and have built a new tool using typesense to provide a quick and easy way to search for items in the MusicBrainz database. You are using this tool when you create a playlists using the web interface and search for a recording to add. This is still a tech preview, but in our experience it works really well. Thanks to the team at typesense for helping us with our questions over the last few weeks!

This work is still in its early days. We thought that this was such a great feature that we wanted to get it out in front of you now. We’re happy to take your feedback, or hear if you are having any problems. Open a ticket on our bug tracker, come and talk to us on IRC, or @ us. Did we give you a bad jam? Sorry about that! We’d love to have a conversation about what went well and what didn’t in order to improve our systems. In 2021 we will start generating weekly and daily playlists for users based on your recent listens using our collaborative filtering recommendations system.

Merry Christmas from the whole MetaBrainz team!

GSoC 2020: Manage your listens better with ListenBrainz

Hey! My name is Shivam Kapila (shivam-kapila on IRC) and I am a final year undergrad at National Institute of Technology Hamirpur. I have been working on the ListenBrainz project this Summer as a participant of the Google Summer of Code program. The past four months were full of fun, hacking and loads of music!!

Landing into the MetaBrainz Community!

My journey with MetaBrainz began in late January this year, when I introduced myself to the community. My first PR improving the developer documentation was by adding parts connected with setting up the Spark infrastructure on a local setup along with consolidating and improving bits of documentation. I delved into real code while implementing front end components for Deleting Listens. Over the next few months, I fixed various bugs like making the Importer Modal responsive, fixing the DB setup scripts, fixing pagination issues while browsing listens, handling stat calculation errors in the Spark Reader and flushing user stats when they delete their listens.

As a GSoC applicant, I proposed to add various Listen Management features like love/hate (aka feedback) and deleting individual listens in ListenBrainz. I also proposed a new design for the Listens page. This involved a lot of designing and research, going through UI/UX design guidelines and tuning colors, shades and shadows till we arrived at a presentable and subtle design.

And finally I onboarded the GSoC train 🙂 .

Bonding with the community

I had been a part of the community since January so I was familiar with how things work in ListenBrainz. So I decided to contribute to the TimescaleDB migration where we moved our primary listen store from InfluxDB to TimescaleDB, opening up a ton of features for us to work on. Here is the final migration PR containing the commits of my contribution.

I also contributed to easing the testing infrastructure for devs to test the patches on their local setups. Following this I upgraded the postgres-client to PG12 version when we migrated to Postgres 12. I also fixed a minor font bug on the profile page.

The GSoC journey begins

Laying the base

As the official coding period began, I started working on my proposed tasks. The first question was: how to store the feedback? So I began implementing the database changes to store the recording feedback and applying the necessary changes in production. Following this I added a Python module to interact with the database and implemented a Pydantic model to validate the feedback records before they are stored in the database or served over the API. Then I added the necessary APIs to store and fetch the feedback for a given user or recording. This was followed by improving the efficiency of the DB module.

I also worked on dumping the recording feedback in the ListenBrainz public dumps. Since ListenBrainz had migrated the stats calculation infrastructure from Google BigQuery to Apache Spark I also removed the BigQuery references from the ListenBrainz website. Now that the timescale migration work became stable, I began working on Delete a Listen feature.

Pulling out the front end brushes

Now that the base was ready for us to work on, I started working on the React components so that the feedback and deletion feature could actually be presented on the website. Around the same time, the Timescale release day was also getting near, so I helped with a few tests and finished up the work for deleting listens. The front end components also started looking good and we were ready to associate the back end with them.

Rectifying & Reactifying

It’s high time and the final phase started. Now that we were ready with a few components we needed some tweaks in some production components to make them subtle. Hence I shot an improvement PR to tweak some shadows, adjust some fonts, adjust heights of the components, sticking the footer to the bottom, and reactify the loading spinner. Then came the Listen Count Card denoting the number of listens for a user. Following this we moved to Card based design for displaying listens.

This was followed by the much awaited feedback controls and now we can love/hate the songs from our listen collection. Isn’t this amazing! There were some needed minor tweaks needed to handle the ‘playing now’ listens correctly. At the same time, following the MetaBrainz guidelines to write quality code, I worked on making the SQL queries more readable. Then came the much awaited Delete a Listen feature and now we can finally get rid of the embarrassing listens!!

I also addressed some high priority tasks like giving the users an option to download their submitted feedback as JSON. We noticed some UI glitches and then came three back to back PRs to update feedback control shades, improving the listen time text and smoothing up the deletion animation. This is how the listen list looks like:

List of listens

What’s next??

Oh, now comes the time when we talk about the current scenario. The tasks currently on my radar are adding cover art support so that the page looks more alive and improving the Spotify imports to only import listens that were listened by the user after the latest Spotify listen we have for them.

After this I aim to work on the recommendation stuff that’s being actively pursued by the team. Also Mr_Monkey and me had been working on some design concepts for the All New ListenBrainz. I am pretty excited to work on it. Wanna take a sneak peek?

A new fam

The journey with MetaBrainz has been so amazing, that I am so tempted to stick here. I feel ecstatic to be a part of GSoC with the best org 🙂 . The best part is – it’s never all about code. There’s a lot to gain. Each day marked gaining maturity and thinking more and more like a real developer. I started feeling at ease with the communicate → code → integrate chain. It really feels fortunate to be a part of the MetaBrainz family where everyone is a ping away ❤ .

GSoC marks the kickstart of my journey with MetaBrainz and I will be here lurking on IRC, shooting PRs to make the projects more and more awesome.

Heartiest Gratitude

GSoC 2020: Adding Statistics and Graphs for ListenBrainz Users and Community

Hey everyone! I am Ishaan Shah (ishaanshah), a sophomore at International Institute of Information Technology – Hyderabad, India. This summer, I worked on ListenBrainz as a participant in Google Summer of Code ’20. My project involved generating statistics and visualisations for users using Apache Spark. This blog is an overview about the work I did and my experience working with ListenBrainz.

I started contributing to ListenBrainz in January 2020. My first PR was for LB-179, a small Quality of Life improvement to the LastFM importer. My first major contribution was porting the LastFM importer to ReactJS. Over the next two months, I continued working on the frontend, where I mainly worked on improving the frontend infrastructure by adding support for automated testing, porting the codebase to TypeScript and standardising the frontend code using ESLint and Prettier.

After making a few patches, I understood how ListenBrainz worked and got comfortable with the codebase. I decided to make a proposal for adding statistics to ListenBrainz using Apache Spark. While writing the proposal, I referred to many other websites, blogs, as well as community discussions for different ideas about statistics which could be added. After some research, I narrowed down on the specific graphs and statistics that I wanted to calculate during GSoC.

Community Bonding Period

Since I had been working with the MetaBrainz community since January, I was familiar with how things worked in the community. So we decided to use the Community Bonding Period for fixing and updating the Top Artists charts for a user. The first task that I took up was to add an API endpoint for fetching the Top Artists data for a user programmatically. Until then, I had mostly spent my time working on the frontend, this task helped me in getting familiar with the backend architecture. Next, I worked on porting the Top Artist graph from d3 to nivo – a charting library built with ReactJS and d3. The Top Artists graph only supported All Time statistics before. I worked on adding support for more time ranges. This was the first time I worked with Apache Spark and the PR for this took quite some time, but it was essential that we got it right as most of the statistics we built further would use a similar workflow. After we were satisfied with the overall flow of the data from our Spark cluster to the web server, I started working on showing the stats for different time ranges on the website. Although this task seemed easy at first, it took much longer than expected. We encountered some bugs and received some user feedback when we deployed the graph to production. The rest of this period was spent on incorporating the user feedback and fixing the bugs.

Top Artist shown on the Charts page
Top Artists

First Coding Period

We now had a somewhat stable pipeline for calculating the stats and sending them to the server. I started working on the backend for Top Releases stats for a user. We ran into memory issues when calculating these stats on the cluster, so I spent some time finding the cause of the issue and realised that we were collecting the results all at once which was causing the driver to run out of memory. I fixed this by collecting the results for each user separately and tweaking some RabbitMQ parameters to make sure that messages aren’t dropped while sending them to the server (PR #897). After this, I added Top Recordings for a user. Now we had a brand new Charts page that displayed the user’s Top Artists/Releases/Recordings for different time ranges. Next I started working on temporal statistics for a user i.e, number of listens in a past time range. The query that I wrote for calculating this data turned out to be pretty inefficient for larger datasets. So I ended up writing two versions of the same query: one for large datasets and one for smaller ones. While working on displaying these stats on the frontend, I tried various representations of the data. I finally settled on displaying the data as bar graphs, as shown on this report view.

Listening Activity shown on Reports page
Listening Activity

Second Coding Period

I added two more graphs in this period: Daily Activity and Artist Origins. The Daily Activity graph shows the number of listens a user has at a particular time of the day. I implemented the query for calculating this data in a slightly different way compared to the Listening Activity query. This change improved the query speed significantly. I had some trouble finding a correct way to represent this data. My mentor helped me in this by suggesting the usage of a Heatmap, and the results turned out to be pretty good.

Daily Activity shown on Reports page
Daily Activity

Next, we worked on the Artist Origins graph, which provides an insight into the geographical diversity of a user’s musical taste. I had a lot of help from the ListenBrainz team for this graph and I couldn’t have done this graph without their help. This was by far the most interesting stat that I worked on during the project. Furthermore it laid a general framework to calculate statistics using the data from MusicBrainz. After deploying this map on production, we received feedback from the users that the map looked plain for most of them and there wasn’t much colour difference between different regions. This happened because people generally tend to listen more songs from their home country, so there is a huge difference between the country with maximum artists and average number artists from other countries. We fixed this issue by changing the colour scale from linear to logarithmic.

Comparison between linear and log scale in Artist Origins Map

Final Coding Period

We now turned our attention towards calculating some stats for the whole website. We decided to make a graph for the Top Artists over different time ranges. We thought that this would be relatively easy given that we had already done something similar for individual users before. However we hit an unexpected bump; the data we were calculating was not accurate, mainly because of various different sources of the artists and some minor changes in the artists’ name or metadata resulted in a different entry with a different listen count for the same artist. Moreover, we found a couple of users spamming our website for self promotion and we did not have a solid way to deal with this. Around this time, my college resumed and the amount of time I could dedicate to LB reduced severely. So we decided to use the remaining time to work on improving the frequency at which stats are updated. I have an open PR (#1052) for doing this at the time of me writing this blog and we should be able to implement this functionality in the near future.

Artist Origins shown on reports page
Artist Origins

Experience

The past 4 months have taught me a lot of things. I learnt new technical concepts everyday. I started writing code as a developer rather than a programmer. I understood the importance of proper unit and integration testing (even though it was my least favourite part while adding a new functionality). I also found it much easier to talk and interact with people both online and in real life. Frequent deployments of new features to production helped us a lot. We were able to catch bugs when we still had some context over the code written and also received feedback from the users about how we could improve the new features added. It also kept me motivated to keep working on new graphs and statistics and gave me a sense of satisfaction when I saw them on the production server. I also learnt that things don’t always go the way we expect them to. More often than not, you will run into some bumps while adding new features so it is better to keep some extra time to deal with these issues.

GSoC gave me a wonderful opportunity to work with some amazing people from all over the globe. I was not able to complete all the graphs that I had planned for this summer, but I do plan to continue working on ListenBrainz to add more statistics and new features.

Special Thanks

  • Param Singh (iliekcomputers) for being an amazing mentor and helping me whenever I was stuck on an issue.
  • Robert Kaye (ruaok) for providing some really insightful feedback and the MusicBrainz data that was required for calculating the Artist Origin map.
  • Nicolas Pelletier (Mr_Monkey) for helping me with the frontend for the user Charts page and providing some amazing tips for ReactJS.

Migration to TimescaleDB complete!

Yesterday I posted about why we decided to make the switch to TimescaleDB and then later in the day we actually made the switch!

We are now running a copy of InfluxDB and a copy of TimescaleDB at the same time — in case we find problems with the new TimescaleDB database, we can revert to the InfluxDB database.

In the process of migrating we got rid of a pile of nasty duplicates that used to be created by importing from last.fm. We also got rid of some bad data (timestamp 0 listens) that were pretty much useless and were cluttering the data. If you find that you are missing some data besides some duplicates, please open a ticket.

The move to TimescaleDB allows us to create new features such a deleting a listen (which should be released later this summer) and various other features that because the underlying DB is much more flexible than InfluxDB. However, right this second there are no real new features for end users — more new features are coming soon, we promise!

Thank you to shivam-kapila, iliekcomputers and ishaanshah — thanks for helping with this rather large, long running project!

ListenBrainz moves to TimescaleDB

The ListenBrainz team has been working hard on moving our primary listen store from InfluxDB to TimescaleDB, and today at UTC 16:00 we’re going to make the switch.

We were asked on Twitter as to why we’re making the switch — and in the interest of giving a real world use case for switching, I’m writing this post. The reasons are numerous:

Openness: InfluxDB seems on a path that will make it less open over time. TimescaleDB and its dependence on Postgres makes us feel much safer in this regard.

Existing use: We’ve been using Postgres for about 18 years now and it has been a reliable workhorse for us. Our team thinks in terms of Postgres and InfluxDB always felt like a round peg in a square hole for us.

Data structure: InfluxDB was clearly designed to store server event info. We’re storing listen information, which has a slightly different usage pattern, but this slight difference is enough for us to hit a brick wall with far fewer users in our DB than we ever anticipated. InfluxDB is simply not flexible enough for our needs.

Query syntax and measurement names: The syntax to query InfluxDB is weird and obfuscated. We made the mistake of trying to have a measurement map to a user, but escaping measurement names correctly nearly drove one of our team members to the loonie bin.

Existing data: If you ever write bad data to a measurement in InfluxDB, there is no way to change it. I realize that this is a common Big Data usage pattern, but for us it represented significant challenges and serious restrictions to put simple features for our users into place. With TimescaleDB we can make the very occasional UPDATE or DELETE and move on.

Scalability: Even though we attempted to read as much as possible in order to design a scalable schema, we still failed and got it wrong. (I don’t even think that the docs to calculate scalability even existed when we first started using InfluxDB.) Unless you are using InfluxDB in exactly the way it was meant to be used, there are chances you’ll hit this problem as well. For us, one day insert speed dropped to a ridiculously low number per second, backing up our systems. Digging into the problem we realized that our schema design had a fatal flaw and that we would have drastically change the schema to something even less intuitive in order to fix it. This was the event that broke the camel’s back and I started searching for alternatives.

In moving to TimescaleDB we were able to delete a ton of complicated code and embrace a DB that we know and love. We know how Postgres scales, we know how to put it into production and we know its caveats. TimescaleDB allows us to be flexible with the data and the amazing queries that can be performed on the data is pure Postgres love. TimescaleDB still requires some careful thinking over using Postgres, it is far less than what is required when using InfluxDB. TimescaleDB also gives us a clear scaling path forward, even when TimescaleDB is still working on their own scaling roadmap. If TimescaleDB evolves anything like Postgres has, I can’t wait to see this evolution.

Big big thanks to the Postgres and TimescaleDB teams!

State of the Brainz: 2019 MetaBrainz Summit highlights

The 2019 MetaBrainz Summit took place on 27th–29th of September 2019 in Barcelona, Spain at the MetaBrainz HQ. The Summit is a chance for MetaBrainz staff and the community to gather and plan ahead for the next year. This report is a recap of what was discussed and what lies ahead for the community.

Continue reading “State of the Brainz: 2019 MetaBrainz Summit highlights”

GSoC 2019: An open-source music recommendation engine

Give me music that I like.

When you start discovering yourself, just know that you are at the right place and with the right people.
MetaBrainz is the one for me!

I am Vansika Pareek (pristine__ on IRC), an undergraduate student at National Institute of Technology, Hamirpur, India. I have been working on the ListenBrainz-Labs project for MetaBrainz as a participant in Google Summer of Code ’19. The end of GSOC’19 is a beginning for me. Cheers!

How it all started?

Continue reading “GSoC 2019: An open-source music recommendation engine”