We are now running a copy of InfluxDB and a copy of TimescaleDB at the same time — in case we find problems with the new TimescaleDB database, we can revert to the InfluxDB database.
In the process of migrating we got rid of a pile of nasty duplicates that used to be created by importing from last.fm. We also got rid of some bad data (listens with a timestamp of 0) that was pretty much useless and was cluttering the data. If you find that you are missing any data other than duplicates, please open a ticket.
The move to TimescaleDB allows us to create new features such as deleting a listen (which should be released later this summer) and various other features, because the underlying DB is much more flexible than InfluxDB. However, right this second there are no real new features for end users. More new features are coming soon, we promise!
Thank you to shivam-kapila, iliekcomputers and ishaanshah for helping with this rather large, long-running project!
We’ve just finished pushing a new release to the production server for ListenBrainz. We’ve spent quite a long time working on this because we needed to completely revamp how we were generating user statistics, and that process is now finally complete and live. The other good news on user statistics is that we now have a generalized framework for creating them, which should make it much easier to create more user statistics going forward. We’ve triggered the stats engine to produce updated top artist statistics for everyone and those should update for users automatically sometime later today.
This release also includes an improved importer from last.fm, which has been moved to React and is now friendlier on mobile devices. This particular feature hasn’t been super well tested, so if you find a problem, please submit a bug report.
Next, if your listening history is screwed up for some reason, you can now delete all listens and start over, perhaps with a clean import from last.fm.
Finally, this release includes a pile of security updates to make the overall system more secure, but users shouldn’t notice anything different.
Thank you to iliekcomputers, Mr_Monkey, ishaanshah[m], shivam-kapila, pristine__ and everyone else who was involved in creating this update!
Following up on our release from last week, we found a number of minor problems in production that were really hard to spot on our test setup. Sometimes you need to have real data flowing through your system before you can find the real problems.
The following pull requests were merged and released just now:
As promised, here is another blog post about the exciting new Follow page. The goal of this page is to finally make use of the data we collect in ListenBrainz and expose a new feature designed to let our users discover more music.
To use this new feature, you’ll need to link your Spotify account to ListenBrainz. Ideally you should give permission to record your listens and to play Spotify content. But if you’re not ready to dive into recording your listens, start with playback first. N.B. In order to really take advantage of this new feature, you’ll need a premium Spotify account.
Then head over to the recent listens page and hover over the tracks that are listed there. If the user listened on Spotify, then a play button will appear and you can listen to the track. Please note that playing from this page will interrupt whatever you’re already playing on Spotify. If you find that a user is listening to interesting music and you’d like to follow the user, head to the follow page and use the Follow Users section to add this user to your follow list.
When a user in your follow list finishes listening to a track, that track will appear as a line in the Playlist. In theory, you’ll be able to keep listening to what your followed users are playing: the player will attempt to play as many tracks as it can play and to keep the music going. The player also has a previous and next track button that allows you to easily skip tracks that you don’t like. Our team has found this feature exciting and to some extent even has started DJing for each other!
We’re pushing into new territory trying to offer music discovery features and trying out new features that we’ve not seen before. Expect bugs, missing features, and reactions of “why didn’t they do X?”. To be honest, we’re not entirely happy with it and we know that there are features missing. But we felt it important to push this out in order to start getting feedback from you — and we are also excited about the Spotify integration! That said, please continue reading and if you feel that we screwed something up, please open a ticket!
Also, keep in mind that we’re pushing against the tide of the music industry. Established players want to keep everything closed, controlled and in their silo (Apple Music, Tidal, etc). Spotify is slightly more open: it allows us to record users’ listening histories and to play music from web pages, so we focused on Spotify first.
This has the unfortunate side-effect of making these new features useful only if you have a premium Spotify account, and following users who are not on Spotify is useless: we don’t know how to play this content. This blows. We know it and we hate it ourselves. But we needed to start with something to show what we’re trying to do and to generate some interest. If people are interested, we can start working on supporting more services and making more of the music in our pages playable.
Finally, the Spotify API endpoint for recording users’ listens has an annoying tendency to fall behind sometimes, which means that the flow of listens from Spotify slows or stops altogether, which is… less than ideal. We’re prodding Spotify to keep the bits flowing if at all possible, but know that all of this is a work in progress.
In fact, the release has already generated a flurry of fixes that we’ll push live before too long. A lot of these sorts of fixes are for problems that you can only see when real-live data flows through the data pipelines: these are tricky features to debug!
Please play with the follow feature and tell us what you think! If you know other services that we can use to play music from the data we have available, please comment! If you find bugs or have suggestions for how we make these features better, please open a ticket!
Have fun and discover some new music,
The ListenBrainz Team
For the past few months we’ve been working on enabling ListenBrainz to record your Spotify listening history automatically and we’ve just now released this feature! If you would like ListenBrainz to record your Spotify listening history automatically (and make it public!), go here to link your Spotify account to ListenBrainz. We’ll take care of the rest!
We would like to encourage as many users as possible to record their listening histories in ListenBrainz. With the data we collect and safeguard for you, we will soon start building more music discovery features. Please help our mission and go connect your account now!
This release also adds two new pages: Recent listens and the “follow” page. The recent listens page shows the most recent listens that we’ve saved in ListenBrainz for any user. This is a convenient way for you to discover other users who are currently listening to music.
The follow page is the new feature that we’re really excited about — it allows you to listen to the music that other people are currently listening to — pick a number of users to follow and their recent listens will appear on the page. The new embedded Spotify player can start playing the music as the listens roll in. This allows you to follow your friends and learn about music that they love! We’re going to write another blog post that talks more about the follow page and how we plan to improve that going forward — stay tuned for that.
This release also re-organizes the menu layout a little, moving the most useful features so that they’re easily accessible. Behind the scenes we’ve upgraded to Python 3.7 and started using some portions of React for our user interface. We also found ourselves amazed that this release included 646 commits! We hope to move to a more regular schedule of releases from here on out. This was a big push for us, with a lot of infrastructure improvements that were needed.
This release would not have been possible if Monkey (from BookBrainz) had not come to help us write the UI for the follow feature. Monkey, iliekcomputers and I worked relentlessly for weeks trying to push out some exciting features that show off the first steps of what we have planned for ListenBrainz. We’re quite excited about this release and we hope that you’ll enjoy the follow page and discover new music!
We’ve been working on a system to import listens automatically to ListenBrainz from Spotify and we’ve recently deployed it to the ListenBrainz beta site. We would really appreciate it if you could help us test it out!
Please note that this is still beta software, so there is a (very small) chance that we might miss a listen or two. If you’re using this, please make sure that ListenBrainz is not the only service where you’re archiving your listens.
Another thing to note is that importing the same listens from two different sources such as Last.FM and Spotify may cause the creation of duplicates in your listen history. If you opt into our automatic Spotify import, please do not use the Last.FM import or submit listens from other ListenBrainz clients. This is a temporary limitation while we find better ways to deduplicate listens.
One of the first rites of passage when working on a new project is creating your development environment. It always seems simple, but sometimes there are bumps along the way. The first activity I did to begin contributing to ListenBrainz was create my development environment. I wasn’t successful with the documentation in the README, so I had to play around and work with the project before I was even running it.
The first part of this post details how to set up your own development environment. Then, the second half talks about the solution I came up with and my first contribution back to the project.
Hi, I’m Kartikeya Sharma, a postgrad student at National Institute of Technology, Hamirpur. I worked on MessyBrainz as a student developer for GSoC 2018, mentored by Robert Kaye. The goal of my project is to associate MBIDs with MSIDs and to cluster together the MSIDs that represent the same MBID. An MBID (MusicBrainz Identifier) is a Universally Unique Identifier that is permanently assigned to each entity in the MusicBrainz database. An MSID (MessyBrainz Identifier) is associated with each unique recording, artist_credit and release in the MessyBrainz database. In simple words, MSIDs represent unclean metadata whereas MBIDs represent clean metadata.
This blog post summarizes the work that I did in my project, which was divided into three parts.
Processing the data already in MessyBrainz database
The first part involves creating clusters using the MBIDs already present in the MessyBrainz database. This involves creating clusters for recordings, artists, and releases. To implement this part I created the following three PRs #37, #41, and #44.
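The core idea of this first part can be illustrated with a small sketch. It assumes the input is a list of (MSID, MBID) pairs already read from the database; the function name `cluster_by_mbid` and the sample identifiers are made up for illustration and are not taken from the PRs.

```python
from collections import defaultdict


def cluster_by_mbid(rows):
    """Group MSIDs that share the same MBID.

    `rows` is an iterable of (msid, mbid) pairs. Rows without an MBID
    cannot be clustered this way and are skipped.
    """
    clusters = defaultdict(set)
    for msid, mbid in rows:
        if mbid:  # ignore missing or empty MBIDs
            clusters[mbid].add(msid)
    return dict(clusters)


# Hypothetical sample data: two messy entries point at the same MBID.
rows = [
    ("msid-1", "mbid-a"),
    ("msid-2", "mbid-a"),
    ("msid-3", "mbid-b"),
    ("msid-4", ""),
]
clusters = cluster_by_mbid(rows)
```

Here `msid-1` and `msid-2` end up in one cluster because they carry the same MBID, while `msid-4` is left unclustered.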
After that, I began work on the second part, which involves creating clusters using the artist MBIDs, release MBIDs and release names fetched from the MusicBrainz database. To access the MusicBrainz database, I first had to add methods to BrainzUtils that fetch artist MBIDs using recording MBIDs, and release names and release MBIDs using recording MBIDs. The part that fetches artist MBIDs was done during the community bonding period in PR #14 at BrainzUtils, and to fetch releases I created PR #18 at BrainzUtils during the GSoC coding period. After that, I created a PR to create clusters using the fetched artist MBIDs (#47) and another one to create clusters using the fetched releases (#49).
I also wrote around 60 tests, which proved vital in making sure that the code does what it’s supposed to do.
Processing the data as it is inserted into the MessyBrainz database
Creating clusters for the data already inside the database requires a lot of resources, so it is better to create clusters as recordings are inserted. However, clustering inside the insert path is not efficient either. Instead, incoming recordings are first sent to a RabbitMQ server, and from there to a clustering script that runs continuously in a different container and clusters the incoming recordings. That way clustering does not slow down the process of submitting recordings to the database. For this I created PR #50.
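The decoupling described above can be sketched with Python's standard library. This is only an illustration of the pattern: an in-process `queue.Queue` and a thread stand in for the RabbitMQ server and the separate container, and all names are hypothetical.

```python
import queue
import threading

incoming = queue.Queue()  # stand-in for the RabbitMQ queue
clustered = []            # stand-in for the cluster tables
_STOP = object()          # sentinel that tells the worker to exit


def submit_recording(recording):
    """Insert path: enqueue the recording and return immediately."""
    incoming.put(recording)


def clustering_worker():
    """Consumer: runs separately and handles recordings as they arrive."""
    while True:
        recording = incoming.get()
        if recording is _STOP:
            break
        # Placeholder for the real clustering logic.
        clustered.append(recording["msid"])


worker = threading.Thread(target=clustering_worker)
worker.start()
for i in range(3):
    submit_recording({"msid": f"msid-{i}"})
incoming.put(_STOP)
worker.join()
```

The submitter never waits for clustering to finish; it only pays the cost of putting a message on the queue.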
Create endpoints to access MSIDs and MBIDs
I created two API endpoints in PR #51. One endpoint fetches MBIDs and MSIDs using an MSID. The other fetches MSIDs using an MBID. This way end users can access MBIDs and MSIDs, which may be used for calculating different stats.
Apart from that, with the help of my mentor, I set up a VM to test the above code on the MsB data dump. This task had some challenges. First, I had to create indexes on various fields to speed up clustering: without indexes it would have taken approximately 37 days, but after creating them it took just 3 hours. I found out that PostgreSQL also allows indexes on functions, which came in handy while creating artist_credit clusters, for which I created a custom function. The indexes were created in PR #53. When I ran the clustering code on a VM holding the whole MessyBrainz data dump, I found that some fields in the recording_json table that are supposed to store MBIDs contained empty strings. This should not have happened, as ListenBrainz is currently the only source of data for MessyBrainz: users cannot submit to MessyBrainz directly, and ListenBrainz validates listens. So those recordings must have been inserted before that validation was present. To solve the problem I created PR #52.
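To give a feel for what an index on a function looks like, here is a minimal sketch. It makes several assumptions: SQLite (via Python's `sqlite3`) stands in for PostgreSQL so the snippet runs without a server, the built-in `lower()` stands in for the custom function mentioned above (which this post does not show), and the table, column and index names are hypothetical. The `CREATE INDEX ... ON table (function(column))` form is the same in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recording (id INTEGER PRIMARY KEY, artist TEXT)")
conn.executemany(
    "INSERT INTO recording (artist) VALUES (?)",
    [("Queen",), ("QUEEN",), ("Muse",)],
)

# Index the result of a function call rather than the raw column.
conn.execute("CREATE INDEX idx_artist_lower ON recording (lower(artist))")

# A query that uses the same expression can be answered from the index.
query = "SELECT id FROM recording WHERE lower(artist) = 'queen'"
matches = sorted(row[0] for row in conn.execute(query))

# Ask the planner how it intends to run the query.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
```

The query plan names `idx_artist_lower`, which shows the lookup goes through the index instead of scanning every row.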
The summer was a great learning experience for me. I started slowly, as things were messy at the start. Not everything was crystal clear to me, and I wasn’t sure exactly how to write scripts that manipulate the database, so I wrote them in the most trivial way possible. I was running a query for every single MBID to first check whether it was present in the recording_cluster tables and, if not, cluster the recording. This is conceptually correct but not efficient by any means. The same thing can be done by executing a single query on the recording_json table that fetches only those recording MBIDs that are not present in the recording_redirect table, as those are unclustered. That way we don’t have to process the recording MBIDs that have already been processed, which makes clustering efficient.
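The single-query approach might look roughly like the sketch below. The table names come from this post, but the `recording_mbid` column name and the sample rows are assumptions, and SQLite is used instead of PostgreSQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recording_json (id INTEGER PRIMARY KEY, recording_mbid TEXT);
    CREATE TABLE recording_redirect (recording_mbid TEXT);
    INSERT INTO recording_json (recording_mbid) VALUES
        ('mbid-a'), ('mbid-b'), ('mbid-b'), ('mbid-c');
    INSERT INTO recording_redirect (recording_mbid) VALUES ('mbid-a');
""")

# One anti-join replaces a "does it exist yet?" query per MBID:
# keep only the MBIDs that have no matching row in recording_redirect.
unclustered = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT rj.recording_mbid
          FROM recording_json AS rj
          LEFT JOIN recording_redirect AS rr
            ON rr.recording_mbid = rj.recording_mbid
         WHERE rr.recording_mbid IS NULL
         ORDER BY rj.recording_mbid
    """)
]
```

Only `mbid-b` and `mbid-c` come back, so the already clustered `mbid-a` is never touched again.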
With time I got an understanding of how clusters are created and how to handle anomalies, such as James Morrison. In the end, an anomaly can be defined as follows: an MSID represents an anomaly if it points to different MBIDs in the entity_redirect table (where entity can be artist_credit, recording, or release).
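That definition translates into a short check. The sketch below assumes the redirect rows have been loaded as (MSID, MBID) pairs; the function name and the identifiers are hypothetical.

```python
from collections import defaultdict


def find_anomalies(redirect_rows):
    """Return the MSIDs that point to more than one distinct MBID."""
    mbids_by_msid = defaultdict(set)
    for msid, mbid in redirect_rows:
        mbids_by_msid[msid].add(mbid)
    return {
        msid: mbids
        for msid, mbids in mbids_by_msid.items()
        if len(mbids) > 1
    }


# Hypothetical rows: one messy name that matches two different artists.
redirect_rows = [
    ("msid-1", "mbid-x"),
    ("msid-1", "mbid-y"),
    ("msid-2", "mbid-z"),
    ("msid-2", "mbid-z"),
]
anomalies = find_anomalies(redirect_rows)
```

`msid-1` is flagged because it maps to two MBIDs, while the repeated row for `msid-2` is not an anomaly since both rows point to the same MBID.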
Work to be done ahead
The project is still in its initial stages and requires a lot of work to be done before moving it into production. We still need to write integration tests for ClusterWriter and API endpoints. After that, we can work on the Additional Ideas that I proposed in my proposal. We need to figure out some way to associate MBIDs to MSIDs for the artists, recordings, and releases where no MBIDs are present. This does not seem like a trivial task with so many anomalies to take care of.
The last three months have been a great experience for me. I would like to thank Robert Kaye, Param Singh, and Alastair Porter, who helped me solve a lot of problems that I encountered during the entire period. Working on their suggestions and reviews, I was able to write good quality code that was efficient as well. The work culture at MetaBrainz inspired me a lot. At MetaBrainz we have weekly IRC meetings where we get to know what others are doing at the organization and also get a place to tell what we did in the past week. I would like to thank MetaBrainz and Google for giving me this chance to get involved in open source on such a cool project. The association of MSIDs to MBIDs can be used by ListenBrainz, as stats are calculated on MSIDs, which can then be mapped onto MBIDs, which represent clean metadata. I would like to work on the project further because of the learning opportunities that are present in the project.
Rob, Suyash, Param and I met in the bustling city of Delhi where “horns are applied very liberally” (it is a very noisy city!) for a mini summit. Some may even call it elaborate break-out sessions on ListenBrainz and CritiqueBrainz. We had discussions over a span of two days over laptops and notebooks, riding on bumpy roads in tuk-tuks and over spicy chicken biryanis. Here is a summary of all that we discussed:
ListenBrainz Data Visualizations
We started Day 1 with graphs for ListenBrainz. After a long marathon of heavy development weightlifting by Param and Rob (how do we work with BigQuery correctly?), we are finally at a stage where we can get some really cool visualizations out of our dataset. What will they be? Where will they be? How will we implement them? Can our community pitch in with requests and maybe even play around with code?
After scrounging through a lot of other websites which do music-y data visualizations, and the few responses on our user survey, we started listing various ideas, and went through ideas on our community forum. We ended up dividing the data visualizations (from now on, called graphs) into two categories:
User specific graphs: showcasing a user’s listening history and taste
Site-wide graphs: showcasing the overall listening patterns on ListenBrainz
We had to make some tricky calls based on technical constraints, but overall, for starters, we decided on some cool user graphs. We detailed 6 of them during the summit:
Listening history of a user: how much you have listened, what you have listened to, listen counts, etc.
Your top artists
Your tracklist (listen history)
How much music did you explore
Which artists are trending in which parts of the world
Listener count across the world
All these graphs will be available over different time durations (last week, month, year) and will also have handles to manipulate them. They will also have tools to easily share them on social media networks. We think our community will really enjoy tracking their listening history with these. We also discussed a few ideas for how we can create a sandbox so our community can pitch in with ideas, vote on ideas and send pull requests for new graphs. More on that later, as we get there!
Rating System
If you are listening to a tracklist while working on something, how likely is it that you will rate a track by saying “This is 3.5? This is 4.2? That is 5 stars!” So you see, ratings on ListenBrainz are tricky. ListenBrainz is very dynamic and interactive in real time, unlike other dear *Brainz projects, so we think that a Last.fm-like rating, i.e. like and dislike, makes sense for ListenBrainz. There was also some discussion about where the ratings should reside: is CritiqueBrainz the correct place?
Home Page
We worked on redesigning the “My Listens” page as well as the home page. We now plan to include, apart from the graphs, an infographic explaining how ListenBrainz works and the things you can do with it! I will detail the mockup further later this week.
Potential Roadmap
After almost two days of discussions, we could chalk up a rough roadmap for ListenBrainz, which includes data visualizations, the ability to rate/like tracks, create collections, follow users, and more. This also includes encouraging cross-*Brainz pollination!
CritiqueBrainz
With Suyash around (he worked on CritiqueBrainz as part of GSoC last year, and has been actively involved since), there were obviously a lot of discussions on reinvigorating the project. We discussed quite a few ideas, which included innovative ways of writing and sharing reviews, sharing them on social media, cross-*Brainz interactions, a few UI changes, etc. We’re considering allowing Quick Reviews that, like Twitter, are limited to 280 characters. What do you think? Suyash has written down his ideas and would love some feedback from the community!
And of course! We couldn’t let Rob’s first visit to India be all about work. After sunset, we went exploring the city of Delhi. That included rides in tuk-tuks, spicy chicken biryanis, shopping for some colorful clothes and, definitely, the Indian chaat 🙂
All in all, it was a very productive mini summit and it definitely made us all more excited to start working on the ideas we discussed. We will keep you updated and post more soon!