Greetings, Everyone!
I am Granth Bagadia (holycow23 on IRC), an undergraduate Computer Science student at Birla Institute of Technology and Science (BITS), Pilani. This summer, I had the opportunity to participate in Google Summer of Code 2025 with MetaBrainz, where I worked on introducing advanced user statistics visualizations for ListenBrainz.
I was mentored by Ansh Goyal (ansh on IRC), Kartik Ohri (lucifer on IRC), and Nicolas Pelletier (monkey on IRC). This post summarizes my project, its outcomes, and my experience over the course of the program.
Project Overview
ListenBrainz already provided some listening statistics, but these were limited in scope and depth. My project set out to design and implement advanced statistics that could offer users more meaningful insights into their listening habits. Since ListenBrainz is a user-centric platform, the idea was to create features that would let listeners explore their behavior from multiple perspectives. My original proposal focused on introducing a few key statistics.
The core statistics included:
- Genre Trends – showing what genres a user listens to at different hours of the day.
- Era Statistics – highlighting which musical eras dominate a user’s listening history.
- Artist Evolution – tracking how much a user listens to specific artists over time.
Together, these features enrich the user experience, helping listeners discover patterns and reflect on habits.
To support these, I built a complete statistics pipeline. At the data layer, Spark jobs ingest large volumes of listens and apply transformations such as genre classification, temporal bucketing into eras, and aggregations of listens by artist. These Spark jobs write processed statistics back into a dedicated stats database. A Flask API layer then exposes the aggregated results in a request–response fashion.
On the frontend, React and TypeScript components consume these APIs and render interactive visualizations with Nivo. These charts allow drill-down and time-series exploration, ensuring that users not only see their top genres, eras, and artists but also understand how these evolve over time and context. The combined design delivers both scalability and accessibility: Spark and Flask handle the heavy lifting, while the frontend presents clear, engaging dashboards.
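As a rough illustration of that data flow, here is a plain-Python stand-in for the pipeline (the record shapes, field names, and artists below are made up for illustration; the real jobs run on Spark over ListenBrainz's internal tables):

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical listen records standing in for rows read by the Spark job.
listens = [
    {"listened_at": datetime(2025, 1, 5, 22, 10), "artist": "Radiohead"},
    {"listened_at": datetime(2025, 1, 18, 9, 30), "artist": "Radiohead"},
    {"listened_at": datetime(2025, 2, 2, 14, 0), "artist": "Daft Punk"},
]

# Aggregate listens per (month, artist), as the Spark job would.
counts = Counter(
    (l["listened_at"].strftime("%Y-%m"), l["artist"]) for l in listens
)

# Shape the result the way an API endpoint might serialize it for the
# frontend charts (the JSON field names here are illustrative).
payload = json.dumps([
    {"time_unit": month, "artist": artist, "listen_count": n}
    for (month, artist), n in sorted(counts.items())
])
print(payload)
```

The frontend components then only need to map such a payload onto a chart's data props, keeping all heavy aggregation out of the browser.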
Pre-Community Bonding
I actually began contributing to ListenBrainz in January 2025, primarily working on the frontend side of the project. Most of my early contributions focused on improving the user interface, fixing bugs, and adding visualizations for statistics that were already available in the backend. You can find the complete list of my merged pull requests here.
Before the official community bonding period, I started experimenting with a simpler statistic that did not require a Spark backend. This was a Top Artists with Album Breakdown visualization, which used Python transformations over existing data to show the top artists I listened to, bifurcated into the albums they belonged to. This helped me get comfortable with the ListenBrainz data and also provided users with an immediate insight into their listening patterns. My work for this was merged in PR #3170.

Community Bonding
During the community bonding period, I worked closely with my mentors to refine the scope of the statistics and finalize the features that would be implemented. We discussed multiple approaches and agreed on the set of statistics that would provide both value to users and be feasible within the timeframe. I also prepared frontend mockups to demonstrate how the new stats might look and feel for users.
Setting up the development environment proved to be an important part of this phase. Getting Apache Spark running locally required extra effort, and since I did not have immediate access to the full database, I relied on a development server (wolf) for initial development and testing. By the end of the community bonding period, I was familiar with the ListenBrainz stack and ready to move into coding with a clear roadmap in place.
Coding Period
Before jumping into implementation, I spent time understanding the use cases behind each statistic and exploring the data already available in the ListenBrainz stack. This step helped me validate that the statistics I was designing would actually be meaningful for end users. For example, by looking at the available artist, genre, and release year data, I realised that we could derive richer patterns without any major backend changes.
With this clarity, I moved on to preparing mockups (like the ones shown below), which served as a bridge between the raw data and the final user-facing visualizations. They made it easier to align expectations with my mentors, ensured that every statistic addressed a clear user need before I started coding, and acted as a reference point for both backend and frontend development.
Artist Evolution: Stream/area chart showing how listening to each artist evolves over time.

Genre Trends: Donut/radial chart breaking down genres by hour of day for the user.

Era Statistics: Bar chart by decade with zoom to see individual years (all 10 years in the era).

The next step was to create the base SQL queries for each of the three statistics. Below are trimmed versions of the queries to highlight their core logic.
Note: The queries below are shown in user-specific form (they group by a single user). A sitewide variant simply removes user_id from the SELECT/GROUP BY and aggregates across all users.
1) Artist Evolution — “How many times did I listen to each artist over time?”
SELECT user_id,
       DATE_TRUNC('month', listened_at) AS time_unit,
       artist_mbid,
       artist_credit_name,
       COUNT(*) AS listen_count
FROM listens
JOIN recording_artist USING (recording_mbid)
GROUP BY user_id, time_unit, artist_mbid, artist_credit_name;
Note: The DATE_TRUNC granularity ('month' in this example) varies depending on the stats_range of the statistic. It can be truncated to day, day of the week, month, or year as required.
Groups listens by time unit and artist so we can draw a time‑series of how much you listened to each artist over time.
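To make the granularity switch concrete, here is a minimal plain-Python analogue of that truncation step (the stats_range names and their mapping to bucket sizes below are illustrative assumptions, not ListenBrainz's actual values):

```python
from datetime import datetime

def truncate(ts: datetime, stats_range: str) -> str:
    """Plain-Python stand-in for DATE_TRUNC; the stats_range values and
    the mapping below are illustrative assumptions."""
    if stats_range == "week":
        return ts.strftime("%A")        # bucket by day of the week
    if stats_range == "month":
        return ts.strftime("%Y-%m-%d")  # bucket by individual day
    if stats_range == "year":
        return ts.strftime("%Y-%m")     # bucket by month
    return ts.strftime("%Y")            # all time: bucket by year

ts = datetime(2025, 6, 14, 21, 5)
print(truncate(ts, "year"))  # 2025-06
print(truncate(ts, "week"))  # Saturday
```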
2) Genre Trend — “Which genres do I listen to at different hours of the day?”
SELECT user_id,
       genre,
       EXTRACT(HOUR FROM listened_at) AS hour_of_day,
       COUNT(*) AS listen_count
FROM listens
LEFT JOIN genres USING (recording_mbid)
WHERE genre IS NOT NULL
GROUP BY user_id, genre, hour_of_day;
Note: A sitewide Genre Trend isn’t meaningful since users are in different time zones—UTC hours don’t map to local hours consistently, and ListenBrainz lacks reliable time zone data. With such data, localized and accurate aggregates would be possible.
Surfaces patterns like “more jazz late at night, more pop in the morning” for the individual user.
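The hour-of-day bucketing behind this query can be mimicked in a few lines of plain Python (the rows below are made-up stand-ins for the joined listen/genre rows):

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, genre) pairs standing in for joined listen rows.
rows = [
    (datetime(2025, 3, 1, 23, 15), "jazz"),
    (datetime(2025, 3, 2, 23, 40), "jazz"),
    (datetime(2025, 3, 2, 8, 5), "pop"),
]

# EXTRACT(HOUR FROM listened_at) followed by the GROUP BY, in plain Python.
by_hour_genre = Counter((ts.hour, genre) for ts, genre in rows)
print(by_hour_genre[(23, "jazz")])  # 2
```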
3) Era Trend — “From which release years is the music I listen to?”
SELECT user_id,
       first_release_date_year AS year,
       COUNT(*) AS listen_count
FROM listens
LEFT JOIN release USING (release_mbid)
LEFT JOIN release_groups USING (release_group_mbid)
WHERE first_release_date_year IS NOT NULL
GROUP BY user_id, year;
Counts listens by original release year to show which musical eras dominate your history.
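Rolling these per-year counts up into decades, which is the basis for the bar-chart-with-zoom view described earlier, is a one-liner over the yearly counts (the sample years below are hypothetical):

```python
from collections import Counter

# Hypothetical release years of listened recordings.
years = [1994, 1997, 1997, 2003, 2011, 2011, 2011]

# Per-year counts feed the zoomed-in view of a single era.
listen_count_by_year = Counter(years)

# Integer-dividing by 10 rolls years up into decades for the top-level chart.
listen_count_by_decade = Counter((y // 10) * 10 for y in years)

print(listen_count_by_decade[1990])  # 3
print(listen_count_by_year[2011])    # 3
```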
Running Spark locally posed challenges due to memory limitations, which led me to switch to using a development server (wolf) for more reliable execution. Once the environment was stable, I iteratively developed the queries and implemented each statistic one by one. I started with Genre Trends, then moved to Era Statistics, and finally Artist Evolution. This staged approach ensured that each feature was independently functional before progressing further. Throughout the coding period, I interacted with my mentors both asynchronously on Element and synchronously on Google Meet. The Meet sessions were especially helpful during tricky debugging and setup issues, allowing us to resolve blockers faster and keep development moving smoothly.
With Spark pipelines producing results, I turned back to the frontend, implementing the corresponding UI components for each of the three statistics in the same order. This required making adjustments to the ListenBrainz interface so that the new visualizations would fit seamlessly with the existing design. Alongside the implementation, I also wrote frontend tests and Spark tests, again following the same sequence: genre first, then era, and finally artist evolution.
By the end of the coding period, all three statistics were implemented end‑to‑end and shipped to the UI. My work can be seen through the following pull requests:
- Genre Activity Statistics: PR #3308
- Era Activity Statistics: PR #3315
- Artist Evolution Activity Statistics: PR #3314
The Genre Activity Statistics pull request has already been merged into the ListenBrainz codebase, completing that feature. The Era Activity Statistics and Artist Evolution Activity Statistics pull requests are nearing completion, with only final review and testing remaining before integration.
Overall Experience
This summer has been an incredible journey, and I’m deeply grateful to my team at MetaBrainz and the Google Summer of Code organizers. Throughout this experience, I’ve had the unique opportunity to contribute to open source and work on real-life projects. It’s rewarding to see my work on advanced statistics now live in production!
Along the way, I picked up several important technical skills. I became much more comfortable with Git (handling branches, rebases, and reviews) and Docker for setting up reproducible environments. On the backend side, I improved at writing queries and working with Spark jobs to process large amounts of listening data. On the frontend, I gained hands-on experience with React and data visualization libraries like Nivo, which taught me how to turn raw statistics into clear, interactive charts. These learnings not only helped me complete the project but will also stay with me for future work.
Just as importantly, I also learnt how to work in a community-driven environment — discussing ideas openly, writing code that others would review, and collaborating under the guidance of mentors. This experience taught me the value of clear communication, iteration, and flexibility when working on an open-source project with a distributed team.
I would like to thank Ansh, Monkey, and Lucifer for their constant support and feedback throughout the project. Whether it was over Element chats or a quick Google Meet call for deeper debugging, their guidance was invaluable in overcoming challenges and shaping the final outcome. I am also grateful to the MetaBrainz community for being welcoming and collaborative at every stage of the project.
Finally, I am thankful to Google and the MetaBrainz Foundation for providing me with this wonderful opportunity to learn, contribute, and grow as a developer.