GSoC 2018: Building a design system. Journey and learnings.

Hello,
I am Chhavi. I have mostly been helping out with all things design at MetaBrainz. I recently graduated from IIT Guwahati, India, and started contributing to MusicBrainz after attending the summit last year, around the same time.

As a Google Summer of Code student, my project was to build a design system with React UI components for the upcoming overhaul of MusicBrainz’s website. It surely was a really interesting journey, right from when I first heard about the community, and I would like to share some snippets of it with you!

May 2017: I hear about Picard, and how a bunch of really cool people who meet online are building it. I am intrigued.

Around August 2017: I pop into the IRC channel #metabrainz and, after much overthinking, drop a “Hi”. What follows is a really warm welcome from people I will soon call friends, and a lot of developer-y jargon I have no clue about.

September 2017: I attend the annual MusicBrainz developer summit in Barcelona. And boy oh boy, I am now part of the family. Over the few days there, I have immense fun interacting with and learning from the community.

November 2017: We set up our JIRA ticket system for design issues and start working on the mockups for the redesign. The entire community comes together on JIRA tickets and Discourse posts to talk about where we want to go with this overhaul.

January 2018: Community members encourage me to try my hand at front-end development. One is really lucky to find people who encourage you to grow out of your comfort zone and help you cross that wall. In MetaBrainz, there is no shortage of such people.

March 2018: With little confidence and lots of hope, I apply for the Google Summer of Code programme. I start learning the ropes of development, with the help of online tutorials and, obviously, our community. We also meet for a mini-summit in Delhi to discuss ListenBrainz and spicy food.

April 2018: So began my full-fledged journey of learning and spending a summer coding. It wasn’t easy, but I learned a lot in the process.

We set up the initial design system using react-bootstrap and react-storybook. I then started importing UI components into the system and documenting them. I wrote up a more detailed description of the process too.

August 2018: As of now, we have the design system in place. The future plan is to continue adding components to it, as well as to focus on well-thought-out contribution guidelines. I will also continue working on designing the mockups of the user interface for the various entities.

Google Summer of Code was just another milestone in my journey with MetaBrainz. My time here has been a time of both personal and professional growth. I now feel more comfortable in a development environment, the ongoing chats on IRC make more sense to me, and I feel less inhibited about putting my thoughts out there. I finished college, moved cities, travelled… all while having this set of amazing people I call family.

A special shout out to Rob for keeping me going, bitmap for being ever so patient and understanding, samj1912 for introducing me to MetaBrainz, CatQuest, iliekcomputers, Suyash, Freso, reo and zas for being amazing friends through it all.

The thing I like about our community is that we have seasoned developers as well as newbies like me, all working together to create amazing stuff. Hoping to continue being an involved and colorful part of this community,

You will obviously keep hearing from me in the coming days,
Chhavi

GSoC 2018: Developing infrastructure for importing data into BookBrainz

Hi everyone!

I’m Shivam Tripathi, an undergraduate student from the Indian Institute of Information Technology, Una. I interned with the MetaBrainz Foundation under the Google Summer of Code programme for 2018 and worked on the BookBrainz project, mentored by Ben Ockmore. This post summarizes my contributions to the project and the experiences I had throughout the summer while solving the various problems that came up during its implementation.

Proposal

The original proposal I submitted to Google underwent some modifications as the project progressed, details of which can be found later in this post.

Community Bonding

Summer of Code started with the community bonding period – during which I attended the regular Monday meetings in MetaBrainz’s IRC channel #metabrainz and interacted with the MB community members. I added multiple new entities to the BookBrainz website and helped some users with BookBrainz-related queries on the community page (intended for support and general Q&A related to all MetaBrainz projects).

Also during this period, my mentor Ben Ockmore and I discussed and finalized the architecture of the importer application. We decided to split the importer into two microservices: a producer (which reads the data dumps and produces a generic object for each record using the BookBrainz data storage format) and a consumer (which reads and validates the generic objects and then inserts them into the BB database). These microservices are connected by a message broker queue (we settled on RabbitMQ). In addition, the repository was structured so that the entire message broker logic is abstracted away, making it possible to swap out RabbitMQ for another hosted service (such as pub/sub) later.

Fig. 1: Initial design of the intended importer application. For more information, visit the original document.
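
To make the producer/consumer split concrete, here is a minimal sketch of the message flow. The actual importer is written in Node.js; this Python version (using the pika RabbitMQ client) only illustrates the pattern, and the queue name and record handling are assumptions.

```python
import json
import pika

QUEUE = "bookbrainz-imports"  # hypothetical queue name

def produce(records):
    """Producer: push one generic object per parsed record onto the queue."""
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    for record in records:
        channel.basic_publish(exchange="", routing_key=QUEUE, body=json.dumps(record))
    connection.close()

def consume(validate, insert_into_db):
    """Consumer: validate each generic object and insert it into the database."""
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)

    def on_message(ch, method, properties, body):
        record = json.loads(body)
        if validate(record):
            insert_into_db(record)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
    channel.start_consuming()
```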

Coding period

First phase

The coding period kicked off with making changes to the existing BookBrainz schema to enable it to support the new imports. The initial design as discussed here was later updated to also include per-entity views for imports, to enable simpler queries.

Following this, I started working on the bookbrainz-data repository to add some basic functions to aid the import process. I worked in accordance with one of the existing roadmaps for the BookBrainz project, which is to shift all database logic from bookbrainz-site to bookbrainz-data, adding features on a per-function basis. Initially it was decided to use Immutable.js for all data flowing in and out of bookbrainz-data-js, but we soon realized that this approach was not practical. After some discussion, we finally settled on this repository design change to incorporate the new function-oriented functionality. We named this new sub-module func.

Once I had basic functions to handle database transactions in place, I started working on the importer architecture. It was decided to create multiple instances of the producer process, each with the ability to run asynchronous operations on its own. Similarly, we should be able to fork multiple consumer processes, each capable of fetching data from the queue and sending it off to the database.

To address this, I started working on a module which, given a function, makes it possible to run multiple processes, each running multiple instances of that function. It also lets us generate the arguments for each process dynamically, along with some set-up and tear-down actions before and after the processes are forked.

To get a better grasp of the underlying functionality, one can read the final API and documentation. It is a generic module which can be used for any task. The diagram of its execution flow can be found below:

Fig. 2: AsyncCluster module execution flow. For more details, see the complete documentation.

Second phase

While developing the producers, I first designed the generic producer object structure for all entity types – an object skeleton which every producer builds from the records it reads before pushing them into the queue. This structure is enforced across all data sources, so consumers can expect objects of a fixed shape on which they can run automated validation tests prior to adding them to the database.
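
As a rough illustration, such a generic object might look like the Python dictionary below; the field names here are assumptions and not the exact keys used by the importer.

```python
# Hypothetical skeleton of a generic import object; the real field names differ.
generic_work = {
    "entityType": "Work",               # one of the BookBrainz entity types
    "source": "example-data-dump",      # which data dump the record came from
    "lastEdited": "2018-08-01",
    "aliases": [
        {"name": "Example Title", "sortName": "Example Title, The", "primary": True},
    ],
    "identifiers": [
        {"typeId": 1, "value": "OL12345W"},  # e.g. an identifier from the source
    ],
    "metadata": {
        # Everything that does not fit the current BB schema is preserved here.
        "originalRecord": {"subjects": ["fiction"]},
    },
}
```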

As the data dumps were of considerable size, I used data streams to read the data from the flat files and parsed it to create generic entity objects in the BookBrainz data storage format. After parsing each record, I pushed it into the queue.

Parsing required thorough analysis of the data dumps, and manually mapping each key-value pair in the data dump to the generic object structure. All the data which did not fit the present BB schema (and hence had to be excluded from the generic producer object structure) was added to a metadata field associated with the import. This metadata field is stored as jsonb in the database, so that we can individually query and index any of its fields later.

While developing the consumers, I initially set up a validation module. Much of it was adapted from the existing validators in bookbrainz-site, which I was able to use without much alteration thanks to the generic producer object structure. The validation modules in bookbrainz-site were written to quickly validate the form submitted by the editor when creating or editing an entity. To use them in the import process, I wrote a converter which transforms the generic producer object into the form sections understood by the validation module. Apart from this, I added better error handling to ensure that all errors are caught and reported in case something goes wrong.

Error handling was another important aspect of the import process, apart from validation. Since the importer is a command line application, tracking errors was central to ensuring that all components were running as intended. At the same time, we had to ensure that no record which could potentially be imported into the database was missed. To address this, I decided to discard a record if it fails the data integrity validation tests (which means the data is most probably corrupt), but in the case of a transaction error, to give the record a fixed number of chances before discarding it (by acknowledging the message). A future goal could be to push erred records into another queue for analysis, and to replay those messages back to the original queue once the problem gets sorted.
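
A minimal sketch of that policy, written in Python rather than the importer’s Node.js and with a made-up retry limit, could look like this:

```python
class TransactionError(Exception):
    """Placeholder for a database transaction failure."""

MAX_ATTEMPTS = 3  # assumed retry limit; the real importer uses its own setting

def handle_record(ch, method, record, validate, insert_into_db, requeue, attempts=0):
    """Drop records that fail validation; retry transaction errors a few times."""
    if not validate(record):
        # Data integrity failure: the record is most likely corrupt, so discard it.
        ch.basic_ack(delivery_tag=method.delivery_tag)
        return
    try:
        insert_into_db(record)
    except TransactionError:
        if attempts + 1 < MAX_ATTEMPTS:
            # Push the record back with an incremented attempt counter.
            requeue(record, attempts + 1)
    # Acknowledge in every case so the broker does not redeliver the raw message.
    ch.basic_ack(delivery_tag=method.delivery_tag)
```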

Once the importer was in place, I focused on building up the func.imports module with more functions for the import entities – like discard and approve. I also added functions to fetch recent imports and a lot of other helper code for the imports, and I ensured that all errors occur loudly and never silently slip away. With the help of my mentor, I also migrated most of the functions required for data transactions on the bookbrainz-site. This was crucial to my project, as in many instances the existing functions could not be used because they initiated their own database transaction for each action. I split all these actions into functions and bound them to the transaction object they receive, rather than letting them initiate their own transaction. I also made sure to use modern ES6 features, which made the adapted code much more sleek and compact. It was a long process, as I had to read almost all of the existing code for data transactions on the bookbrainz-site and adapt each piece correctly. All the code finally came together in the create-entity module, which can now be used for entity creation as well as for upgrading imports to entities.

Third Phase

The work on bookbrainz-site and bookbrainz-data mostly happened side by side. First I added a recent imports page, which fetches the most recent imports from the database and displays them inside a React component. Recent imports is designed as a single-page application which dynamically loads the paginated records and renders them on screen. It works as follows:

Fig. 3: Recent imports execution flow

Next, I added import-entity display pages for all five entity types. They display the entity attributes along with links to approve, discard or edit the import. On approval, the user is redirected to the newly created entity. The import-entity display page for a Work looks as follows:

Fig. 4: Work Import Entity Page. Similarly, pages were added for Creator, Edition, Publication and Publisher.

For discarding an import, I added an extra page similar to the existing confirm-deletion page, which asks the user to confirm the action and then waits until the import is discarded before redirecting the user to the home page. The discard page looks as follows:

Fig. 5: Discard Import Entity Page

Next, I implemented editing of imports prior to approval. For this, I wrote two modules – one to transform the import into the structure used by the editing form, and one to convert form data into an entity. When a user wishes to edit an import, the import is transformed into the form and rendered on screen. The user can then edit it. When the user clicks submit, we transform the form data into a new entity and use the create-entity function to create it in the BookBrainz database. The user is then redirected to the newly created entity page. The code for rendering the form and editing the entity was almost completely reused for imports, with minimal changes. I then added functionality to add imports to the ElasticSearch index and display them in search results. The final search page is as follows:

Fig. 6: Search showing Import Entities

Links to the work done

  1. BookBrainz SQL
  2. BookBrainz Import
  3. BookBrainz Data
  4. BookBrainz Site

Conclusion

The last three months have been a fantastic experience for me. Not only did I get to learn a lot of new technologies and write some exciting software, but I also got to brush up my existing skills and interact with the completely awesome MetaBrainz community. Such an opportunity truly comes once in a lifetime, and I extend my sincerest gratitude to Google for running such a great and extremely inclusive programme, which gives students from all over the world this kind of opportunity. Special thanks to my mentor Ben Ockmore for always being patient and helping me out whenever I felt stuck.

Thank you MetaBrainz community for your continuous guidance and support!

!m Google and MetaBrainz

GSoC 2018: SpamBrainz – Fighting spam in MusicBrainz using machine learning

Hi, I’m Leo and I spent my summer building and training SpamBrainz, our new solution to fighting spam in MusicBrainz. If you haven’t heard of SpamBrainz before it’s probably because it did not exist before this year’s Summer of Code.

For quite a while now, the amount of spam in MusicBrainz has been becoming a serious problem. Often this means editor accounts are created automatically, with descriptions that look not unlike the spam emails most of us get every day, promoting other websites and services.

During last year’s MetaBrainz Summit we discussed possible solutions to this and came up with the Spam Ninja system. Essentially this means that Soon™ there will be a group of editors that receive spam reports and have the ability to delete editors and entities that are nothing but spam.

Now with MusicBrainz having almost two million registered editors, could we really expect the Spam Ninjas to manually check every single one of them in addition to all the new registrations? Obviously not, and this is where SpamBrainz comes in.

SpamBrainz is a machine learning system that looks at all editors and decides whether or not it thinks they are spammers. If it thinks they are, it automatically notifies the spam ninjas who then decide whether or not SpamBrainz was correct.

What’s great about this system is that a human is guaranteed to look at any report and at no point does a computer decide that you’re a spammer and should be banned, because no one wants machines to run the world, right?

Building SpamBrainz

While most GSoC projects involve adding features to existing systems, SpamBrainz is something entirely new and I had not built anything on this scale before so I started out by doing tons of research.

When building a machine learning project you should always start by doing some good old statistics first and trying to figure out what matters about your data and how the system could use it. I wrote a couple of Jupyter notebooks (which are great for working with data) to do this.

As I was not working for MetaBrainz at the time and had to respect our privacy policy, I wrote a script that collects the most common values of a couple of different editor attributes, anonymizes them and saves them to a report. Using that data I could compare spam and non-spam editors and decide upon a set of datapoints that would be useful for my machine learning model. Yvanzo then ran the script on the live database, and I could happily do my data analysis without compromising user privacy.

Next I built a pretty boring Flask-based API that allows MusicBrainz to queue up editor analysis and training. Quite a few different MetaBrainz projects use Python and need to access the MusicBrainz database, so a long time ago someone wise decided to move commonly used code into a repository called brainzutils-python. All I had to do was add some code for accessing editor data through it.

In a surprise move by ruaok I was then hired by MetaBrainz as a contractor with a yearly salary of 100g of chocolate. I probably should have negotiated what kind of chocolate but what mattered most was that I could now work with user data without breaching our privacy policy.

But before I could build my Keras model I had to decide on a final set of input features and write code for preprocessing the data. Only then could I finally get started building and testing models.
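
Lodbrok’s actual architecture isn’t described in this post, so the snippet below is only a minimal sketch of what a Keras binary spam classifier over preprocessed editor features can look like; the layer sizes, feature count and placeholder training data are all made up.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

NUM_FEATURES = 32  # assumed number of preprocessed editor features

# A small feed-forward binary classifier: spam (1) vs. legitimate editor (0).
model = Sequential([
    Dense(64, activation="relu", input_shape=(NUM_FEATURES,)),
    Dropout(0.5),
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data standing in for the real (private) editor feature vectors.
X_train = np.random.rand(1000, NUM_FEATURES)
y_train = np.random.randint(0, 2, size=(1000,))
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)

# At prediction time, editors scoring above a threshold get reported for review.
suspicious = model.predict(X_train[:5]) > 0.5
```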

The current state-of-the-art SpamBrainz model is Lodbrok, which actually turned out to work really well, reaching 99% accuracy in detecting spam while misclassifying only 0.2% of real users as spammers. Obviously the latter won’t be a problem because, after all, a Spam Ninja will still check these reports.

Future outlook

Now that GSoC is over I could just disappear with all the money and leave SpamBrainz in its current state but obviously that’s not what I am planning to do.

I would like to work with zas on getting it deployed along with the Spam Ninja system, improve the code documentation and try to tackle the remaining problem that is online learning (which as it turns out, isn’t as easy as I had thought).

With spam always evolving and spammers already moving to more sophisticated methods than just using editor biographies, I’d also look into building separate models for other entities.

After all SpamBrainz is just getting started and I’m very much looking forward to continuing our journey towards reducing the spam we all have to endure on MusicBrainz and other MetaBrainz projects.

GSoC 2018: More detailed integration of AcousticBrainz with MusicBrainz

A fantastic summer comes to an end, and it is time to wrap up the GSoC project that I have been working on for the last three months (the official GSoC coding period).

Hello people!!

I am Rashi Sah, an undergraduate student at the National Institute of Technology, Hamirpur, India. I have been working on a really cool AcousticBrainz project for MetaBrainz Foundation Inc. as a participant in Google Summer of Code ’18. It has been an amazing experience and I’ve learned a lot over the summer, spending countless days and nights to take the project to completion. I decided to contribute to MetaBrainz in late December, spent some time understanding the project’s codebase, and have been creating pull requests and pushing commits for many features, tasks and bug fixes since January 2018. This blog post covers my GSoC experience as a student and the work I’ve done for the programme so far.

Before the GSoC programme started, I looked for some good first bugs and found some tickets to work on. I then talked to the AcousticBrainz community members and started contributing. I created some big PRs, mostly adding new features to AcousticBrainz, and also worked on many bug fixes which are already merged into the AcousticBrainz codebase. New feature PRs include AB-21, AB-98 and AB-298. In mid-February, I started looking for a suitable idea to work on for the GSoC programme and creating a proposal for it. As March approached, I discussed the proposal a lot with MetaBrainz community members, especially Alastair, the AcousticBrainz project lead, who helped me a great deal by reviewing the proposal and guiding me to improve it. In late April, my proposal for a more detailed integration of AcousticBrainz with MusicBrainz was accepted. During the community bonding period, I mostly continued the work I had already been doing for the previous 3–4 months.

Getting entity information from the MusicBrainz database

The first thing I worked on when the official GSoC coding period began was adding a way to directly access the MusicBrainz database for different entities to the MusicBrainz database module in BrainzUtils (a Python utility package shared by the MetaBrainz projects). I worked on getting artist and release entity information from the MusicBrainz database via a direct connection (see PRs BU-13 and BU-14). Later, I worked on setting up the MusicBrainz server by adding a service to AcousticBrainz’s docker-compose files, allowing us to easily read data directly from the MusicBrainz database in AcousticBrainz (PR AB-334). A major aim of the project was to implement both methods of MusicBrainz database access in AcousticBrainz, especially importing the MusicBrainz database into AcousticBrainz from scratch, and then to decide which method works better when implementing a particular piece of functionality in AcousticBrainz using MusicBrainz data.
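
The direct-connection approach essentially boils down to running SQL against the MusicBrainz schema from Python; the BrainzUtils helpers wrap this up properly, but a bare-bones sketch of the idea with psycopg2 might look like this (the connection string and the columns selected are assumptions):

```python
import psycopg2

# Hypothetical connection string for a MusicBrainz database service.
MB_DSN = "dbname=musicbrainz_db user=musicbrainz host=musicbrainz_db"

def get_artist_by_mbid(mbid):
    """Fetch basic artist information directly from the MusicBrainz database."""
    with psycopg2.connect(MB_DSN) as connection:
        with connection.cursor() as cursor:
            cursor.execute(
                """SELECT gid, name, comment
                     FROM musicbrainz.artist
                    WHERE gid = %s""",
                (mbid,),
            )
            row = cursor.fetchone()
    if row is None:
        return None
    return {"mbid": row[0], "name": row[1], "comment": row[2]}
```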

Import the MusicBrainz data in AcousticBrainz database

MusicBrainz’s database contains a huge number of tables, but I analysed the use cases of MB data in AB and made a list of the tables that we would actually require in our AcousticBrainz integrations. I then made a PR (AB-338) for creating new tables in the AB database under a MusicBrainz schema. Later, I worked on a big PR (AB-340) which imports the MB data corresponding to every recording present in AcousticBrainz’s database and writes it into the tables of the MusicBrainz schema in AB. This PR was really huge, and I had to take care of a lot of integrity constraints and foreign key dependencies.
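
Conceptually, this importer copies rows from the source MusicBrainz database into identically structured tables under the MusicBrainz schema in the AB database, in foreign-key order. The sketch below is greatly simplified; the table list is illustrative, and the real importer (PR AB-340) restricts the copied rows to those related to recordings present in AcousticBrainz.

```python
# Both connections below are assumed to be psycopg2 connections: one to the
# MusicBrainz database and one to the AcousticBrainz database.
TABLES_IN_FK_ORDER = ["artist", "artist_credit", "artist_credit_name", "recording"]

def copy_tables(mb_conn, ab_conn):
    with mb_conn.cursor() as mb_cur, ab_conn.cursor() as ab_cur:
        for table in TABLES_IN_FK_ORDER:
            mb_cur.execute("SELECT * FROM musicbrainz.{table}".format(table=table))
            columns = [desc[0] for desc in mb_cur.description]
            insert = (
                "INSERT INTO musicbrainz.{table} ({cols}) VALUES ({vals}) "
                "ON CONFLICT DO NOTHING".format(
                    table=table,
                    cols=", ".join(columns),
                    vals=", ".join(["%s"] * len(columns)),
                )
            )
            # Copy parent tables before children so foreign keys are satisfied.
            for row in mb_cur:
                ab_cur.execute(insert, row)
    ab_conn.commit()
```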

Update MB data in AB for every new recording added to AB

Another feature I worked on after importing the MB data was updating the MB data present in AB whenever a new recording is added to the AcousticBrainz database (see PR AB-346), by importing the data from MB’s database via the direct connection. While working on a few bug fixes, my mentor Param and I realized that the MB data import was taking a lot longer than expected when I ran the MusicBrainz importer script against the full MB data dumps (around 2.8 GB). So I then worked on making the MusicBrainz importer more efficient, and was able to import the data for a few recordings within seconds (see PR AB-348). I had to dig into each table import and track down the parts of the code that were slowing things down.

To reduce the load on the processor, I added a 5-second sleep to the MusicBrainz importer module before it imports data for any new recording (see PR AB-354). During my GSoC period, I learned how important it is to write tests and make them run fast. I wrote tests for almost every script inside the db module. Later, I worked on writing tests for the MusicBrainz importer script (AB-352).

Apply replication packets to keep MB data in AB updated with the actual MusicBrainz database

Then came another tricky part of this project: updating the MusicBrainz schema data in AB whenever there is any change in the actual MusicBrainz database, whether an update or a deletion. MusicBrainz provides hourly replication packets which describe the changes to the database in a specific period. Replication packets are .tar.bz2 archives containing a collection of files, and they can be downloaded via the MetaBrainz API. Lukas Lalinsky, a long-time contributor to MetaBrainz projects, the founder of AcoustID and maintainer of the mbdata Python module, had already worked on applying replication packets to MB data. I made a lot of modifications to his script to apply replication packets to the MusicBrainz schema data in AB, keeping the data for the recordings present in AcousticBrainz up to date (see AB-350).
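
A heavily simplified sketch of such a replication loop is shown below. The packet URL, the token handling and the way changes are applied are all assumptions here; the real script (adapted from mbdata) does considerably more work.

```python
import io
import tarfile
import requests

# Hypothetical endpoint pattern; the real script fetches packets from the
# MetaBrainz API using an access token.
PACKET_URL = "https://metabrainz.org/api/serve/replication-{seq}.tar.bz2"

def fetch_packet(seq, token):
    """Download one hourly replication packet, or return None if not yet published."""
    response = requests.get(PACKET_URL.format(seq=seq), params={"token": token})
    if response.status_code == 404:
        return None
    response.raise_for_status()
    return tarfile.open(fileobj=io.BytesIO(response.content), mode="r:bz2")

def catch_up(connection, next_seq, token, apply_changes):
    """Apply packets one by one until the local MB schema is up to date."""
    while True:
        packet = fetch_packet(next_seq, token)
        if packet is None:
            return next_seq
        apply_changes(connection, packet)  # replay the inserts/updates/deletes in AB
        next_seq += 1
```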

Integration with MB database: Use MBID redirect information to get original entity

After working on the direct connection, importing the MusicBrainz data and keeping it updated, it was time to start writing evaluation scripts to decide which method is better for a given integration in AcousticBrainz. I wrote a script implementing an integration of AB with the MB database which uses an entity’s redirect information and returns the original entity corresponding to the MBID provided (see PR AB-356).
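
For recordings, that lookup is essentially a join against MusicBrainz’s recording_gid_redirect table; a simplified sketch (assuming the MusicBrainz schema is available via either access method):

```python
def resolve_recording_mbid(cursor, mbid):
    """Return the canonical MBID for a recording, following any gid redirect."""
    cursor.execute(
        """SELECT recording.gid
             FROM musicbrainz.recording_gid_redirect AS redirect
             JOIN musicbrainz.recording
               ON recording.id = redirect.new_id
            WHERE redirect.gid = %s""",
        (mbid,),
    )
    row = cursor.fetchone()
    # If there is no redirect entry, the MBID is already the canonical one.
    return row[0] if row else mbid
```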

Evaluate both methods of MusicBrainz database access in AcousticBrainz

Now for the last and most important piece of work of my GSoC period. After implementing both methods, we really needed to evaluate them in order to test which one is more efficient for a specific integration with the MB database. I first wrote an evaluation script which fetches data from the recording and low-level tables. In this case, the difference between the time taken by the two methods turns out to be really large (approx. 70 seconds for around 250+ recordings). So whenever we have to get data from local AB tables as well as MB tables, we will go for the imported-database method, as it turns out to be faster than the other one. Next I tested the MBID redirect integration, for which I didn’t find much difference between the two methods (PR AB-357). But I ran these tests locally; tests in production may yield different results.
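
The evaluation itself amounts to timing the same lookup implemented both ways over a set of MBIDs; a rough sketch (the two lookup functions here are placeholders for the real implementations):

```python
import time

def time_method(lookup, mbids):
    """Run a lookup for every MBID and return the total wall-clock time."""
    start = time.monotonic()
    for mbid in mbids:
        lookup(mbid)
    return time.monotonic() - start

def evaluate(mbids, lookup_via_direct_connection, lookup_via_imported_tables):
    direct = time_method(lookup_via_direct_connection, mbids)
    imported = time_method(lookup_via_imported_tables, mbids)
    print("direct connection: {:.2f}s, imported tables: {:.2f}s".format(direct, imported))
```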

All in all, it has been an exciting summer. By now I am familiar with a very good part of the AcousticBrainz codebase. I really look forward to working on adding many more integrations with MB data in AcousticBrainz, and I plan to completely remove AB’s dependency on the web service for using the MusicBrainz database, which would be very useful for users.

Details of contributions made

By the end of the GSoC coding period, I had opened a total of 39 PRs: 35 pull requests to the AcousticBrainz server, 3 to BrainzUtils and 1 to the AcousticBrainz client, and I had made a total of 135 commits (109 in AB, 9 in BU, 3 in AC and 14 in AB master). Of these, the pull requests created and merged during the official GSoC coding period are the PRs to the AcousticBrainz server and the PRs to BrainzUtils.

These last three months were full of thrill, excitement and quite some frustration as well. And it doesn’t end here: I’d love to keep contributing and to act as a maintainer for the AcousticBrainz project. I believe people should try contributing to open source organizations, as it helps you learn and gain a lot of experience in a short period of time, especially when working through a great platform like Google Summer of Code.

I am really happy working with the awesome MetaBrainz community, and the people here are fantastic. I’d love to stay a part of MetaBrainz in the future as well. So, in the end, a big thanks to my mentor Param Singh, without whose help and support throughout the programme it wouldn’t have been possible for me to reach the end phase of GSoC; to my organization admin Robert Kaye, AcousticBrainz project lead Alastair Porter and all of the MetaBrainz Foundation community members for choosing me as a GSoC student, giving me such a great opportunity and being very kind and helpful throughout the programme; and to Google for making this all possible. Hope I get a chance to work with you all again!!

GSoC 2018: A way to associate listens with MBIDs

Hi, I’m Kartikeya Sharma, a postgrad student at the National Institute of Technology, Hamirpur. I worked on the MessyBrainz project as a student developer for GSoC 2018, mentored by Robert Kaye. The goal of my project is to associate MBIDs with MSIDs and to cluster together the MSIDs which represent the same MBID. An MBID (MusicBrainz Identifier) is a Universally Unique Identifier that is permanently assigned to each entity in the MusicBrainz database, while an MSID (MessyBrainz Identifier) is associated with each unique recording, artist_credit and release in the MessyBrainz database. In simple words, MSIDs represent unclean metadata whereas MBIDs represent clean metadata.

This blog post summarizes the work that I did in my project, which was divided into three parts.

Processing the data already in MessyBrainz database

The first part involves creating clusters using the MBIDs already present in the MessyBrainz database. This means creating clusters for recordings, artists, and releases. To implement this part I created the following three PRs: #37, #41, and #44.

After that, I began to work on the second part, which involves creating clusters using artist MBIDs, release MBIDs and names fetched from the MusicBrainz database. To access the MusicBrainz database, I first had to add methods to BrainzUtils for fetching artist MBIDs, release names and release MBIDs using recording MBIDs. The part to fetch artist MBIDs was done during the community bonding period in PR #14 at BrainzUtils, and to fetch releases I created PR #18 at BrainzUtils during the GSoC coding period. After that, I created a PR to create clusters using the fetched artist MBIDs (#47) and another one to create clusters using the fetched releases (#49).

I wrote around 60 tests, which proved vital in making sure that the code does what it’s supposed to do.

Processing the data as it is inserted into the MessyBrainz database

Creating clusters for all the data already inside the database requires a lot of resources, so it is better to create clusters as recordings are inserted into the database. But even doing this clustering directly as part of the insert is not efficient. So, to cluster incoming recordings, they are first sent to a RabbitMQ server and from there to a clustering script, which runs continuously in a different container and clusters the incoming recordings. That way clustering does not slow down the process of submitting recordings to the database. For this I created PR #50.

Create endpoints to access MSIDs and MBIDs

I created two API endpoints in PR #51. One endpoint fetches MBIDs and MSIDs using an MSID; the other fetches MSIDs using an MBID. This way end users can access MBIDs and MSIDs, which may be used for calculating different stats.
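
A minimal sketch of what such endpoints could look like with Flask; the route names and the database helpers below are hypothetical, not the actual MessyBrainz API:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical data-access helpers standing in for the real MessyBrainz db module.
def load_mbids_for_msid(msid):
    return []  # would query the cluster/redirect tables for this MSID

def load_msids_for_mbid(mbid):
    return []  # would query the cluster/redirect tables for this MBID

@app.route("/<uuid:msid>/mbids")
def get_mbids(msid):
    """Return the MBIDs that have been clustered onto the given MSID."""
    return jsonify({"msid": str(msid), "mbids": load_mbids_for_msid(str(msid))})

@app.route("/mbid/<uuid:mbid>/msids")
def get_msids(mbid):
    """Return all MSIDs associated with the given MBID."""
    return jsonify({"mbid": str(mbid), "msids": load_msids_for_mbid(str(mbid))})
```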

Apart from that, with the help of my mentor, I set up a VM to test the above code on the MessyBrainz data dump. This task had some challenges: first I had to create indexes on various fields to speed up the process of clustering. Without indexing it would have taken approximately 37 days, but after creating indexes on various fields it took just 3 hours. I found out that PostgreSQL also allows creating indexes on functions, which came in handy while creating artist_credit clusters, for which I had written a custom function. The indexes were created in PR #53. When I ran the clustering code on a VM holding the whole MessyBrainz data dump, I found that some fields in the recording_json table which are supposed to store MBIDs contained empty strings instead. This was not supposed to happen, as ListenBrainz is currently the only source of data for MessyBrainz: users cannot submit to MessyBrainz directly, and ListenBrainz validates listens for this. So those recordings must have been inserted before that validation was in place. To solve the problem I created PR #52.
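
PostgreSQL’s expression (function-based) indexes are what made this feasible; the sketch below shows the kind of indexes involved, with illustrative column names and expressions rather than the exact ones from PR #53.

```python
# `connection` is assumed to be a psycopg2 connection to the MessyBrainz database.
def create_indexes(connection):
    with connection.cursor() as cursor:
        # Plain index on the recording MBID stored inside the JSON blob.
        cursor.execute(
            """CREATE INDEX IF NOT EXISTS recording_mbid_ndx
                   ON recording_json ((data ->> 'recording_mbid'))"""
        )
        # Expression index matching the custom function used for artist_credit
        # clustering, so those lookups can use an index as well.
        cursor.execute(
            """CREATE INDEX IF NOT EXISTS artist_credit_lower_ndx
                   ON recording_json (lower(data ->> 'artist'))"""
        )
    connection.commit()
```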

The summer was a great learning experience for me. I started slowly, as things were messy at the start. Since everything wasn’t crystal clear to me at first, I wasn’t sure how exactly to write scripts that manipulate the database, and I wrote them in the most straightforward way possible: running a query for every single MBID to first check whether it is present in the recording_cluster tables, and clustering the recording if not. This is conceptually correct but not efficient by any means. Instead, the same thing can be done with a single query on the recording_json table that fetches only those recording MBIDs which are not present in the recording_redirect table, as those are the unclustered ones. That way we don’t have to reprocess the recording MBIDs that have already been processed, which makes clustering efficient.
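
That improvement can be expressed as a single anti-join; here is a sketch of the idea, with assumed column names:

```python
def fetch_unclustered_recording_mbids(connection):
    """Fetch only those recording MBIDs that have not been clustered yet."""
    with connection.cursor() as cursor:
        cursor.execute(
            """SELECT DISTINCT data ->> 'recording_mbid' AS recording_mbid
                 FROM recording_json
                WHERE data ->> 'recording_mbid' IS NOT NULL
                  AND data ->> 'recording_mbid' NOT IN (
                        SELECT recording_mbid::text FROM recording_redirect)"""
        )
        return [row[0] for row in cursor.fetchall()]
```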

With time I got an understanding of how clusters are created and how to handle anomalies, such as “James Morrison” (a name shared by several distinct artists). In the end, the definition of an anomaly can be put as: an MSID represents an anomaly if it points to different MBIDs in the entity_redirect table (where entity can be artist_credit, recording or release).

Work to be done ahead

The project is still in its initial stages and requires a lot of work before it can move into production. We still need to write integration tests for the ClusterWriter and the API endpoints. After that, we can work on the Additional Ideas that I proposed in my proposal. We need to figure out some way to associate MBIDs with MSIDs for the artists, recordings, and releases where no MBIDs are present. This does not seem like a trivial task, with so many anomalies to take care of.

The last three months have been a great experience for me. I would like to thank Robert Kaye, Param Singh, and Alastair Porter, who helped me solve a lot of the problems that I encountered during the entire period. Working on their suggestions and reviews, I was able to write good-quality code that was efficient as well. The work culture at MetaBrainz inspired me a lot. At MetaBrainz we have weekly IRC meetings where we get to know what others are doing at the organization and also get a chance to share what we did in the past week. I would like to thank MetaBrainz and Google for giving me the chance to get involved in open source on such a cool project. The association of MSIDs with MBIDs can be used by ListenBrainz, as stats are calculated on MSIDs, which can then be mapped onto MBIDs, which represent clean metadata. I would like to keep working on the project because of the learning opportunities it presents.