BookBrainz is now an official MetaBrainz project!

After many years as a community driven project and often under-staffed, the BookBrainz project has always been the red-headed step child of our projects. A few weeks ago I asked if the community felt that we should make BookBrainz an official project of the foundation and got a very positive response.

After that, we started informally seeking developers to take on this position, leading to the hire of Monkey, who will now be the lead of the BookBrainz project, taking over for Ben Ockmore. Ben will take on a contributor role to BookBrainz going forward and remain on the project! Thanks for all of your hard efforts in the past, Ben!

While Monkey comes up to speed on the codebase, we’ve been brainstorming what features he should focus on first . The short term focus on BookBrainz will be on bringing it into our hosting setup at Hetzner, which means making the codebase ready for running inside of docker with all of the MetaBrainz specific hosting quirks. Part of this project will be to remove elastic search and to utilize our new Solr based search system that we recently released for MusicBrainz.

After getting BookBrainz moved to our hosting facility that focus will be to create a minimally viable product. What exactly does this mean? One of the frequent complaints I’ve received about BookBrainz is that it is missing core functionality of a proper metadata project. Core functionality means that a user should be able to view and edit all of the metadata that is in BookBrainz and then retrieve this data from the BookBrainz API. It should include full data dumps with incremental data dumps being added a bit later.

What do you think the missing core features of BookBrainz are?

Finally, we’re in discussions with the OpenLibrary team, wondering how to best work together and not to duplicate efforts — we’ll post more about this once we’ve reached an agreement with the OpenLibrary team on how we should proceed.

Thanks!

MetaBrainz team changes, autumn 2018

Hello!

The only constant in the world is change, right?

First off, the somewhat sad news: Sambhav, AKA samj1912, has left MetaBrainz the team as a contractor and has moved to London. The upside of this news is that he will continue to work on Picard for us and will remain a part of our team as a volunteer, but his presence will not be quite as intense as before. Thank you for your hard work these past months, especially for finishing the impossible Solr search project!

With Sambhav’s departure and our improved finances, I’m proud to announce that we’re taking on two new contractors!

Nicolas Pelletier AKA Monkey: You may remember the talented Monkey from when we designed our new logos. He was the designer who created the logos and our new bootstrap theme that adorns most of our pages now. Working with Monkey was straightforward, effective and the results were great, so when he expressed interest in working on BookBrainz, I was pleased to hear this news. Monkey will be working for us full time and spending 75% of his time on BookBrainz and 25% of his time to help with design and UX work for the rest of our projects. In the next blog post I’ll talk more about BookBrainz and what we can expect from that project in the future.

Nicolás Tamargo AKA Reosarevok: Reosarevok is no stranger to our community — he’s made 1.7M edits to MusicBrainz, is our Style BDFL and answers all of our support@ emails. He’s been learning more programming and asked to be part of the MusicBrainz team part time. We agreed to give this a go and in the short term he will be focusing on genre support and helping with the React migration among other tasks. If this trial run works out, we’ll see about expanding his scope on our team.

Welcome on board Monkey and good luck with the new position, Reo!

 

 

GSoC 2018: Developing infrastructure for importing data into BookBrainz

Hi everyone!

I’m Shivam Tripathi, an undergraduate student from the Indian Institute of Information Technology, Una. I interned for the MetaBrainz foundation under the Google Summer of Code programme for the year 2018 and worked on the BookBrainz project. I was mentored by Ben Ockmore during this period. This post summarizes my contributions to the project and experiences that I had throughout the summer trying to solve various problems related to the implementation of the project.

Proposal

The original proposal I submitted to Google underwent some modifications as the project progressed, details of which can be found later in this post.

Community Bonding

Summer of Code started with the community bonding period – during which I attended the regular Monday meetings at MetaBrainz’s IRC channel #metabrainz and interacted with the MB community members. I added multiple new entities to the BookBrainz’s website and helped some users with BookBrainz related queries on the community page (intended for support/general QA related to all of MetaBrainz projects).

Also during this period, my mentor Ben Ockmore and I discussed and finalized the architecture of the importer application. It was decided to split the entire importer into two microservices: one for producer (which reads the data dumps and produces generic objects for each record using BookBrainz data storage format) and the other of consumer (which reads and validates the generic object and then insert them into the BB database). It was decided to connect these microservices using a message broker queue (RabbitMQ was finalized). In addition to this, the code repository architecture was decided to be such that we should be able abstract away the entire message broker logic, so that later it would be possible to swap out RabbitMQ with any other hosted service later (like pubsub).

Initial design of the intended importer application.

Fig. 1: Initial design of the intended importer application. For more information, visit the original document.

Coding period

First phase

The program coding period kicked off with making changes to the existing BookBrainz schema to enable it support our new imports. The initial design as discussed here was later updated to include views as well for imports per entity to enable simpler queries.

Following this, I started working on the bookbrainz-data repository to add some basic functions for aiding the import process. I started work in accordance with one of the existing roadmaps for the BookBrainz project which was to shift all database logic from bookbrainz-site to bookbrainz-data – adding features on a per-function basis. Initially it was decided to use Immutable.js for all data flowing in and out of bookbrainz-data-js, but very soon we realized that it was not practical to follow this approach. After some discussions, we finally settled on this repository design change to incorporate new function-oriented functionality. We named this new sub-module func.

Once I had basic functions to handle database transactions in place, I started working on the importer architecture. It was decided to create multiple instances of the producer process each with the ability to run asynchronous operations on it’s own. Similarly, we should be able to fork multiple consumer process, each capable of fetching data from queue and sending it off to the database.

To address this problem, I started working on a module which given a function would make it possible to run multiple processes running multiple instances of the given function. It should be such that we can generate the arguments dynamically for each process and along with some set-up and tear-down actions before and after we fork the process.

To get a better grasp of underlying functionality, one can read the final API and documentation. It’s a generic module which can be used for any functionality. The diagram for it’s execution flow can be found below:

AsyncCluster module execution flow

Fig 2. AsyncCluster module execution flow. For more details, see the complete documentation.

Second phase

While developing producers, I first designed the generic producer object structure for all entity types – an object skeleton which all producers need to create from the read records to be pushed into the queue. This object structure was to be enforced across all data sources, and this helps the consumers to expect an object of fixed nature on which they can later run automated validation tests prior to adding to the database.

As the data dumps were of considerable size, I used data streams to read the data from the flat files and parsed them to create generic entity objects which used BookBrainz data storage format. After parsing each record, I pushed the records into the queue.

Parsing required thorough analysis of the data dumps, and manually mapping each key-value pair in the data dump to the generic object structure. All the data which did not fit the present BB schema (and hence had to be excluded from the generic producer object structure) was added to a metadata field associated with the import. This metadata field is stored as a bjson in the database, so that we can individually query and index any of the fields in the metadata later.

While developing the consumers, I initially set up a validation module. Much of it was adapted from the existing validators on BookBrainz site, which I was able to use without much alteration thanks to the generic producer object structure. The validation modules in the bookbrainz-site have been written to quickly validate the form submitted by the editor post creation/editing of an entity. To use them in the import process, I wrote a converter which transforms the generic producer object to form sections understood by the validation module. Apart from this, I added better error handling to ensure all errors are caught and reported in case a something goes wrong.

Error handling was another important aspect of the import process, apart from the validation process. Being a command line application, tracking errors was central to ensuring that all components were running as intended. At the same time we had to ensure that no record which could be potentially imported into the database was missed. To address these problems, I decided to discard the record if it fails the data integrity validation tests (which means the data is most probably corrupt). But in the case of any transaction error, we give a fixed number of chances to the record before discarding it (by acknowledging the message). A future goal for this process could be to push the erred record into another queue for analysis and replaying of those messages from this queue back to the original queue when the problem gets sorted.

Once the importer was in place, I focused on building up the func.imports module with more functions for the import entities – like discard and approve. I also added functions to fetch recent imports, and a lot other helper code for the imports. I also ensured that all errors occur loudly and never silently slip away. With the help of my mentor, I also migrated most of the functions required for data transactions on the bookbrainz-site. This was crucial to my project – as in many instances the existing functions could not be used due to them initiating their own database transaction for each action. I split all these actions into functions, and bound them with the transaction object they received rather than initiating their own transaction. I also ensured we use modern ES6 features – which made the adapted code much more sleek and compact. It was a long process, as I had to read almost entire of existing code for data transactions on the bookbrainz-site and adapt each of them correctly. All the code finally came together in the create-entity module – which can now be used for entity creation as well as upgrading the imports to entities.

Third Phase

The work on bookbrainz-site and bookbrainz-data mostly happened side by side. First I added a recent imports page – which would fetch most recent imports from the database and display them inside the React component. The recent imports is designed as a single page application which dynamically loads the paginated records and renders them on-screen. The working of the recent page application is as follows:

Recent imports execution flow

Fig. 3: Recent imports execution flow

Next, I added import-entity display pages for all five entity types. They were supposed to display the entity attributes along with links to approve/discard/edit and approve functionality. Approving the import-entity was done so that the user gets redirected to the newly created entity. The import-entity display page for work is as follows:

Work Import Entity Page

Fig. 4: Work Import Entity Page. Similarly, pages were added for Creator, Entity, Publication and Publisher.

In case of a discarded import, I added an extra page similar to existing confirm deletion page – which asks the user to confirm the action and then waits until the entity is deleted before redirecting the user to the home page. The discard page looks as follows:

Discard Import Entity Page

Fig. 5: Discard Import Entity Page

Next, I implemented the editing imports prior to approval. For this, I wrote two modules – one to transform the import to the structure used by the editing form and one to convert form data to an entity. When a user wishes to edit an import, the import is transformed to the form and rendered on the screen. The user can then edit the import. When the user clicks submit, we transform the form data to a new entity type and use the create-entity function to create a new entity in the BookBrainz database. The user is then redirected to the newly created entity page. The code for rendering the form and editing the entity was completely reused for imports with minimal changes. I then added functionality to add imports to the ElasticSearch index, and display them in present results. The final search page is as follows:

Search showing Import Entities

Fig. 6: Search showing Import Entities

Links to the work done

  1. BookBrainz SQL
  2. BookBrainz Import
  3. BookBrainz Data
  4. BookBrainz Site

Conclusion

Last three months have been a fantastic experience for me. Not only did I get to learn a lot of new technologies and write some exciting software, but also I got to brush up my existing skills and interact with the completely awesome MetaBrainz community. Such an opportunity comes truly once in a lifetime, and I extend my sincerest gratitude to Google for running such a great and extremely inclusive programme which allows students from all over the world to avail such an opportunity. Special thanks to my mentor Ben Ockmore for always being patient and helping me out whenever I felt stuck.

Thank you MetaBrainz community for your continuous guidance and support!

!m Google and MetaBrainz

BookBrainz GSoC Gamification/Achievement System

Hi guys, I’m Max (AKA QuoraUK), a university student working with BookBrainz as part of Google Summer of Code. My project this summer has been to build a new gamification system, that introduces rewards for BookBrainz users and recognises their achievements. Here I will explain the system and the features I’ve implemented.

Overview

My original specification for the gamification system is here. To summarise, the idea behind gamification is to add game-like elements to the site in order to make it more engaging for users. The plan for the gamification of BookBrainz was:

  • Add badges and titles for users to earn on the BookBrainz site
  • Allow users to display badges and titles on their profile page
  • Encourage regular and high quality content

To implement this plan we have added 12 achievement tracks – once an achievement track is completed a title is unlocked. The artwork for the badges is currently “programmer art” and we are very open to other people designing replacements for them. This could be a part of this year’s Google Code-In. The achievements that will be available on launch are:

revisioncreator
Revisionist: Perform (1, 50, 250) Revision(s); Creator Creator: Create (1, 10, 100) Creator(s)
limitedpublisher
Limited Edition: Create (1, 10, 100) Edition(s); Publisher: Create (1, 10, 100) Publication(s)
pubcreatworker
Publisher Creator: Create (1, 10, 100) Publisher(s); Worker Bee: Create (1, 10, 100) Work(s)
runnerexplorer
Sprinter: Create 10 revisions in an hour; Fun Runner: Create a revision a day for a week; Marathoner: Create a revision a day for 30 days; Explorer: View (10, 100, 1000) Entities
timetrack
Time Traveller: Create an edition before it is released; Hot Off the Press: Create an edition within a week of release

All of these are unit tested and have unique badges for each tier on the track. If you would have already unlocked these achievements before the system was launched, you will earn them with your next revision/creation. Badge templates are available for developers to introduce new badges and adding achievements can be as simple as making a badge and adding a few lines of code.

Profile Page

profilednd
Profile Page, Drag and drop badge selector

The gamification system also brings some changes to the profile page. There is now a badge box which can contain your three favorite badges. Additionally, your selected title is shown next to your username. You can select your favorite badges in the new achievements menu on the profile, then drag and drop your favorites into the boxes. Titles can be selected by going to Edit Profile, and selecting them from the drop down menu.

Other Areas

2016-08-20_16-12-21
Achievement Alert

On creation of an entity or revision you will now see an alert if an achievement is unlocked. This will prompt you to go to your profile page and set the ones you want to display. Usernames in other areas of the site can be hovered over in order to see the title they have set.

Demonstration

Here is a demonstration video I’ve made for the system:


Continue reading “BookBrainz GSoC Gamification/Achievement System”