GSoC 2022: Unified Form Editor for BookBrainz

Hi, I am Shubham Gupta (IRC Shubh), pursuing my bachelor’s at the National Institute of Technology, Kurukshetra. This year I participated in Google Summer of Code and implemented a new editor in BookBrainz.

In this project, I was mentored by Nicolas Pelletier (IRC monkey). The purpose of this blog post is to summarize my contributions to this project and share my experiences along the way.

Before GSoC

I joined MetaBrainz at the end of November ’21. Due to my affection for novels, I instantly fell in love with the BookBrainz project. I initially started working on small bug fixes and typo corrections but later shifted to more challenging features.

My first challenging task was to pre-fill the current entity editor from POST requests, which was required for user scripts to work. I also created some user scripts to help simplify the creation process.

Later I worked on upgrading react-select and completing the notification feature on BB.

Proposal

For GSoC’22, I made a proposal to work on implementing a unified form on BookBrainz.

My main motivation behind this project was to make the entity creation process more intuitive and simpler for new users. The purpose of this project is to unify all the entity creation workflows into a simpler book interface. This abstracts away the BookBrainz specifics and gives users an easy interface to work with.

Though a lot has changed since the proposal, from design to implementation details, the main idea behind the project remained unchanged.

Community Bonding

During this period, I worked with my mentor to finalize the design for the editor. This included a lot of back & forth discussion but finally, we ended up picking a base design that was similar to how we choose a book: we first go through the book’s cover and back cover, then its details and inner contents.

I also discussed a new timeline with my mentor which incorporated my university classes and exams. Following this, I started implementing the editor during this period itself.

Coding Period

First Phase

By this phase, I completed all the mockups and made relevant changes in design as suggested by community members.

We ended up with the following design. Later we also added a summary section as a new tab to make reviewing new entities easier.

unified-form mock

I started working on the new editor routes, which support submitting multiple entities at once for creation, and later added support for editing as well.

Pull requests: #847, #858

Since a lot of the implementation was similar to the existing entity routes, the main thing missing was to unify them into one API and make it support temporary BBIDs for new entities.

The main idea behind keeping temporary BBIDs was to allow late submission of entities, meaning new entities would only be created when the user submits the whole form. This allowed a user to undo their actions and gave them more granular control over entity creation/modification. But following this approach resulted in a lot of duplicate code which was hard to generalize due to the temporary IDs; this was later fixed in the second phase.
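The late-submission idea can be sketched roughly as follows. This is only an illustration and not the code from the pull requests: the `PendingEntity` shape, the `temp-` ID format and the `submitAll` helper are hypothetical names, and `createBbid` stands in for whatever the server does to assign real IDs.

```typescript
// Hypothetical shape: an entity created in the form carries a temporary ID
// and may reference other pending entities by their temporary IDs.
interface PendingEntity {
  tempId: string; // assumed format, e.g. "temp-1"
  type: string; // e.g. "Work" or "Edition"
  linksTo: string[]; // IDs of related entities, temporary or real
}

interface SavedEntity {
  bbid: string;
  type: string;
  linksTo: string[];
}

// Nothing is created until the whole form is submitted. At that point every
// temporary ID is swapped for a real one, including inside relationships.
function submitAll(
  pending: PendingEntity[],
  createBbid: () => string // stand-in for server-side ID assignment
): SavedEntity[] {
  const realIds = new Map<string, string>();
  for (const entity of pending) {
    realIds.set(entity.tempId, createBbid());
  }
  return pending.map((entity) => ({
    bbid: realIds.get(entity.tempId) as string,
    type: entity.type,
    // IDs missing from the map already refer to existing entities.
    linksTo: entity.linksTo.map((id) => realIds.get(id) ?? id),
  }));
}
```

Even in this small form, every reference has to be checked against the temporary-ID map, which hints at why the approach was hard to generalize.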

I completed the routes with suitable tests and started working on the React front-end. I started by setting up a Redux store to handle multiple entity states. After some discussion with my mentor, we went with a state design that segregates each entity into its own state.
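A minimal sketch of what such a segregated state could look like, written as a plain reducer without the Redux library. The slice names, the `UPDATE_NAME` action and the field layout are assumptions for illustration, not the actual store from the editor.

```typescript
// Hypothetical layout: one independent slice per entity type, so an update
// to a Work never touches the Edition or Series slices.
type EntityType = "work" | "edition" | "series";

interface EntityFields {
  name: string;
}

type EditorState = Record<EntityType, Record<string, EntityFields>>;

interface UpdateNameAction {
  type: "UPDATE_NAME";
  entityType: EntityType;
  id: string;
  name: string;
}

const initialEditorState: EditorState = { work: {}, edition: {}, series: {} };

function editorReducer(
  state: EditorState = initialEditorState,
  action: UpdateNameAction
): EditorState {
  switch (action.type) {
    case "UPDATE_NAME":
      return {
        ...state,
        // Only the slice of the affected entity type gets a new reference.
        [action.entityType]: {
          ...state[action.entityType],
          [action.id]: { name: action.name },
        },
      };
    default:
      return state;
  }
}
```

Because untouched slices keep their object identity, components that read only those slices have no reason to re-render.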

At the end of this phase, the editor looked something like this:

First Phase Screenshot

Second Phase

During this period I continued working on the editor interface. Since that is the meat of the project, it took up most of the project’s time.

Frontend PR: #850

The challenging part of managing a store as large as ours was to minimize state updates as much as possible. Since this was so crucial for performance, I spent about a week reading Redux articles and profiling the editor state. All this paid off and resulted in blazingly fast editors (entity/unified-form) with minimal state update calls, which also benefited the existing editing pages.

The solution was to reduce the scope of the Redux state by memoizing components as much as possible and caching the results of expensive calculations, which reduced component load time drastically.
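The caching part can be illustrated with a small single-entry memoizer. This is a generic sketch rather than the editor's code: `memoizeOne` and `selectSortedNames` are hypothetical names, and real projects often rely on a selector library for the same effect.

```typescript
// Single-entry memoizer: recomputes only when the input reference changes.
function memoizeOne<A extends object, R>(compute: (arg: A) => R): (arg: A) => R {
  let lastArg: A | undefined;
  let lastResult: R | undefined;
  let hasResult = false;
  return (arg: A): R => {
    if (hasResult && arg === lastArg) {
      return lastResult as R;
    }
    const result = compute(arg);
    lastArg = arg;
    lastResult = result;
    hasResult = true;
    return result;
  };
}

// Hypothetical expensive derivation over one slice of the store.
let computeCalls = 0;
const selectSortedNames = memoizeOne(
  (works: Record<string, { name: string }>): string[] => {
    computeCalls += 1;
    return Object.keys(works)
      .map((id) => works[id].name)
      .sort();
  }
);
```

As long as a state update leaves the slice reference unchanged, the derived list is returned from the cache and a memoized component receiving it sees identical props.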

After implementing all entity creation workflows, I moved to linking them, either through relationships or through other attributes.

This linking process needs to be automatic so that users don’t have to know the relationships. They should also be able to opt out of linking specific entities using the respective checkboxes.

An example of linking entities is Series-Work, where selecting a Work automatically adds it as a Series item.

Unified Form Series Editor

We also introduced a major change in the way we submit the entities: we now submit new entities directly to the server. This reduced the duplicate code by half compared to before, since we no longer have to manage those temporary IDs. This also reduced the amount of work potentially lost when an error occurs while filling in the form.

I also wrote mocha/enzyme tests for required React components. This is all for the frontend PR.

I made follow-up PRs to improve the UI and fix bugs: #872, #871, #874

Overall Experience

I enjoyed working together with my mentor on such a large project. I learned a lot during my journey and understood the importance of different phases of software development. I realized the importance of carefully designing the application and discussing the ideas with other team members. I also got to know a lot about testing and why it is so important for large projects like this. Overall, it was the best learning experience I could ask for.

Also, the members of the MetaBrainz Foundation are very supportive and help each other to resolve issues. Lastly, special thanks to my mentor Nicolas Pelletier who helped me a lot during my GSoC journey. He always supported me and encouraged me even when things weren’t looking good. He is truly one of the most amazing people I’ve ever met!

GSoC 2021: Series Entity for BookBrainz

Hi everyone, I am Akash Gupta, currently pursuing my undergraduate degree at Kalinga Institute of Industrial Technology. This summer, I participated in Google Summer of Code and developed a new feature, Series Entity, for the project BookBrainz.

I was mentored by Nicolas Pelletier (monkey on IRC) during this period. This post summarizes my contributions to the project and the experiences that I had throughout the summer.

GSoC 2020: User Collection for BookBrainz

Hi everyone, I am Prabal Singh, currently studying at the Indian Institute of Technology, Guwahati. This summer I participated in Google Summer of Code and developed a new feature – User Collections – for the project BookBrainz.

I was mentored by Nicolas Pelletier (Mr_Monkey on IRC) during this period. This post summarizes my contributions to the project.

Introducing the BookBrainz merging tool

Today we come with a big BookBrainz website update that allows you to merge duplicate entities!

Being able to clean up the database is an essential step towards importing public bibliographic records and catalogs from partner websites. As with MusicBrainz, you can visit an entity page on BookBrainz and click on a button to add an entity to a merge queue. You can merge multiple entities in one go easily.

BookBrainz merge queue

After clicking the merge button you will be presented with a page that lets you review and select the correct information in case of conflicting data. The revision history of merged entities is preserved, and in the near future you’ll be able to undo merges.

BookBrainz merge page

Your feedback is very welcome! We also have a short tutorial on how to use the new merge tool for the curious.

This latest website update also adds annotations for any information that does not fit into the existing format, some small design improvements and bug fixes.

We’ve also added the ability to search for users on the search page. This last feature will come in handy soon as we introduce collaborative User Collections; stay tuned!

State of the Brainz: 2019 MetaBrainz Summit highlights

The 2019 MetaBrainz Summit took place on 27th–29th of September 2019 in Barcelona, Spain at the MetaBrainz HQ. The Summit is a chance for MetaBrainz staff and the community to gather and plan ahead for the next year. This report is a recap of what was discussed and what lies ahead for the community.

GSoC 2019: JSON Web API for BookBrainz

The time has come to wrap up a very productive summer of learning, the last 3 months I spent as a GSoC student with MetaBrainz.

Hello Everyone!!

I am Akhilesh Kumar, a recent graduate from the National Institute of Technology, Hamirpur, India. I have been working on BookBrainz for MetaBrainz Foundation Inc. as a participant in the Google Summer of Code ’19. It has been an amazing experience and I’ve learned a lot over the summer. I was mentored by Nicolas Pelletier (Mr_Monkey on IRC) during this period. This post summarizes my contributions to the project and the experiences that I had throughout the summer.

Automating the voting system

MetaMetaData

For the last several years, one of the things our community has struggled with is a lack of active voters. We’ve tried to implement various measures to decrease the need for voters and the load on the wonderful ones who actually do actively look through edits and help vote on them (e.g., making more edits auto-edits and decreasing the amount of time edits stay open). However, the edit queue is still quite unwieldy and as such we’ve kept trying to come up with other ways to decrease the load on our contributors.

Over the past few months since our last summit, we’ve been working on training AIs, both for recommendation engines and data analytics, and for helping out with spam, but it soon appeared that we had another valuable dataset: our history of 15,693,824 votes from 16,336 voters and 56,374,198 edits from 2,007,134 editors. It turns out that this is an unintended side effect of the editing and voting system, in that it creates a paper trail of our habits as a community and our collective mind.

A paper trail that you could, say, train a neural network on. And that’s just what we did.

By feeding in half of the data from each of our top voters, we’ve been able to train our network to replicate their voting personality with 96.4% accuracy when using the other half as test data. That figure is the average for 300 bots, each based on one of our top 300 voters.
We were really impressed with the results but the story doesn’t stop there…

Meet BrainzVoter

The next logical step was to create our own Frankenstein’s monster. By training on 70% of our entire set of votes, we gave birth to a voting bot that represents the essence of our community. “BrainzVoter”, as we dubbed it, is precise and scores a staggering 98.9% accuracy when tested against the other 30% of our dataset.

To quote the late Terry Pratchett:

Ankh-Morpork had dallied with many forms of government and had ended up with that form of democracy known as One Man, One Vote. The Patrician was the Man; he had the Vote.

Edit filters

In view of the recent developments on net neutrality taken by the European Union with articles 11 & 13/17, MusicBrainz is taking measures to protect against copyright infringement: we’re implementing automatic edit filters. BrainzVoter will use the latest in NLP technology to understand what you, the editors, write in your edit notes, and use this understanding to vote on your edit. It will also inspect any URLs included in the edit note to cross-reference the data. The aggregate data will not be available to the public.

Edits with better and clearer notes will become more likely to pass. Consider this a good opportunity to (re‐)read How to Write Edit Notes!

How will this affect me as an editor?

Not much will change, and you can continue doing what you were doing before! We recommend that you take the time to make clear statements in your edit notes.
You will also be able to use a system of tags to express intent, using for example #typo #correction in the content of your edit text. Syntax highlighting and shortcuts will be available in the text editor.

In the end, by removing the need for humans to look over edits, the bot should give you, the editor, more time to add and edit and fix data in MusicBrainz, without having to spend time checking everyone else’s edits or worry about other editors disagreeing with yours!

After a brief trial period on MusicBrainz, this system will be adapted and also rolled out to BookBrainz.

We hope you will share our excitement for the benefits of automation and help us improve our training models over time. I, for one, welcome our AI overlords.

BookBrainz is now an official MetaBrainz project!

After many years as a community-driven and often under-staffed project, BookBrainz has always been the red-headed stepchild of our projects. A few weeks ago I asked if the community felt that we should make BookBrainz an official project of the foundation and got a very positive response.

After that, we started informally seeking developers to take on this position, leading to the hire of Monkey, who will now be the lead of the BookBrainz project, taking over for Ben Ockmore. Ben will take on a contributor role to BookBrainz going forward and remain on the project! Thanks for all of your hard efforts in the past, Ben!

While Monkey comes up to speed on the codebase, we’ve been brainstorming what features he should focus on first. The short-term focus on BookBrainz will be on bringing it into our hosting setup at Hetzner, which means making the codebase ready for running inside of Docker with all of the MetaBrainz-specific hosting quirks. Part of this project will be to remove Elasticsearch and to utilize our new Solr-based search system that we recently released for MusicBrainz.

After getting BookBrainz moved to our hosting facility, the focus will be to create a minimum viable product. What exactly does this mean? One of the frequent complaints I’ve received about BookBrainz is that it is missing core functionality of a proper metadata project. Core functionality means that a user should be able to view and edit all of the metadata that is in BookBrainz and then retrieve this data from the BookBrainz API. It should include full data dumps, with incremental data dumps being added a bit later.

What do you think the missing core features of BookBrainz are?

Finally, we’re in discussions with the OpenLibrary team, wondering how to best work together and not to duplicate efforts — we’ll post more about this once we’ve reached an agreement with the OpenLibrary team on how we should proceed.

Thanks!

GSoC 2018: Developing infrastructure for importing data into BookBrainz

Hi everyone!

I’m Shivam Tripathi, an undergraduate student from the Indian Institute of Information Technology, Una. I interned for the MetaBrainz foundation under the Google Summer of Code programme for the year 2018 and worked on the BookBrainz project. I was mentored by Ben Ockmore during this period. This post summarizes my contributions to the project and experiences that I had throughout the summer trying to solve various problems related to the implementation of the project.

Proposal

The original proposal I submitted to Google underwent some modifications as the project progressed, details of which can be found later in this post.

Community Bonding

Summer of Code started with the community bonding period – during which I attended the regular Monday meetings at MetaBrainz’s IRC channel #metabrainz and interacted with the MB community members. I added multiple new entities to the BookBrainz website and helped some users with BookBrainz-related queries on the community page (intended for support/general QA related to all of the MetaBrainz projects).

Also during this period, my mentor Ben Ockmore and I discussed and finalized the architecture of the importer application. It was decided to split the entire importer into two microservices: a producer (which reads the data dumps and produces generic objects for each record using the BookBrainz data storage format) and a consumer (which reads and validates the generic objects and then inserts them into the BB database). It was decided to connect these microservices using a message broker queue (RabbitMQ was finalized). In addition, the code repository was to be structured so that the entire message broker logic is abstracted away, making it possible to later swap out RabbitMQ for any other hosted service (like pubsub).

Fig. 1: Initial design of the intended importer application. For more information, visit the original document.
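A rough sketch of the producer/consumer split behind a broker-agnostic interface. Everything here is an assumption made for illustration: the `MessageQueue` interface, the tab-separated row format and the in-memory queue are stand-ins, and the real importer talks to RabbitMQ rather than to an array.

```typescript
// Hypothetical broker-agnostic interface: producers and consumers depend on
// this, not on RabbitMQ directly, so the backing service can be swapped.
interface MessageQueue<T> {
  push(message: T): void;
  pull(): T | undefined;
}

// In-memory stand-in so that the sketch runs without a broker.
class InMemoryQueue<T> implements MessageQueue<T> {
  private messages: T[] = [];
  push(message: T): void {
    this.messages.push(message);
  }
  pull(): T | undefined {
    return this.messages.shift();
  }
}

interface ImportRecord {
  entityType: string;
  name: string;
}

// Producer: turns raw dump rows (assumed "type<TAB>name") into generic records.
function produce(rows: string[], queue: MessageQueue<ImportRecord>): void {
  for (const row of rows) {
    const [entityType, name] = row.split("\t");
    queue.push({ entityType, name });
  }
}

// Consumer: drains the queue, validates each record and inserts the valid ones.
function consume(
  queue: MessageQueue<ImportRecord>,
  insert: (record: ImportRecord) => void
): number {
  let inserted = 0;
  let record = queue.pull();
  while (record !== undefined) {
    if (record.name.trim() !== "") {
      insert(record);
      inserted += 1;
    }
    record = queue.pull();
  }
  return inserted;
}
```

Swapping the broker then means writing another class that implements `MessageQueue`, with `produce` and `consume` left unchanged.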

Coding period

First phase

The program coding period kicked off with making changes to the existing BookBrainz schema to enable it to support our new imports. The initial design as discussed here was later updated to include views as well for imports per entity to enable simpler queries.

Following this, I started working on the bookbrainz-data repository to add some basic functions for aiding the import process. I started work in accordance with one of the existing roadmaps for the BookBrainz project which was to shift all database logic from bookbrainz-site to bookbrainz-data – adding features on a per-function basis. Initially it was decided to use Immutable.js for all data flowing in and out of bookbrainz-data-js, but very soon we realized that it was not practical to follow this approach. After some discussions, we finally settled on this repository design change to incorporate new function-oriented functionality. We named this new sub-module func.

Once I had basic functions to handle database transactions in place, I started working on the importer architecture. It was decided to create multiple instances of the producer process, each with the ability to run asynchronous operations on its own. Similarly, we should be able to fork multiple consumer processes, each capable of fetching data from the queue and sending it off to the database.

To address this problem, I started working on a module which, given a function, makes it possible to run multiple processes, each running multiple instances of that function. It should let us generate the arguments dynamically for each process, along with some set-up and tear-down actions before and after we fork the process.
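The shape of such a module can be sketched as below. Note the simplification: promises running in a single process stand in for the forked processes of the real module, and `ClusterOptions` and `runInstances` are hypothetical names, not the module's actual API.

```typescript
// Options mirror the description above; all names are hypothetical.
interface ClusterOptions<A, R> {
  instances: number;
  makeArgs: (index: number) => A; // arguments generated per instance
  worker: (args: A) => Promise<R>;
  setUp?: () => void; // runs once before any instance starts
  tearDown?: () => void; // runs once after all instances have finished
}

// Runs `instances` copies of `worker` concurrently and collects the results.
// Promises stand in for forked processes to keep the sketch self-contained.
async function runInstances<A, R>(options: ClusterOptions<A, R>): Promise<R[]> {
  if (options.setUp) {
    options.setUp();
  }
  try {
    const runs: Promise<R>[] = [];
    for (let i = 0; i < options.instances; i += 1) {
      runs.push(options.worker(options.makeArgs(i)));
    }
    return await Promise.all(runs);
  } finally {
    // Tear-down runs even if one of the instances fails.
    if (options.tearDown) {
      options.tearDown();
    }
  }
}
```

For the importer, `makeArgs` could for example hand each instance a different dump file to read.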

To get a better grasp of the underlying functionality, one can read the final API and documentation. It’s a generic module which can be used for any functionality. The diagram for its execution flow can be found below:

Fig. 2: AsyncCluster module execution flow. For more details, see the complete documentation.

Second phase

While developing producers, I first designed the generic producer object structure for all entity types – an object skeleton which all producers need to create from the read records to be pushed into the queue. This object structure was to be enforced across all data sources, and this helps the consumers to expect an object of fixed nature on which they can later run automated validation tests prior to adding to the database.

As the data dumps were of considerable size, I used data streams to read the data from the flat files and parsed them to create generic entity objects which used BookBrainz data storage format. After parsing each record, I pushed the records into the queue.

Parsing required thorough analysis of the data dumps, and manually mapping each key-value pair in the data dump to the generic object structure. All the data which did not fit the present BB schema (and hence had to be excluded from the generic producer object structure) was added to a metadata field associated with the import. This metadata field is stored as JSONB in the database, so that we can individually query and index any of the fields in the metadata later.
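As an illustration of the mapping step, here is a minimal sketch in which fields that the schema knows are mapped explicitly and everything else is kept in the metadata bag. The `GenericImport` shape, the `MAPPED_KEYS` list and the sample keys are assumptions; the real structure has many more fields and differs per data source.

```typescript
// Hypothetical generic record: fields the schema knows, plus a metadata bag
// for everything in the dump that has no place in the schema yet.
interface GenericImport {
  entityType: string;
  name: string;
  metadata: Record<string, string>;
}

// Keys assumed to map onto the schema; the real mapping is per data source.
const MAPPED_KEYS = ["name"];

function toGenericImport(
  entityType: string,
  raw: Record<string, string>
): GenericImport {
  const metadata: Record<string, string> = {};
  for (const key of Object.keys(raw)) {
    // Anything without a schema mapping is preserved instead of dropped.
    if (MAPPED_KEYS.indexOf(key) === -1) {
      metadata[key] = raw[key];
    }
  }
  return { entityType, name: raw["name"] ?? "", metadata };
}
```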

While developing the consumers, I initially set up a validation module. Much of it was adapted from the existing validators on the BookBrainz site, which I was able to use without much alteration thanks to the generic producer object structure. The validation modules in bookbrainz-site have been written to quickly validate the form submitted by the editor after creating or editing an entity. To use them in the import process, I wrote a converter which transforms the generic producer object into the form sections understood by the validation module. Apart from this, I added better error handling to ensure all errors are caught and reported in case something goes wrong.

Error handling was another important aspect of the import process, apart from the validation process. Being a command line application, tracking errors was central to ensuring that all components were running as intended. At the same time we had to ensure that no record which could be potentially imported into the database was missed. To address these problems, I decided to discard the record if it fails the data integrity validation tests (which means the data is most probably corrupt). But in the case of any transaction error, we give a fixed number of chances to the record before discarding it (by acknowledging the message). A future goal for this process could be to push the erred record into another queue for analysis and replaying of those messages from this queue back to the original queue when the problem gets sorted.
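The discard and retry policy described above can be sketched as a small function. The `validate` and `insert` hooks and the `Outcome` labels are hypothetical; in the real consumer, discarding corresponds to acknowledging the message on the queue.

```typescript
type Outcome = "imported" | "discarded-invalid" | "discarded-after-retries";

// `maxAttempts` is the fixed number of chances a record gets when the
// database transaction fails.
function processRecord<T>(
  record: T,
  validate: (record: T) => boolean,
  insert: (record: T) => void,
  maxAttempts: number
): Outcome {
  // Corrupt data will not get better on retry, so it is dropped right away.
  if (!validate(record)) {
    return "discarded-invalid";
  }
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      insert(record);
      return "imported";
    } catch {
      // Transaction error: try again until the attempts run out.
    }
  }
  return "discarded-after-retries";
}
```

A dead-letter queue, as suggested as a future goal, would replace the final `discarded-after-retries` branch with a push to a second queue.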

Once the importer was in place, I focused on building up the func.imports module with more functions for the import entities – like discard and approve. I also added functions to fetch recent imports, and a lot of other helper code for the imports. I also ensured that all errors occur loudly and never silently slip away. With the help of my mentor, I also migrated most of the functions required for data transactions on the bookbrainz-site. This was crucial to my project – as in many instances the existing functions could not be used due to them initiating their own database transaction for each action. I split all these actions into functions, and bound them to the transaction object they received rather than having them initiate their own transaction. I also ensured we use modern ES6 features – which made the adapted code much more sleek and compact. It was a long process, as I had to read almost all of the existing code for data transactions on the bookbrainz-site and adapt each piece correctly. All the code finally came together in the create-entity module – which can now be used for entity creation as well as upgrading the imports to entities.
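The transaction-passing pattern can be sketched as follows. The `Transaction` class here is a toy stand-in for a database transaction object, and `createAlias` and `createEntity` are hypothetical helpers, not the functions in bookbrainz-data.

```typescript
// Toy transaction: collects statements and applies them only on commit.
class Transaction {
  private pending: string[] = [];
  private target: string[];
  constructor(target: string[]) {
    this.target = target;
  }
  run(statement: string): void {
    this.pending.push(statement);
  }
  commit(): void {
    this.target.push(...this.pending);
    this.pending = [];
  }
}

// Each helper receives the caller's transaction instead of opening its own,
// so several helpers can be composed into one atomic unit.
function createAlias(trx: Transaction, name: string): void {
  trx.run(`insert alias ${name}`);
}

function createEntity(trx: Transaction, type: string, name: string): void {
  createAlias(trx, name);
  trx.run(`insert ${type}`);
}
```

With this shape, the same helpers serve both regular entity creation and the upgrade of an import to an entity, since the caller decides where the transaction begins and ends.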

Third Phase

The work on bookbrainz-site and bookbrainz-data mostly happened side by side. First I added a recent imports page – which fetches the most recent imports from the database and displays them inside a React component. The recent imports page is designed as a single-page application which dynamically loads the paginated records and renders them on-screen. It works as follows:

Fig. 3: Recent imports execution flow

Next, I added import-entity display pages for all five entity types. They were supposed to display the entity attributes along with links to approve/discard/edit and approve functionality. Approving the import-entity was done so that the user gets redirected to the newly created entity. The import-entity display page for work is as follows:

Fig. 4: Work Import Entity Page. Similarly, pages were added for Creator, Edition, Publication and Publisher.

In case of a discarded import, I added an extra page similar to the existing confirm deletion page – which asks the user to confirm the action and then waits until the entity is deleted before redirecting the user to the home page. The discard page looks as follows:

Fig. 5: Discard Import Entity Page

Next, I implemented editing of imports prior to approval. For this, I wrote two modules – one to transform the import to the structure used by the editing form and one to convert form data to an entity. When a user wishes to edit an import, the import is transformed to the form and rendered on the screen. The user can then edit the import. When the user clicks submit, we transform the form data to a new entity and use the create-entity function to create it in the BookBrainz database. The user is then redirected to the newly created entity page. The code for rendering the form and editing the entity was completely reused for imports with minimal changes. I then added functionality to add imports to the ElasticSearch index, and display them in the search results. The final search page is as follows:

Fig. 6: Search showing Import Entities

Links to the work done

  1. BookBrainz SQL
  2. BookBrainz Import
  3. BookBrainz Data
  4. BookBrainz Site

Conclusion

The last three months have been a fantastic experience for me. Not only did I get to learn a lot of new technologies and write some exciting software, but I also got to brush up my existing skills and interact with the completely awesome MetaBrainz community. Such an opportunity truly comes once in a lifetime, and I extend my sincerest gratitude to Google for running such a great and extremely inclusive programme which allows students from all over the world to avail themselves of such an opportunity. Special thanks to my mentor Ben Ockmore for always being patient and helping me out whenever I felt stuck.

Thank you MetaBrainz community for your continuous guidance and support!

!m Google and MetaBrainz

BookBrainz GSoC Gamification/Achievement System

Hi guys, I’m Max (AKA QuoraUK), a university student working with BookBrainz as part of Google Summer of Code. My project this summer has been to build a new gamification system that introduces rewards for BookBrainz users and recognises their achievements. Here I will explain the system and the features I’ve implemented.

Overview

My original specification for the gamification system is here. To summarise, the idea behind gamification is to add game-like elements to the site in order to make it more engaging for users. The plan for the gamification of BookBrainz was:

  • Add badges and titles for users to earn on the BookBrainz site
  • Allow users to display badges and titles on their profile page
  • Encourage regular and high quality content

To implement this plan we have added 12 achievement tracks – once an achievement track is completed a title is unlocked. The artwork for the badges is currently “programmer art” and we are very open to other people designing replacements for them. This could be a part of this year’s Google Code-In. The achievements that will be available on launch are:

  • Revisionist: Perform (1, 50, 250) Revision(s)
  • Creator Creator: Create (1, 10, 100) Creator(s)
  • Limited Edition: Create (1, 10, 100) Edition(s)
  • Publisher: Create (1, 10, 100) Publication(s)
  • Publisher Creator: Create (1, 10, 100) Publisher(s)
  • Worker Bee: Create (1, 10, 100) Work(s)
  • Sprinter: Create 10 revisions in an hour
  • Fun Runner: Create a revision a day for a week
  • Marathoner: Create a revision a day for 30 days
  • Explorer: View (10, 100, 1000) Entities
  • Time Traveller: Create an edition before it is released
  • Hot Off the Press: Create an edition within a week of release

All of these are unit tested and have unique badges for each tier on the track. If you already qualified for any of these achievements before the system was launched, you will earn them with your next revision/creation. Badge templates are available for developers to introduce new badges, and adding achievements can be as simple as making a badge and adding a few lines of code.
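To give an idea of what "a few lines of code" could mean, here is a hedged sketch of two checks based on the rules listed above. The function names and the timestamp representation are assumptions and not the actual achievement code.

```typescript
// Returns how many tiers of a track are unlocked for a given count.
// Thresholds are assumed ascending, e.g. Revisionist: [1, 50, 250].
function tierFor(count: number, thresholds: number[]): number {
  let tier = 0;
  for (const threshold of thresholds) {
    if (count >= threshold) {
      tier += 1;
    }
  }
  return tier;
}

// Sprinter: 10 revisions within any one-hour window. Timestamps are in
// milliseconds and assumed to be sorted in ascending order.
function isSprinter(timestamps: number[]): boolean {
  const hour = 60 * 60 * 1000;
  for (let i = 0; i + 9 < timestamps.length; i += 1) {
    if (timestamps[i + 9] - timestamps[i] <= hour) {
      return true;
    }
  }
  return false;
}
```

For example, a user with 50 revisions would be on the second Revisionist tier under this sketch.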

Profile Page

Profile Page, Drag and drop badge selector

The gamification system also brings some changes to the profile page. There is now a badge box which can contain your three favorite badges. Additionally, your selected title is shown next to your username. You can select your favorite badges in the new achievements menu on the profile, then drag and drop your favorites into the boxes. Titles can be selected by going to Edit Profile, and selecting them from the drop down menu.

Other Areas

Achievement Alert

On creation of an entity or revision you will now see an alert if an achievement is unlocked. This will prompt you to go to your profile page and set the ones you want to display. Usernames in other areas of the site can be hovered over in order to see the title they have set.

Demonstration

Here is a demonstration video I’ve made for the system:
