BookBrainz – Page 2 – MetaBrainz Blog

MetaBrainz Summit 2022

The silliest, and thus best, group photo from the summit. Left to right: Aerozol, Monkey, Mayhem, Atj, lucifer (laptop), yvanzo, alastairp, Bitmap, Zas, akshaaatt

After a two-year break, in-person summits made their grand return in 2022! Contributors from all corners of the globe visited the Barcelona HQ to eat delicious local food, sample Monkey and alastairp’s beer, marvel at the architecture, try Mayhem’s cocktail robot, savour New Zealand and Irish chocolates, munch on delicious Indian snacks, and learn about the excellent Spanish culture of sleeping in. As well as, believe it or not, getting “work” done – recapping the last year, and planning, discussing, and getting excited about the future of MetaBrainz and its projects.

We also had some of the team join us via Stream; Freso (who also coordinated all the streaming and recording), reosarevok, lucifer, rdswift, and many others who popped in. Thank you for patiently waiting while we ranted and when we didn’t notice you had your hand up. lucifer – who wasn’t able to come in person because of bullshit Visa rejections – we will definitely see you next year!

A summary of the topics covered follows. The more intrepid historians among you can see full event details on the wiki page, read the minutes, look at the photo gallery, and watch the summit recordings on YouTube: Day 1, Day 2, Day 3

GSoC’22: CritiqueBrainz reviews for BookBrainz entities

Greetings, Everyone!

I am Ansh Goyal (ansh on IRC), an undergraduate student from the Birla Institute of Technology and Science (BITS), Pilani, India. This summer, I participated in Google Summer of Code and introduced a new feature, CritiqueBrainz reviews for BookBrainz entities.

I was mentored by Alastair Porter (alastairp on IRC) and Nicolas Pelletier (monkey on IRC) during this period. This post summarizes my contributions made for this project and my experiences throughout the journey.

GSoC 2022: Unified Form Editor for BookBrainz

Hi, I am Shubham gupta (IRC Shubh) pursuing my bachelor’s from the National Institute of Technology, Kurushetra. This year I participated in Google Summer of Code and implemented a new editor in Bookbrainz.

In this project, I was mentored by Nicolas Pelletier (IRC monkey). The purpose of this blog is to summarize my contribution made for this project and share my experiences along the way.

GSoC 2021: Series Entity for BookBrainz

Hi everyone, I am Akash Gupta, currently pursuing my undergraduate from Kalinga Institute of Industrial Technology. This summer, I participated in Google Summer of Code and developed a new feature — Series Entity— for the project BookBrainz.

I was mentored by Nicolas Pelletier (monkey on IRC) during this period. This post summarizes my contributions to the project and the experiences that I had throughout the summer.

GSoC 2020: User Collection for BookBrainz

Hi everyone, I am Prabal Singh currently studying in Indian Institute of Technology, Guwahati. This summer I participated in Google Summer of Code and developed a new feature – User Collections – for the project BookBrainz.

I was mentored by Nicolas Pelletier (Mr_Monkey on IRC) during this period. This post summarizes my contributions to the project.

Introducing the BookBrainz merging tool

Today we come with a big BookBrainz website update that allows you to merge duplicate entities!

Being able to clean up the database is an essential step towards importing public bibliographic records and catalogs from partner websites. As with MusicBrainz, you can visit an entity page on BookBrainz and click on a button to add an entity to a merge queue. You can merge multiple entities in one go easily.

After clicking the merge button you will be presented with a page that lets you review and select the correct information in case of conflicting data. The revision history of merged entities is preserved, and in the near future you’ll be able undo merges.

Your feedback is very welcome! We also have a short tutorial on how to use the new merge tool for the curious.

This latest website update also adds annotations for any information that does not fit into the existing format, some small design improvements and bug fixes.

We’ve also added the ability to search for users on the search page. This last feature will come in handy soon as we introduce collaborative User Collections; stay tuned!

State of the Brainz: 2019 MetaBrainz Summit highlights

The 2019 MetaBrainz Summit took place on 27th–29th of September 2019 in Barcelona, Spain at the MetaBrainz HQ. The Summit is a chance for MetaBrainz staff and the community to gather and plan ahead for the next year. This report is a recap of what was discussed and what lies ahead for the community.

GSoC 2019: JSON Web API for BookBrainz

The time has come to wrap up the very productive and learning summer of the last 3 months as a GSoC student with MetaBrainz.

Hello Everyone!!

I am Akhilesh Kumar, a recent graduate from the National Institute of Technology, Hamirpur, India. I have been working on BookBrainz for MetaBrainz Foundation Inc. as a participant in the Google Summer of Code ’19. It has been an amazing experience and I’ve learned a lot over the summer. I was mentored by Nicolas Pelletier (Mr_Monkey on IRC) during this period. This post summarizes my contributions to the project and the experiences that I had throughout the summer.

Continue reading “GSoC 2019: JSON Web API for BookBrainz”

Automating the voting system

MetaMetaData

For the last several years, one of the things our community has struggled with is a lack of active voters. We’ve tried to implement various measures to decrease the need for voters and load for the wonderful ones that actually do actively look through edits and help vote on them—e.g., making more edits auto‐edits and decreasing amount of time edits stay open. However, the edit queue is still quite unwieldy and as such we’ve kept trying to come up with other ways to decrease the load on our contributors.

Over the past few months since our last summit, we’ve been working on training AIs, both for recommendation engines and data analytics, and for helping out with spam, but it soon appeared that we had another valuable dataset: our history of 15,693,824 votes from 16,336 voters and 56,374,198 edits from 2,007,134 editors. It turns out that this is an unintended side-effects of the editing and voting system in that it creates a paper trail of our habits as a community and our collective mind.

A paper trail that you could, say, train a neural network on. And that’s just what we did.

By feeding data from our top voters, we’ve been able to train our network to replicate with 96.4% accuracy the personality when using the other half as test data. That figure is the average for 300 bots each based on our top 300 voters.
We were really impressed with the results but the story doesn’t stop there…

Meet BrainzVoter

The next logical step was to create our own Frankenstein’s monster. By training on 70% of our entire set of votes, we gave birth to a voting bot that represents the essence of our community. “BrainzVoter”, as we dubbed it, is precise and scores a staggering 98.9% accuracy on test data and comparing with the other 30% of our dataset.

To quote the late Terry Pratchet:

Ankh-Morpork had dallied with many forms of government and had ended up with that form of democracy known as One Man, One Vote. The Patrician was the Man; he had the Vote.

Edit filters

In view of the recent developments on net neutrality taken by the European Union with articles 11 & 13/17, MusicBrainz is taking measures to protect against copyright infringement: we’re implementing automatic edit filters. BrainzVoter will use the latest in NLP technology to understand what you, the editors, write in your edit notes, and use this understanding to vote on your edit. It will also inspect any URLs included in the edit note to cross-reference the data. The aggregate data will not be available to the public.

Edits with better and clearer notes will become more likely to pass. Consider this a good opportunity to (re‐)read How to Write Edit Notes!

How will this affect me as an editor?

Not much will change, and you can continue doing what you were doing before! We recommend that you take the time to make clear statements in your edit notes.
You will also be able to use a system of tags to express intent, using for example #typo #correction in the content of your edit text. Syntax highlighting and shortcuts will be available in the text editor.

In the end, by removing the need for humans to look over edits, the bot should give you, the editor, more time to add and edit and fix data in MusicBrainz, without having to spend time checking everyone else’s edits or worry about other editors disagreeing with yours!

After a brief trial period on MusicBrainz, this system will be adapted and also rolled out to BookBrainz.

We hope you will share our excitement for the benefits of automation and help us improve our training models over time. I, for one, welcome our AI overlords.

BookBrainz is now an official MetaBrainz project!

After many years as a community driven project and often under-staffed, the BookBrainz project has always been the red-headed step child of our projects. A few weeks ago I asked if the community felt that we should make BookBrainz an official project of the foundation and got a very positive response.

After that, we started informally seeking developers to take on this position, leading to the hire of Monkey, who will now be the lead of the BookBrainz project, taking over for Ben Ockmore. Ben will take on a contributor role to BookBrainz going forward and remain on the project! Thanks for all of your hard efforts in the past, Ben!

While Monkey comes up to speed on the codebase, we’ve been brainstorming what features he should focus on first . The short term focus on BookBrainz will be on bringing it into our hosting setup at Hetzner, which means making the codebase ready for running inside of docker with all of the MetaBrainz specific hosting quirks. Part of this project will be to remove elastic search and to utilize our new Solr based search system that we recently released for MusicBrainz.

After getting BookBrainz moved to our hosting facility that focus will be to create a minimally viable product. What exactly does this mean? One of the frequent complaints I’ve received about BookBrainz is that it is missing core functionality of a proper metadata project. Core functionality means that a user should be able to view and edit all of the metadata that is in BookBrainz and then retrieve this data from the BookBrainz API. It should include full data dumps with incremental data dumps being added a bit later.

What do you think the missing core features of BookBrainz are?

Finally, we’re in discussions with the OpenLibrary team, wondering how to best work together and not to duplicate efforts — we’ll post more about this once we’ve reached an agreement with the OpenLibrary team on how we should proceed.

Thanks!