Automating the voting system

MetaMetaData

For the last several years, one of the things our community has struggled with is a lack of active voters. We’ve tried various measures to reduce both the need for voters and the load on the wonderful ones who do actively look through edits and vote on them (e.g., making more edits auto‐edits and decreasing the amount of time edits stay open). However, the edit queue is still quite unwieldy, so we’ve kept trying to come up with other ways to decrease the load on our contributors.

Over the past few months since our last summit, we’ve been working on training AIs, both for recommendation engines and data analytics, and for helping out with spam, but it soon became apparent that we had another valuable dataset: our history of 15,693,824 votes from 16,336 voters and 56,374,198 edits from 2,007,134 editors. It turns out that this is an unintended side effect of the editing and voting system: it creates a paper trail of our habits as a community and our collective mind.

A paper trail that you could, say, train a neural network on. And that’s just what we did.

By feeding our network half of each top voter’s votes, we’ve been able to train it to replicate that voter’s personality with 96.4% accuracy when using the other half as test data. That figure is the average across 300 bots, one for each of our top 300 voters.
We were really impressed with the results, but the story doesn’t stop there…

Meet BrainzVoter

The next logical step was to create our own Frankenstein’s monster. By training on 70% of our entire set of votes, we gave birth to a voting bot that represents the essence of our community. “BrainzVoter”, as we dubbed it, is precise: it scores a staggering 98.9% accuracy when tested against the other 30% of our dataset.

To quote the late Terry Pratchett:

Ankh-Morpork had dallied with many forms of government and had ended up with that form of democracy known as One Man, One Vote. The Patrician was the Man; he had the Vote.

Edit filters

In view of the recent developments on net neutrality taken by the European Union with articles 11 & 13/17, MusicBrainz is taking measures to protect against copyright infringement: we’re implementing automatic edit filters. BrainzVoter will use the latest in NLP technology to understand what you, the editors, write in your edit notes, and use this understanding to vote on your edit. It will also inspect any URLs included in the edit note to cross-reference the data. The aggregate data will not be available to the public.

Edits with better and clearer notes will become more likely to pass. Consider this a good opportunity to (re‐)read How to Write Edit Notes!

How will this affect me as an editor?

Not much will change, and you can continue doing what you were doing before! We recommend that you take the time to make clear statements in your edit notes.
You will also be able to use a system of tags to express intent, using for example #typo #correction in the content of your edit text. Syntax highlighting and shortcuts will be available in the text editor.
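
As a rough illustration of how tags like these could be pulled out of an edit note, here is a minimal sketch. The tag syntax (a `#` followed by letters, digits, underscores or hyphens), the pattern and the `extract_tags` helper are all assumptions made for this example, not the actual MusicBrainz implementation.

```python
import re

# Hypothetical tag syntax: '#' followed by a letter, then letters, digits, '_' or '-'.
# The lookbehind skips a '#' that directly follows a word character (e.g. "page#anchor").
TAG_PATTERN = re.compile(r"(?<!\w)#([A-Za-z][\w-]*)")


def extract_tags(edit_note: str) -> list[str]:
    """Return the lowercased tags found in an edit note, in order, without duplicates."""
    tags = []
    for match in TAG_PATTERN.finditer(edit_note):
        tag = match.group(1).lower()
        if tag not in tags:
            tags.append(tag)
    return tags


note = "Fixed the misspelled track title, see the back cover scan. #typo #correction"
print(extract_tags(note))  # ['typo', 'correction']
```

Lowercasing and de-duplicating are design choices for the sketch, so that `#Typo` and `#typo` would count as the same expression of intent.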

In the end, by removing the need for humans to look over edits, the bot should give you, the editor, more time to add and edit and fix data in MusicBrainz, without having to spend time checking everyone else’s edits or worry about other editors disagreeing with yours!

After a brief trial period on MusicBrainz, this system will be adapted and also rolled out to BookBrainz.

We hope you will share our excitement for the benefits of automation and help us improve our training models over time. I, for one, welcome our AI overlords.

Area editing, part I: How did we wind up here?

First, where is “here”?

The current MB-area landscape looks pretty bleak. The data is incomplete, and adding new data is a hassle.

To add an area, you need to:

  1. Create an account on tickets.musicbrainz.org.
  2. Make a ticket to request that the new area be added.
  3. Wait for an area editor to do the rest. Judging by the backlog, that might happen sometime between “in a long time” and “never”.

Where did area_bot go? Why are there so few area editors? Why isn’t somebody trying to improve the situation? In short, how did we wind up here? To understand that, we need to look at where we’ve been.

Where did we start out?

By design, areas were meant to be added by area_bot, pulling data from Wikidata. The workflow would look something like this:

  • If area_bot made a mistake, there would be a handful of editors who could correct it by editing areas manually.
  • If the bot missed an area in Wikidata, you could either:
    • (if it didn’t already have a valid “type”) improve the Wikidata entry, or
    • (if it did have a valid “type”) ask nikki to tweak area_bot, so that it would recognize more types.
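
The decision for a missed area can be sketched roughly as below. This is only an illustration of the workflow described above, not area_bot’s actual code: the `WikidataItem` class, the contents of `RECOGNIZED_TYPES` and the `next_step` helper are hypothetical, and the sketch assumes that “has a valid type” simply means the item has at least one type set.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist; the real list of types area_bot recognized is not given here.
RECOGNIZED_TYPES = frozenset({"country", "city", "municipality"})


@dataclass
class WikidataItem:
    name: str
    # Assumption: an empty set stands for "no valid type on Wikidata".
    types: set = field(default_factory=set)


def next_step(item: WikidataItem, recognized=RECOGNIZED_TYPES) -> str:
    """Say what should happen for an area, following the workflow above."""
    if not item.types:
        return "improve the Wikidata entry"
    if item.types & recognized:
        return "area_bot can import it"
    return "ask for the type to be added to area_bot's list"


print(next_step(WikidataItem("Somewhere")))
print(next_step(WikidataItem("Oslo", {"city"})))
print(next_step(WikidataItem("Some parish", {"civil parish"})))
```

The third branch is the important one: the entry on Wikidata is already correct, so the fix belongs in the bot’s list of types, not in Wikidata.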

And that worked. Sort of. For a while.

How did we get so far off course?

At some point, things started to go wrong. While I didn’t see it firsthand, what I’ve been told is this: rather than ask nikki to add more area types to area_bot’s white-list, some editors started adding incorrect area types on Wikidata, types which area_bot already recognized. So, the area would be added to MusicBrainz, but at the expense of Wikidata.

At this point, communication broke down. Area_bot was taken offline (to discourage low-quality Wikidata edits), but very little was done to explain the situation to users. This lack of communication became a larger problem than areas themselves, because it kept us from fixing the problem.

So what’s the plan?

Broadly, the first steps are:

  1. Improve overall communication within the project, as is being discussed in Rob’s recent blog posts.
  2. Make a long-term plan for areas and how they should be edited.
  3. Possibly open up area editing to more people, based on what’s decided in step #2.

My next post, Area editing, part II, will go into more detail about step #2.

Robustness principle applied to communities

The great internet pioneer Jon Postel once wrote the following in an early draft of the TCP specification:

Be conservative in what you do, be liberal in what you accept from others

He wrote this in the context of computer networking and this philosophy arguably helped the Internet become robust to faults. Personally, I think this is great wisdom even in a larger scope — it can be applied to many other contexts in life. Today, I would like to apply this wisdom to our community:

If members of the MusicBrainz community work hard to craft edits that adhere to the guidelines as much as possible, and add supporting links to those edits, that fits the bill of “being conservative in what you do”. Then, when you consider other people’s edits, be liberal in accepting them. If an edit makes the database better, vote yes, even if you don’t fully agree with it. If you see a small mistake and you’re an auto-editor, accept the edit and fix the small mistake. See if you can find a way to accept the edit, rather than shooting it down.

Our attitude shouldn’t be “How can I shut this person down?”, but “How can I help this person make better edits?”. If an editor gets shut down for small mistakes, the editor is going to be discouraged from making more edits. This harms the project overall! But if an experienced editor politely helps a less experienced editor to improve their edits, the less experienced editor will feel more welcome and is much more likely to continue learning and making more edits.

After all, happy teams are vastly more productive than unhappy teams.

Happy editing!