Data quality: We want your feedback!

The release locking concept has been around for quite some time and has been debated at great length. After a couple of long calls, Don and I reworked it into the data quality concept. The idea is relatively simple:

  • Each artist and each release in MusicBrainz now has a quality indicator that shows the quality of the data: Unknown, low, normal or high.
  • Data that is marked with low quality should be easy to change.
  • Data with normal quality should take about the same amount of effort to change as it takes now.
  • Data with high quality should be harder to change, to avoid incorrect changes to good data.
  • Each edit type will define the number of votes required to pass, how long voting stays open, what to do if an edit receives no votes, and whether the edit is an auto-edit.
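
To make the list above a little more concrete, here is a rough sketch of how per-level edit settings could be modeled. Everything in it is an assumption made for illustration: the names (`Quality`, `EditPolicy`, `policy_for`, the `edit_release_name` edit type) are hypothetical, the numbers are invented, and treating unknown like normal is just one possible choice. The actual values are the ones on the edit information page.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    """The four quality indicators named in the post."""
    UNKNOWN = "unknown"
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


class NoVoteAction(Enum):
    """What happens when an edit expires without any votes."""
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass(frozen=True)
class EditPolicy:
    votes_required: int        # votes needed for the edit to pass
    days_open: int             # how long voting stays open
    on_no_votes: NoVoteAction  # outcome if nobody votes
    auto_edit: bool            # applied immediately, without voting


# Hypothetical settings for one hypothetical edit type.
# The numbers are placeholders, not the real MusicBrainz values.
POLICIES = {
    ("edit_release_name", Quality.LOW): EditPolicy(1, 4, NoVoteAction.ACCEPT, True),
    ("edit_release_name", Quality.NORMAL): EditPolicy(3, 14, NoVoteAction.ACCEPT, False),
    ("edit_release_name", Quality.HIGH): EditPolicy(4, 14, NoVoteAction.REJECT, False),
}


def policy_for(edit_type: str, quality: Quality) -> EditPolicy:
    """Look up the settings for an edit type at a given quality level."""
    # Assumption: data of unknown quality behaves like normal-quality data.
    if quality is Quality.UNKNOWN:
        quality = Quality.NORMAL
    return POLICIES[(edit_type, quality)]
```

With a table like this, raising an artist or release to high quality simply selects a stricter row for every edit made against it, which is the behavior the list describes.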

This new feature will allow us to edit sloppy data faster, tune the editing system to better fit how people use it, and prevent accidental edits to good data. Now we need your help in testing this system and giving us feedback about the various edit levels.

If you would like to help, please read the data quality wiki page, view the new edit information page and then test the new features on the staging server. Each artist and release page now has a Change Quality link that will allow you to change the quality of the artist/release. Once those are changed, the edit system will behave according to the values set forth in the edit information page. Please note that the change artist/release quality edits are currently autoedits, which will be changed once we’re done testing the bulk of this new system. For right now we’re making it easy to change the data quality for testing purposes.

To start testing, head over to the staging server. Add any bugs you find to the bug tracker, or post your feedback in the comments.

6 thoughts on “Data quality: We want your feedback!”

  1. One minor comment I forgot about yesterday, made without any testing. The level name ‘normal’ seems out of place. If we are going to stick to ‘low’ and ‘high’ we’d better change ‘normal’ to ‘medium’. While I think I understand where the name ‘normal’ came from (it has about the same limits as we have now without the data quality system), I think it fails to indicate its relation to ‘low’ and ‘high’.
    I also think that a level ‘unknown’ isn’t needed. Three is enough if we apply e.g. ‘low’ to everything that’s already present when we start using the system and everything new that gets added afterwards.

  2. I think the ‘unknown’ level should be removed, existing data moved to ‘normal’, and new data added to ‘low’.

  3. I think the term “normal” is fine. Low is lower than normal and high is higher than normal, it’s not really that confusing.

    As for unknown: with my (very limited) understanding of the way the system will work, I can see the need for an unknown setting for new entries that may be 100% immaculate or full of errors. Until somebody sets the quality (through voting, I’d assume) it might be best to have it unknown, so something that’s perfect doesn’t have a low setting making it easier for somebody to screw up.

    Then if somebody comes along and notices it was a freedb import full of errors, they set it to low. If it’s well backed up with proof, they set it to normal or high.

  4. As there is no auto-bot for applying pending edits on the staging server, it’s difficult to test.

    Also for the same reason, we should maybe automatically set editors with more than 1500 or 2000 approved edits as auto-editor on the staging server.
    It’s easier to test when your edits are applied immediately – it was an issue I’ve had while testing labels features.

    Another question: is it possible to test webservice (e.g. for labels) on the staging server?

  5. Suggestion: Instead of having Low/Medium/High quality indicators, why not allow editors to give a release/artist a “Thumbs Up/Down” or “+1/-1” quality rating. Low/Medium/High could be determined by the quality rating. This would also allow us to rank things by quality and possibly generate lists of albums that “need attention” or “have poor quality”.

    For example, a new classical release is being added. Voters notice it needs a lot of work and in addition to their Abstain vote they also give it a “-1” quality rating. As more editors stumble upon this release, we then see that the release has an overall quality rating of “-7”. We could query the quality ratings to find this release and others that need some investigation/correction/validation.

    I think this would be a great way for editors/voters to indicate that they aren’t sure of the details and think that a piece of information needs some attention from other editors.

    -Aaron (cooperaa)

  6. murdos – although there is no auto-bot for applying pending edits, any user can manually apply edits on the staging server (click on a link to the edit, and scroll to the bottom).

    your suggestion isn’t a bad one, though.
