Libdiscid 0.5.1 released

A new libdiscid version was uploaded today.

Changes:

  • LIB-40: discid_get_webservice_url() (web service version 1) is deprecated.
    Please use libmusicbrainz to gather metadata by disc ID
  • LIB-7: Rewrote data track handling, covering releases with multiple data tracks. This also fixes LIB-18 (no ID for DVDs) and LIB-9 (PS/PS2 CDs)
  • LIB-44: fix invalid disc IDs on first read of multi-session discs
  • LIB-37: Autotools optimization (non-recursive build etc.)
  • LIB-42: remove Windows 9x platform code
  • renamed openbsd platform code to netbsd, still used by both.

The data track/multi-session disc handling was rewritten. Lukáš Lalinský started this work quite some time ago, and I have now finished it. This should fix several issues with data tracks, and data track handling now works the same on all platforms.

Christophe Fergeau contributed some changes to remove lots of clutter from the autotools build. The build no longer changes directories all the time (non-recursive build), which should make it easier to spot actual problems when building.

Additional contributions come from Philipp Wolfer and Sebastian Ramacher again.

Feedback needed:

We are (still) a bit stuck in the discussion for LIB-28 which also blocks several connected things in python-discid and possibly Picard.
The main question is how to name the “default device”, especially on Mac OS X. Usually real/internal device names are used. However, especially on Mac OS X these device names change quite frequently for various reasons, even when the same physical disc drive is “meant”. So I don’t know if using these makes any sense. It also might not be good to add these device names to a configuration file, since they will “break” easily (opening a .dmg before inserting the disc already breaks it).
I proposed using “1” to define “use the first disc drive”. I am no actual Mac user and really would like to know how you guys feel about this.
For Windows the drive letters are much more stable, but can change when USB disc drives are used. Using “1” is an option, but might make less sense, since drive letters are actually known to the normal user.
On Linux/BSD/Solaris the disc drive names are only used for disc drives (not for hard disks) and are quite stable.
This relates to python-discid#30 (should DEFAULT_DEVICE be a constant?) which again somewhat blocks PICARD-503 (using python-discid). This question is the only thing keeping python-discid from a stable API release.
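
To make the “1” proposal concrete, here is a minimal sketch of how such a device specification could be resolved. It is not libdiscid or python-discid code: the function name resolve_device and the idea of passing in an ordered list of detected drives are assumptions made purely for illustration.

```python
def resolve_device(spec, drives):
    """Resolve a device specification to a device name.

    Hypothetical helper, not part of libdiscid or python-discid.
    spec   -- either a real device name, or a 1-based drive number as a
              string ("1" meaning "use the first disc drive")
    drives -- device names of the disc drives detected right now, in order
    """
    if spec.isdigit():
        index = int(spec)
        if not 1 <= index <= len(drives):
            raise LookupError("no disc drive number %d" % index)
        # The number is resolved at call time, so it keeps working even
        # when the underlying device name has changed since last time.
        return drives[index - 1]
    # Anything else is treated as a real device name and used unchanged.
    return spec
```

With this reading, a configuration file could store “1” and still reach the same physical drive on Mac OS X after the device name changes, while a Windows user could keep storing a drive letter.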

LIB-28 is probably the best place for feedback, but you can also just answer here or in the announcement mail.

Information, documentation and other links are at:
http://musicbrainz.org/doc/libdiscid
That includes builds for Windows and Mac OS X.

Summer of Code: We're in for another round!

I’ve not had a chance to blog about our participation in Google’s Summer of Code program this year, so it is time to fix this now. As you might guess, we’ve been accepted into the program again and were given 3 slots. We awarded the slots to:

  • Rearchitect/Improve the Release Editor by Michael Wiencek (bitmap): This proposal aims to re-work the guts of our Release Editor and to change the architecture to use one page and not a series of pages. This project is potentially massive, so the goal is to work on the guts of the editor while not making many (if any!) changes to the UI. But, bitmap is a veteran GSoC student and long time Picard contributor, so we’re excited to have him back!
  • MBS-6200: Add a “place” entity by Nicolás Tamargo (reotab): Our very own Reosarevok joins the GSoC ranks to implement the Places support. In our previous schema change release we added support for Areas and Reotab aims to finish this project by implementing Places. For more discussion and background on Areas and Places, please see this ticket in jira.
  • Repository for music reviews by Maciej Czerwiński (mjjc): The goal of this project is to create a site that allows anyone to write a non-neutral point of view review of an artist, a release or a recording. All of the reviews in this site will be licensed under a Creative Commons license to be compatible with MusicBrainz and its data.

I’m really excited by all of these projects and the people who are contributing. Summer of Code started yesterday, so we’ll see very soon what our three students will accomplish.

Search server update: June 13

On 13th June we updated the search servers once more. Thanks for fixing bugs and adding Area support, Paul!

Release Notes – MusicBrainz Search Server – Version 2013-06-13

Bug

  • [SEARCH-297] – Webservice Json output for aliases when searching is inconsistent with output when doing a lookup
  • [SEARCH-302] – search server json output use singular for a list of release-groups.

Improvement

  • [SEARCH-292] – Include area info in the indexed search artist and label results
  • [SEARCH-299] – Output TrackIds

New Feature

  • [SEARCH-301] – Search for Area by ISO 3166 code

Task

  • [SEARCH-273] – Support for multiple country/release events on release as part of schema changes
  • [SEARCH-286] – Add areas to the indexed search

Upcoming feature: contested edit extension

The next release (a week from Monday) will include a useful new feature: extending the expiration of edits that receive ‘No’ votes! I’d like to take a bit to explain how it’ll work.

The problem

Especially since the amount of time edits stay open was reduced to 7 days, but also before, several problematic situations could arise when edits were contested:

  • If voters cast ‘No’ votes shortly before the expiration of the edit, the original editor may not have time to respond to the concerns before the edit closes. As a result, it’s generally been considered bad etiquette to cast ‘No’ votes right before an edit expires unless the edit is particularly destructive.
  • In a somewhat related case, sometimes an edit can get many ‘No’ votes in quick succession. Since 3 unanimous ‘No’ votes will close an edit, the period between the first vote cast and the edit being closed can be as short as an hour, which is certainly not enough time for the original editor or other voters to respond.
  • It’s also occasionally possible for edits to be put at risk of failing without an email being sent. Specifically, the current code only sends an email on the very first ‘No’ vote. Therefore, if a voter votes ‘No’ early in the voting period and later changes their vote, a second voter later voting ‘No’ would not result in an email being sent. However, a tied vote or a majority of ‘No’ votes will result in an edit being closed, so even a lone vote can tip the balance.

The solution

In light of all of these problems, the next release will work differently to give editors time to respond to votes against their edits.

In short: editors will always have at least 72 hours (three days) to respond after the first vote against their edits.

More specifically, and more technically:

To address the third point above, the emails for ‘No’ votes will now be sent whenever the count of ‘No’ votes goes from 0 to 1. That is: if two people vote ‘No’ with neither changing their vote in-between, only one email will be sent. But, in a case like the one described above, where an early ‘No’ vote is superseded and the total count goes back to 0, a subsequent ‘No’ vote will send a new email.
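
The email rule can be sketched in a few lines. This is only an illustration of the 0-to-1 transition described above, not the server's actual code; the function name and the (voter, vote) event format are assumptions.

```python
def no_vote_emails(events):
    """Return the indexes of the votes that would trigger an email.

    Illustrative sketch only. `events` is an ordered list of
    (voter, vote) pairs, where a later vote by the same voter
    supersedes that voter's earlier vote.
    """
    current = {}   # latest vote per voter
    emails = []
    for i, (voter, vote) in enumerate(events):
        before = sum(1 for v in current.values() if v == "no")
        current[voter] = vote
        after = sum(1 for v in current.values() if v == "no")
        # An email is sent only when the 'No' count goes from 0 to 1.
        if before == 0 and after == 1:
            emails.append(i)
    return emails
```

For example, two voters both voting ‘No’ yields one email (for the first vote), while a ‘No’ that is later changed to ‘Yes’ followed by another voter's ‘No’ yields two.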

To address the second point above, ModBot will not reject an edit before its expiration time due to three unanimous ‘No’ votes unless 72 hours have passed since the earliest ‘No’ vote (that is, the vote which resulted in an email being sent). If the expiration time passes or an edit has three unanimous ‘No’ votes after 72 hours, the edit will be closed as usual.

Finally, to address the first point above, when new ‘No’ votes are cast close to an edit’s expiration time, the edit’s expiration time will be extended to allow 72 hours for response. This extension will, once again, only happen when the total count of ‘No’ votes goes from 0 to 1 – so only when an edit becomes contested and previously was not.
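
As a rough sketch of the two 72-hour rules, assuming the extension simply moves the expiration to 72 hours after the vote when that is later than the current expiration (the function and parameter names are made up for illustration and are not taken from the server code):

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(hours=72)

def extended_expiration(expires_at, vote_time, no_votes_before):
    """New expiration time after a 'No' vote cast at vote_time."""
    # Only a vote that takes the 'No' count from 0 to 1 can extend the edit.
    if no_votes_before == 0:
        return max(expires_at, vote_time + RESPONSE_WINDOW)
    return expires_at

def may_reject_early(no_votes, yes_votes, earliest_no_vote, now):
    """Whether three unanimous 'No' votes may close the edit before expiry."""
    unanimous = no_votes >= 3 and yes_votes == 0
    return unanimous and now - earliest_no_vote >= RESPONSE_WINDOW

# A first 'No' vote 12 hours before expiry pushes the expiry out to
# 72 hours after the vote.
expiry = datetime(2013, 6, 30, 12, 0)
vote = datetime(2013, 6, 30, 0, 0)
new_expiry = extended_expiration(expiry, vote, no_votes_before=0)
```

A first ‘No’ vote cast early in the voting period leaves the expiration untouched, because 72 hours already remain.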

In total, these changes should hopefully ensure that editors are better informed about edits that are in danger of being voted down, and given sufficient time to respond to voter concerns.

In summary

First of all, this change will be fully live on Monday, June 24th. Before then, votes cast on the beta server may result in a small number of edits having their expiration times extended, but it won’t happen on the main server or for the majority of edits.

While editing: Rest assured you’ll be informed and given time to fix problems with your edits!

While voting: Don’t worry too much about casting ‘No’ votes when edits need improvement. Certainly be ready to supersede your votes if things do get fixed up, but if you find an edit in need of fixing just before it closes, or which already has a bunch of recent ‘No’ votes, don’t hold back or vote differently to give the original editor time to respond. This should take care of that for you!

Happy editing!

Server Update, 2013-06-10

Another week, another release! This release is mostly bug fixes, but I’m sure a lot of people will be happy to notice that we have finally resolved MBS-357. That’s right – we no longer have a single clear text password in the MusicBrainz database. We’re sorry that it took so long, but at least it’s finally been done! Other than that, this release mostly builds on top of the new schema change features, and provides bug fixes for things that have regressed.

Many thanks to Michael Wiencek, Nicolás Tamargo and the MusicBrainz team for their work in this release. Here’s what’s changed:

Bug

  • [MBS-6007] – JSON release webservice doesn’t include works
  • [MBS-6179] – Sorting a collection by catalog number sorts ‘8BP117’, ‘8BP126’ and ‘8BP127’ in the wrong order
  • [MBS-6260] – Dates and countries are displayed inconsistently
  • [MBS-6263] – Adding an area without ISO 3166 codes shows blank values in edits
  • [MBS-6264] – Languages/Scripts statistics have an empty "Last updated field"
  • [MBS-6297] – Can’t create relationships from areas to other entities from the area page, but only from the other entities’ page
  • [MBS-6325] – Moving a release with a release date into an empty release group does not provide it with a date (sorting doesn’t work)
  • [MBS-6326] – ModBot is unable to close some edits as they still use ‘country_id’
  • [MBS-6334] – Release editor: loses join phrases and artist credits when switching to tracklist tab
  • [MBS-6355] – 502 displaying relationships for prolific artist
  • [MBS-6357] – Track/medium tables have indexes/constraints with bad names
  • [MBS-6370] – Beta: Inline search can load duplicate results
  • [MBS-6377] – Track ACs appear changed but the edit isn’t submitted
  • [MBS-6385] – Strange results when pasting an area URL in the inline search
  • [MBS-6409] – ModBot leaves translated edit notes
  • [MBS-6441] – Release (date, country) are blank when displaying a disc ID
  • [MBS-6443] – Ascending and descending sorting of releases in a collection results in the same order

Improvement

  • [MBS-357] – Don’t store passwords in clear text
  • [MBS-4962] – Slave servers should cache heavily due to hourly update process
  • [MBS-6203] – Only allow recognised URLs for relationship types which are for specific sites
  • [MBS-6233] – Ensure SecondHandSongs links are added at the right level
  • [MBS-6300] – Link to countries in country statistics
  • [MBS-6301] – Consider linking to areas {artist,release,label} list in country statistics
  • [MBS-6347] – Show parents somehow for underlying areas
  • [MBS-6358] – Add basic area stats display
  • [MBS-6362] – Improve how release data is displayed in Google searches
  • [MBS-6371] – Extend Wikipedia and Wikidata autoselect to areas

New Feature

  • [MBS-6269] – Expose track identifiers in the webservice
  • [MBS-6270] – Expose track identifiers on the website

The Git tag for this release is v-2013-06-10.

New Guidelines for Recordings

We got this from Ben, who has been in charge of taking this discussion forward:

Since the beginning of the year there has been a lot of discussion about recordings and what exactly they should be used for. After several meetings on IRC and a couple of huge topics on the style mailing list, we’re finally ready to bring in a new definition for recordings, and new style guidelines to go with it!

The new recording definition can be viewed at http://musicbrainz.org/doc/Recording

And the accompanying style guidelines are at http://musicbrainz.org/doc/Style/Recording

The new guideline brings significant changes to the way recordings should be used, so all editors dealing with recordings should take the time to read it.

As a short summary, recordings are now never produced solely through copying or mastering. This means that recordings shouldn’t distinguish between different masters of some audio – in general, a recording will correspond to a particular mix or edit. In addition:

– AcoustIDs and ISRCs have been removed from the guideline – they are mostly irrelevant for managing recordings under the new definition.
– Guidelines for audio channels have been introduced.
– Existing guidelines have been expanded.
– Several in-depth examples have been added to explain how recordings should be used.

Also, as a result of these changes, the recording-recording remaster relationship type and the artist-recording master relationship type have been deprecated.

Thanks to everyone who was involved on mb-style and in the IRC meetings for your excellent ideas and contributions!

Server update, 2013-05-28

Now that the fires around the last schema upgrade seem to be dying down to a much more manageable level, we’ve just put out the next version of MusicBrainz server. As you might already expect, this release is mostly a bug fix release. Thanks to Lukáš Lalinský, Michael Wiencek, Nicolás Tamargo and the rest of the MusicBrainz team for their work on this release. Here’s what’s changed:

Bug

  • [MBS-3059] – Using "Submit votes & edit notes" added edit note to all edits
  • [MBS-6158] – Reset password form claims the user/email address does not exist
  • [MBS-6172] – Lost password claims to work when user has no email
  • [MBS-6209] – Direct search for work doesn’t check aliases
  • [MBS-6230] – Internal server error removing a relationship in the relationship editor
  • [MBS-6271] – Internal server error trying to set track lengths
  • [MBS-6274] – Capitalisation of "release events" in sidebar and edits is inconsistent
  • [MBS-6299] – "License" header in release sidebars is displayed even if the following list is empty
  • [MBS-6302] – Country column empty in search results page
  • [MBS-6304] – Table on Attach CD TOC page broken (too many columns)
  • [MBS-6305] – Regression: Release dates are no longer optional
  • [MBS-6318] – Search hint alias for areas require a sortname when adding
  • [MBS-6319] – Internal server error editing an area alias
  • [MBS-6329] – REGRESSION. no more dates in collection pages
  • [MBS-6335] – Regression: Tracklists not shown when merging releases
  • [MBS-6341] – Area alias aren’t found by direct search unless they match an area name

Improvement

  • [MBS-3083] – Do not use placeholder text for artist credit names
  • [MBS-5603] – Edit relationships page should show disambiguation comments
  • [MBS-6198] – Deleted Editor Stats
  • [MBS-6213] – Permit ISRCs starting with TC
  • [MBS-6219] – Stop using two edit types and two places in the UI for the "change release group" functionality
  • [MBS-6222] – Update FeaturingRecordings report
  • [MBS-6223] – Display language and script in the right order on release edits
  • [MBS-6294] – Release events are weirdly indented on release pages
  • [MBS-6340] – Redirect to direct search for indexed area searches as long as the latter are not actually available
  • [MBS-6348] – Display the type of area on inline search

New Feature

  • [MBS-6018] – Add /oauth2/tokeninfo and /oauth2/userinfo handlers for 3rd party login

Task

  • [MBS-6229] – Add autoselect, sidebar links and pretty URLs for Wikidata

The Git tag for this release is v-2013-05-28.

Schema 17/18 upgrade instructions

We’ve just completed our extra schema upgrade. The full instructions for upgrade follow:

Schema 16 to schema 17 upgrade

If you already ran the migration that was announced May 15th, or if you imported a data dump from May 15th or later, skip to the next section.

  1. Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 16.
  2. Take down the web server running MusicBrainz, if you’re running a web server.
  3. Turn off cron jobs if you are automatically updating the database via cron jobs.
  4. Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
  5. Switch to the new code with git fetch origin followed by git checkout schema-16-to-17
  6. Run carton install --deployment to install any new perl modules.
  7. Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
  8. Set DB_SCHEMA_SEQUENCE to 17 in lib/DBDefs.pm
  9. Turn cron jobs back on, if needed.
  10. Restart the MusicBrainz web server, if needed.

Schema 17 to schema 18 upgrade

  1. Run replication with carton exec -Ilib -- ./admin/replication/LoadReplicationChanges until it cannot apply any packets in schema 17.
  2. Take down the web server running MusicBrainz, if you’re running a web server.
  3. Turn off cron jobs if you are automatically updating the database via cron jobs.
  4. Make sure your REPLICATION_TYPE setting is RT_SLAVE in lib/DBDefs.pm
  5. Switch to the new code with git fetch origin followed by git checkout v-2013-05-24
  6. Run carton exec -Ilib -- ./upgrade.sh from the top of the source directory.
  7. Set DB_SCHEMA_SEQUENCE to 18 in lib/DBDefs.pm
  8. Turn cron jobs back on, if needed.
  9. Restart the MusicBrainz web server, if needed. EDIT: also restart memcached here, see http://tickets.musicbrainz.org/browse/MBS-6376

Note that the tags to check out for the two migrations are different.

Changes

For the list of changes in schema 17, see the previous blog post. The changes for schema 18 are:

  1. Fix the track table corruption that the schema 16-17 upgrade created, by importing a copy of the ‘track’ table from the production database.
  2. Fix some indexes and constraints that should not be on slaves or which had bad names starting with ‘medium2013’ or ‘track2013’
  3. Create a missing index on medium.release that dramatically improves performance.
  4. Fix the ref_count column of the artist_credit table, which was not updated properly at the schema 16-17 upgrade.

Urgent schema update required

On Friday 24 May, 2013 at 15:00 UTC we’re going to make an urgent schema update to fix a problem that occurred during our schema change last week. Please read this whole blog post carefully!

This update will not make any changes to the schema, but it will fix some data issues that have appeared on slave servers.

We apologize profusely for these problems — we’re working hard to rectify this problem and we’re going to improve our processes going forward to ensure that future releases will not encounter these problems.

What went wrong

Due to a misunderstanding of our database system, the ‘track’ table will be corrupted on the majority of replicated slave databases after the schema 17 migration. Specifically, depending on the internal choices of a given PostgreSQL installation’s query planner and other system details, any particular server can end up with a variety of incompatible permutations of the track table, where ‘id’/‘gid’ pairs will generally point to the incorrect track data. Unfortunately, this problem is compounded by replication, which is based on the ‘id’ column. Therefore, replication packets since the schema change are likely to have deleted and modified the incorrect rows of the ‘track’ table on slaves.

How are we fixing it

In order to ensure that no slaves continue to replicate incompatible changes, we are incrementing the schema number again to 18, which will force operators of slave servers to intervene appropriately. To ensure that slaves have a correct version of the ‘track’ table, we are providing an upgrade script that will download an exported snapshot of the production server’s ‘track’ table at a known point and import it, as well as correct some smaller issues. By importing this snapshot, slave servers will be reset to a correct version of this table and replication can continue.

Specific step by step instructions on running this upgrade will be in a separate blog post. Watch this space!

What problems may have arisen

  1. In the unlikely case an external program directly references track row ID numbers, or if it uses the newly-added track MBID field (the ‘gid’ column), these will not be correct if they were taken from any server but the production server. If an application stores either of these identifiers in any way, that data should be rebuilt.
  2. Due to the compounding problems from replication, some tracklists will have incorrect information — missing tracks, misnumbered tracks, links to the wrong recordings, wrong durations, and/or wrong track artist credits. Information of this sort that was derived from replicated slaves during the affected period should be regenerated after upgrading.

FAQ about this update

Q: We don’t use the track table, we use recordings. Am I affected?
A: You are not affected if you use recordings directly, i.e., looking up recording information by a stored recording MBID, except if you use track information linked to those recordings (for example, if you create a list of releases a given recording appears on). Since the link between the recording and the release tables is via the ‘track’ table, anything that connects these two entities is likely to be affected.

Q: How can I tell if any of the tracklists I am using are affected?
A: Because of the random permutation issue, it’s not possible to be 100% sure. However, you can identify tracklists that definitely have problems in two ways: track counts and sequence issues. The former can be tested with a fairly simple query:

SELECT medium.id, medium.track_count, count(track.id) AS track_track_count,
     medium.track_count <> count(track.id) AS counts_differ
FROM medium JOIN track ON track.medium = medium.id
GROUP BY medium.id, medium.track_count
HAVING count(track.id) <> medium.track_count;

Any medium that appears in that query has been affected and its tracklist should not be trusted (select ‘medium.release’ to get release IDs, if that’s your jam). Sequence issues are a more complex query:

SELECT distinct m.id FROM
  (SELECT DISTINCT medium.* FROM
    ( SELECT track.medium, min(track.position) AS first_track, max(track.position)
      AS last_track, count(track.position) AS track_count, sum(track.position)
      AS track_pos_acc
      FROM track
      GROUP BY track.medium) s
    JOIN medium ON medium.id = s.medium
    WHERE first_track != 1 OR last_track != s.track_count OR
        (s.track_count * (1 + s.track_count)) / 2 <> track_pos_acc
    ) m

(note: if you only get 10 rows for this query, you’re fine — they’re these ten, which are known problems)
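
For applications that hold tracklists in memory, the same two checks can be sketched in Python. This is only an illustration of the arithmetic in the queries above; the function names are made up, and `positions` is assumed to be the list of track positions for one medium.

```python
def counts_differ(declared_track_count, positions):
    """True if medium.track_count disagrees with the number of track rows."""
    return declared_track_count != len(positions)

def has_sequence_problem(positions):
    """True if the positions cannot be the sequence 1, 2, ..., n.

    Mirrors the three conditions of the sequence query: the first
    position must be 1, the last must equal the track count n, and the
    positions must add up to n * (n + 1) / 2.
    """
    n = len(positions)
    if n == 0:
        # Media without track rows never appear in the sequence query.
        return False
    return (
        min(positions) != 1
        or max(positions) != n
        or n * (n + 1) // 2 != sum(positions)
    )
```

Like the query, this only catches tracklists that are definitely broken. A list such as [1, 1, 4, 4] passes all three conditions even though it is wrong, which is one reason a clean result is not a guarantee.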

For more safety, don’t trust anything in the track table that’s been updated since the schema change:

SELECT distinct medium
FROM track
WHERE last_updated > '2013-05-15'

If it’s possible in your application, it’s probably best to throw out any updates to tracklists since 2013-05-15.

Again, we’re sorry for the trouble this update may have caused you!