First, where is “here”?
The current MB-area landscape looks pretty bleak. The data is incomplete, and adding new data is a hassle.
To add an area, you need to:
- Create an account on tickets.musicbrainz.org.
- Make a ticket to request that the new area is added.
- Wait for an area editor to do the rest, and judging by the backlog that might happen sometime between “in a long time” and “never”.
Where did area_bot go? Why are there so few area editors? Why isn’t somebody trying to improve the situation? In short, how did we wind up here? To understand that, we need to look at where we’ve been.
Where did we start out?
By design, areas were meant to be added by area_bot, pulling data from Wikidata. The workflow would look something like this:
- If area_bot made a mistake, there would be a handful of editors who could correct it by editing areas manually.
- If the bot missed an area in Wikidata, you could either:
  - (if it didn’t already have a valid “type”) improve the Wikidata entry, or
  - (if it did have a valid “type”) ask nikki to tweak area_bot, so that it would recognize more types.
And that worked. Sort of. For a while.
How did we get so far off course?
At some point, things started to go wrong. While I didn’t see it firsthand, what I’ve been told is this: rather than ask nikki to add more area types to area_bot’s white-list, some editors started adding incorrect area types on Wikidata, types which area_bot already recognized. So, the area would be added to MusicBrainz, but at the expense of Wikidata.
At this point, communication broke down. Area_bot was taken offline (to discourage low-quality Wikidata edits), but very little was done to explain the situation to users. This lack of communication became a larger problem than areas themselves, because it kept us from fixing the problem.
So what’s the plan?
Broadly, the first steps are:
- Improve overall communication within the project, as is being discussed in Rob’s recent blog posts.
- Make a long-term plan for areas and how they should be edited.
- Possibly open up area editing to more people, based on what’s decided in step #2.
My next post, Area editing, part II, will go into more detail about step #2.
Reading this post makes me think that Wikidata shouldn’t be the primary source of geographic data at all. Rather, a geodata service should. I was looking at OpenStreetMap.org and saw the whole database is downloadable (44 GB!). Why not use that as a starting point for municipalities, then cross-reference it with Wikidata afterwards?
I was thinking it over, and the way to do area requests should be overhauled as well. Why not create a submission process for areas? It could work something like CD stubs. Editors will create a new area entity, populate it with references, then submit it for official consideration.
OpenStreetMap would make a good choice for this type of thing.
It does contain the country, state, city, and suburb boundaries that would make up an area.
You could change the bot to check both Wikidata and OpenStreetMap and only add an area if the two agree, as a quick check for validity.
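As a rough illustration of the "only add it if the two agree" idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the record layout (`name`, `type`, `parent`), the function names, and the choice to compare accent-stripped, case-folded strings. It does not call Wikidata or OpenStreetMap; it only shows the comparison step.

```python
import unicodedata

def normalize(text):
    """Strip accents and case so trivial spelling differences do not block a match."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold().strip()

def sources_agree(wikidata, osm):
    """Return True when both records agree on name, type and parent area.

    The three field names are hypothetical, chosen for this sketch only.
    """
    return all(
        normalize(wikidata[key]) == normalize(osm[key])
        for key in ("name", "type", "parent")
    )

def should_add(wikidata, osm):
    """Add an area only if both sources know it and they agree."""
    if wikidata is None or osm is None:
        return False
    return sources_agree(wikidata, osm)
```

A real check would probably also want to compare coordinates or identifiers, since two different places can share a name, type and parent.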
Looking around some more, GeoNames.org seems like a better source for municipality names and related database entries.
Any comment on historical areas? I know there are some already (East Germany and the Soviet Union, for example).
Some discussion here: http://forums.musicbrainz.org/viewtopic.php?id=5487
@bflaminio I’d check how GeoNames, or whatever geo-info source is decided upon, handles historical places.
Looking around a bit more, I think GeoNames would be a good primary source to build the Area entities. I think that MusicBrainz’s primary responsibility is maintaining a music database. Areas are the only entity whose population and maintenance should be (mostly) hands-off for the average user: they should be almost entirely automated and sourced from an expert geographic information database. Let that DB handle creating, editing, moving, merging & administering the information; MB should just use it like any other client.
I think the Area process should work something like this:
1. Relate all existing Areas to GeoNames via a Geographic DB relationship.
2. Import all unrelated GeoName areas to Areas and relate the new entities with a Geographic DB relationship.
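To make the two steps concrete, here is a small Python sketch of how the split between "relate" and "import" could be planned. The field names (`mbid`, `geonameid`, `name`, `country`) and the matching rule (same case-folded name within the same country) are assumptions for illustration only, and the rule is deliberately naive: two areas with the same name in one country would collide.

```python
def plan_import(mb_areas, geonames_records):
    """Split GeoNames records into links to existing areas and new imports.

    mb_areas: list of dicts with 'mbid', 'name', 'country' (hypothetical layout).
    geonames_records: list of dicts with 'geonameid', 'name', 'country'.
    Returns (links, imports): links is a list of (mbid, geonameid) pairs for
    step 1, imports is the list of records that would become new areas in step 2.
    """
    # Index existing areas by a naive (name, country) key.
    index = {
        (area["name"].casefold(), area["country"]): area
        for area in mb_areas
    }

    links, imports = [], []
    for record in geonames_records:
        key = (record["name"].casefold(), record["country"])
        area = index.get(key)
        if area is not None:
            links.append((area["mbid"], record["geonameid"]))
        else:
            imports.append(record)
    return links, imports


# Usage with made-up identifiers:
existing = [{"mbid": "area-1", "name": "Paris", "country": "FR"}]
incoming = [
    {"geonameid": 101, "name": "Paris", "country": "FR"},
    {"geonameid": 102, "name": "Paris", "country": "US"},
]
links, imports = plan_import(existing, incoming)
```

Here the French record is linked to the existing area and the US record is queued for import.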
For users manually adding areas, how about this:
General users can’t actually make areas, they must import them from a GeoDB service.
So let’s say I want to add “Taktser”, the birthplace of the 14th Dalai Lama. I go to GeoNames.org and see it doesn’t exist. From there I create it from the Wikidata/Wikipedia reference. I can then place the URL into the import source for an area in MB, import it, then use it in the birthplace field for the 14th Dalai Lama.
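For the "place the URL into the import source" step, the first thing an importer would need is the record's identifier. Below is a minimal Python sketch, assuming GeoNames page URLs carry the numeric ID as the first path segment (for example `https://www.geonames.org/123456/example.html`, where the ID is made up). The function name is hypothetical and nothing here talks to GeoNames or MusicBrainz.

```python
import re
from urllib.parse import urlparse

def geonames_id(url):
    """Return the numeric ID from a GeoNames page URL, or None if it does not fit.

    Assumes URLs of the form https://www.geonames.org/<id>/<slug>.html.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Accept geonames.org and its subdomains only.
    if host != "geonames.org" and not host.endswith(".geonames.org"):
        return None
    match = re.match(r"^/(\d+)(?:/|$)", parsed.path)
    return int(match.group(1)) if match else None
```

Rejecting anything that is not a GeoNames URL up front keeps the import box from accepting arbitrary links.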
@bflaminio
After more poking around GeoNames, they do support historic locations. So if we automate an importing process from them, then there will be plenty of historic entities.
I’ve made several new tickets related to this.
http://tickets.musicbrainz.org/browse/STYLE-551
http://tickets.musicbrainz.org/browse/STYLE-552
http://tickets.musicbrainz.org/browse/STYLE-553
Good job CyberSkull, I definitely also think that this is something where MB should stand on the shoulders of giants, e.g. giant existing and maintained geolocation databases 🙂
Oh, yes, and places should remain curated by users. But I think the import process should be available to areas and places (just remember to import the correct type!).
TL;DR
Areas should be (mostly) automatically imported & updated.
Places should be (mostly) manually curated by users.
It is perhaps time to make progress on this; see https://musicbrainz.org/edit/43111786