Scheduling opened to all unscheduled tickets

Today in IRC we finished deciding how to schedule a few tickets that were in the Scheduling Game. After this, we decided that it was time to open the game up to all issues that we weren’t sure when to schedule. This means that scheduling.ocharles.org.uk now has approximately 700 issues that we need help scheduling!

For more information on the scheduling game, see the introductory blog post.

It’s a lot to get through, but over time, the critical issues should gradually rise to the top, and our core developers can help solve these as quickly as possible. Happy voting, and thanks for all your help so far!

MetaBrainz Foundation Annual Report for 2010

Introduction

2010 was a big development year for us, with an amazing amount of tangible progress on the Next Generation Schema (NGS). NGS proved to be our number one task in 2010, with no server releases happening at all because of our focus on it.

Early in the year we increased our engineering capacity: Kuno Woudt joined us as a full-time developer in February, and Oliver Charles moved from a part-time position into a full-time position in June 2010. Their sole focus for the entire year was to finish NGS, and we got within a few months of finishing it.

We transitioned from Subversion to Git as our version control system in the process of writing NGS. From Nov 2009 to Feb 2010 the MusicBrainz Server codebase went from ~224,000 lines of code down to ~72,000 (the lowest since 2003!). By the end of 2010 we reached ~130,000 lines of code. NGS does a lot more, but with a lot less code. NGS was a much needed cleanup and overhaul of our aging codebase.

Financially we started 2010 off pretty weak, but got an early boost with a $50,000 donation from Richard Jones (one of the founders of last.fm) and later a $40,000 donation from Google. These generous donations allowed us to continue focusing on NGS — thank you Google and Richard Jones!

MetaBrainz took on The Guardian, musiXmatch, and ZeeZee as new data customers in 2010.

Google Summer of Code 2010 resulted in an exciting new addition to our product line – the MusicBrainz Android App. Developed by Jamie McDonald, the Android app allows anyone to carry the MusicBrainz Database around in their pocket wherever they go. It is a very handy app to settle music debates at parties!

Jess Hemerly conducted a study and wrote a paper on MusicBrainz as part of her Master’s program at the UC Berkeley School of Information. Among many other topics, she answered questions such as ‘Why do people contribute?’, ‘What characterizes editors’ participation?’, and ‘What is the role of metadata in music technology?’.

Profit & Loss

In 2010 the foundation took in $177,740.94 and spent $172,904.94 for a total excess income of $4,836.00.

Income

Direct Donations $90,278.89
PayPal Donations $6,867.99
Consulting $2,669.75
Live Data Feed Licenses $52,682.76
CC Data Licenses $5,100.00
Amazon Associates $1,299.94
Tagger Affiliates $18,147.62
CD Baby Affiliate $12.00
Bank Credits $0.10
Bank Interest $681.89
Total Income: $177,740.94

Expenses

Bank Fees $995.30
PayPal Fees $1,263.63
Rent $2,856.00
Hardware $5,455.15
Travel $4,806.26
Internet $184.56
Development $109,991.19
Gifts $458.63
Events $270.98
Hosting $16,900.00
Filing Fees $60.00
Software $99.00
Entertainment $331.53
Books $20.99
Insurance $2,025.00
Accounting $1,200.00
Shipping $87.51
Payroll Taxes $25,639.39
Advertising $259.82
Total Expenses: $172,904.94

The Profit & Loss shows:

  • In 2010 the foundation spent $22,355.15 on hosting and hardware costs and served out 3.2 billion web hits and 1.5 billion web service hits. Calculating a cost per hit, we find that we spent $6.93 per one million web hits and $14.50 per one million web service hits. Compared to our 2009 figures of $8.66 per one million web hits and $14.37 per one million web service hits, these numbers didn’t change much.
  • End-user donations via PayPal came to $6,867.99 which is roughly 10% less than last year. End-user donations came to less than 4% of our overall income due to the much larger role of sponsors such as Google and Richard Jones.
  • Development costs in the form of salaries paid to Robert Kaye, Oliver Charles, and Kuno Woudt came to $109,991.19. It is amazing what we were able to accomplish with such a limited budget for paid engineers.
  • In 2010 we earned $52,682.76 from live data feed licenses and $5,100.00 from Creative Commons licensed data for a total of $57,782.76. This is up 28.8% from the total of $44,878.50 in 2009.
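The arithmetic behind these bullet points can be checked with a short script. The sketch below uses only figures quoted in this report; the variable and function names are illustrative, and the hit counts are the rounded values from the text (3.2 billion and 1.5 billion), so the per-million costs it produces only approximate the published $6.93 and $14.50, which were presumably computed from exact counts.

```python
from decimal import Decimal

# Figures copied from the Profit & Loss section of this report.
hosting = Decimal("16900.00")
hardware = Decimal("5455.15")
live_data_feed = Decimal("52682.76")
cc_data = Decimal("5100.00")
licenses_2009 = Decimal("44878.50")
paypal_donations = Decimal("6867.99")
total_income = Decimal("177740.94")
total_expenses = Decimal("172904.94")

# Rounded hit counts as quoted in the text (the exact counts are not given).
web_hits = 3_200_000_000
web_service_hits = 1_500_000_000


def cost_per_million(cost, hits):
    """Return the cost attributed to each one million hits."""
    return cost / (Decimal(hits) / Decimal(1_000_000))


infrastructure = hosting + hardware                  # hosting and hardware
licenses_2010 = live_data_feed + cc_data             # all data licenses
license_growth_pct = (licenses_2010 / licenses_2009 - 1) * 100
paypal_share_pct = paypal_donations / total_income * 100
excess_income = total_income - total_expenses

print("Hosting + hardware:", infrastructure)
print("Cost per million web hits:", round(cost_per_million(infrastructure, web_hits), 2))
print("Cost per million web service hits:",
      round(cost_per_million(infrastructure, web_service_hits), 2))
print("License income 2010:", licenses_2010)
print("License growth vs 2009 (%):", round(license_growth_pct, 1))
print("PayPal share of income (%):", round(paypal_share_pct, 1))
print("Excess income:", excess_income)
```

`Decimal` is used rather than floats so that the dollar totals match the report to the cent.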

Balance Sheet

The balance sheet for the end of 2010 showed the MetaBrainz Foundation with $77,011.94 retained earnings, a net income of $4,836.00, and cash assets totalling $81,847.94.

Traffic

The following chart shows our overall web traffic to musicbrainz.org for 2006 – 2010:

[Chart: MusicBrainz traffic, 2006-2010]

The blue line represents the overall number of hits to musicbrainz.org and the red line shows how many of the overall hits were web service (API) hits. As in previous years, our web service hits represent about 85% – 90% of our overall traffic. Please note that in September of 2009 we switched to a more accurate method for keeping track of our overall web services hits. Prior to this, the graph shows the sum of the artist/release/track counts, rather than the total web service traffic.

Our traffic grew considerably in the first half of 2010, but then leveled off for the second half of 2010. We don’t know what accounted for this leveling off, but we suspect that the lack of new features for the MusicBrainz server generally decreased interest in the project.

Top contributors

Top Editors
1. drsaunde 73982
2. brianfreud 72121
3. gswanjord 58933
4. murdos 54517
5. HumHumXX 43985
6. dimpole 39527
7. salo.rock 38644
8. nikki 37770
9. reosarevok 37702
10. refresh_daemon 34806
11. jesus2099 33515
12. Senax 30956
13. MeinDummy 25521
14. mr_maxis 24743
15. Billy Yank 20248
16. dinog 20215
17. crazee_canuck 18537
18. ojnkpjg 18469
19. Bitmap 17612
20. kepstin 16550
21. NAvAP 16095
22. Jeroen 16064
23. rswarbrick 15972
24. zos18 15204
25. fred576 14897

Top Voters
1. chabreyflint 49247
2. salo.rock 48651
3. murdos 33786
4. Locustus 29267
5. SuicideScrub 24312
6. bogdanb 22473
7. nikki 21085
8. brianfreud 20606
9. gswanjord 19276
10. Bitmap 17956
11. MClemo 17362
12. drsaunde 15082
13. KRSCuan 13922
14. dinog 13844
15. reosarevok 9650
16. MeinDummy 9262
17. HumHumXX 8534
18. mr_maxis 7617
19. PhantomOTO 7194
20. articpenguin 7092
21. ojnkpjg 6977
22. Plagueis 6877
23. alphaseven 6762
24. fatih 6611
25. alllysssa 5819

Server farm

At the end of 2010, MusicBrainz had 14 machines in service. From the top, going down:

  • moose: Our database server
  • scooby: Our aging catch all server: blog, forums, mailing lists, etc
  • catbus: Raw database server (raw tags, collections, etc)
  • bender: Former TRM server, now idle cold spare machine
  • blik: memcached
  • stimpy, dexter: web service servers
  • cartman: Search server, index builder
  • wiley: New catch all server: SVN, git, jira, wiki, trac, mail, backups
  • lenny/carl: Redundant network gateways
  • tails: Front end web server
  • asterisk: Search server
  • jem: Search server

MusicBrainz uses 6 Mbit/s of bandwidth and draws 21 amps of current, for a power consumption of about 2,310 watts. MusicBrainz physically occupies 20U of rack space (half of a rack) at Digital West in San Luis Obispo, CA.

Words of Appreciation

2010 was a challenging year for us, starting off with rocky finances, but support from Richard Jones and Google put us back on track. There were many people who thought that we could not ship NGS or that MusicBrainz would languish while we worked to complete NGS. Given that we had no server releases at all in 2010, we are pleased that the project remained relevant and that our community believed in us to finish NGS.

MusicBrainz would like to thank its community of stellar editors (see above), its core developers (Lukáš Lalinský, Oliver Charles, Kuno Woudt, Aurélien Mino), our hero of system administration, Dave Evans, and our goddess of bug tracking, bug fixing, editing and all things Unicode, Nikki. We thank Jamie McDonald for the awesome Android app he wrote and we’d also like to thank Pavan Chander for all of his contributions.

We’d like to thank Richard Jones, Google and every single donor who donated money to MetaBrainz in 2010. We’d also like to thank our board of directors (Cory Doctorow, Brian Zisk, Matt Wood, Rachel Segal/Carol Smith), our pro bono legal advisors Daniel Appelman and Ed Cavazos, our awesome hosting company Digital West and all of our customers. Finally, we would also like to thank the music teams at the BBC for their continued support and for motivating us to bring NGS to a close.

User agent based throttling is now live

Yesterday we talked about rolling out our throttling based on User-Agent strings. A few minutes ago we pushed this feature live on our servers so now the updated rules are in effect. python-musicbrainz/0.7.3 users are now allowed 500 requests every 10 seconds and every single one of these requests is constantly being used. No surprise here. 🙂

For the exact details on what is throttled and how to get around your application being throttled, see our rate limiting documentation.
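To illustrate the idea, a limit like the one described above (500 requests every 10 seconds, shared by everyone sending the same User-Agent string) could be sketched as a fixed-window counter. This is not the code running on our servers: the class name, the fixed-window strategy, and the choice to let unlisted user agents through are all assumptions made for the example.

```python
import time


class UserAgentThrottle:
    """Hypothetical fixed-window request counter keyed on the User-Agent string."""

    def __init__(self, limits, window_seconds=10, clock=time.monotonic):
        self.limits = limits                  # e.g. {"python-musicbrainz/0.7.3": 500}
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable so the sketch can be tested
        self.windows = {}                     # user agent -> (window start, request count)

    def allow(self, user_agent):
        """Return True if this request fits within the shared limit for its agent."""
        limit = self.limits.get(user_agent)
        if limit is None:
            return True                       # assumption: no rule means no throttling
        now = self.clock()
        start, count = self.windows.get(user_agent, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0             # the old window expired, start a new one
        if count >= limit:
            self.windows[user_agent] = (start, count)
            return False                      # shared budget for this window is used up
        self.windows[user_agent] = (start, count + 1)
        return True


throttle = UserAgentThrottle({"python-musicbrainz/0.7.3": 500})
if throttle.allow("python-musicbrainz/0.7.3"):
    print("request served")
else:
    print("request throttled")
```

Because the counter is keyed on the User-Agent string alone, every client that sends the same string draws from the same budget, which is why a heavily used default string fills its allowance so quickly.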

Current web service rate limiting documentation

We’ve just added a page that documents what we’re currently blocking on our Web Service. We hope to lift the block on python-musicbrainz/0.7.3 tomorrow and instead throttle the number of requests it can make in a given period of time.

I’ll post another entry once we’re done with making those changes.

Schema change releases for 2012

One of the issues raised at the last summit was that our customers could use more time to prepare for schema change releases, since they require engineering effort on their part. In an effort to meet our customers’ needs, we’re moving to a set schedule for schema change releases. Going forward, we’re going to have two schema change releases per year: on or about 15 May and on or about 15 October.

We’ve picked these two dates because they are the least affected by holidays and by people taking time off. Most companies have normal working schedules around these dates, which should allow them to dedicate the required resources to handling our schema changes.

However, we have one significant change that we need to push out on a more timely basis than May of next year. For that reason we’re going to plan a one-time exception to our new schedule for 12 January 2012. You can see the tickets we’ve scheduled for release in January in this schema change milestone.

We’ve already created release versions in Jira for all of the schema change releases in 2012. As we go through the year we’re going to add tickets to that milestone. Of course, we’re going to make lots of noise as these schema change release dates approach. We’re going to post a list of tickets that will be included no later than a month before the release.

Any questions? Ask them in the comments!

Search server update 2011-12-08

We’ve just updated our search server with the latest changes and bug fixes!

This update to the search server finally solves the search for artist “!!!” problem properly: you should now be able to find just about any artist, release, etc. whose name contains any crazy character combination. You can also now search release groups by number of releases and search for artists with unknown gender. The code base has been updated to Lucene 3.4, which was the latest version available when these changes were made (it’s at 3.5 now).

Thanks for your hard work on this, Paul!

Bug

  • [SEARCH-33] – Search needs to find a preposterously bad label name: !"@.*!%
  • [SEARCH-51] – Searching for certain characters returns no results, even if they’re valid.
  • [SEARCH-131] – When search for Unknown country it returns <country>UNKNOWN</country> it should return nothing
  • [SEARCH-134] – Annotation search can’t filter by type for release groups

Improvement

  • [SEARCH-93] – Update Search Index Code to Lucene 3.4
  • [SEARCH-119] – Allow artist search for gender:unknown
  • [SEARCH-124] – Allow searching release groups by # releases
  • [SEARCH-128] – Make Search Server use mmap by default

Server update 2011-12-05

Another update has just gone out, a few days later than planned. This is mostly a bug fix and minor improvement release. Sadly, when deploying the update we broke the release editor for anyone who was editing during the server switches – something we didn’t anticipate. Sorry! Here’s what’s changed:

Bug

  • [MBS-2371] – Can’t type in the basic tracklist editor
  • [MBS-2785] – Release Editor messes up release artist multi-artist credits
  • [MBS-3152] – Tracklist duplication with sub-second track duration differences
  • [MBS-3428] – If seed new release with track artist ids it doesnt properly resolve the artist ids
  • [MBS-3471] – Wrong display of RG in the Edit Note tab of the RE
  • [MBS-3498] – Editing tracklists on any release which has been through an artist split fails.
  • [MBS-3601] – Incorrect "Artist as credited" displayed and impossible to edit track artist in release editor
  • [MBS-3804] – Entering "Edit barcodes" edit should not be possible if nothing is being changed
  • [MBS-3808] – Internal server error searching for edits
  • [MBS-3813] – Release editor gets stuck on tracklist tab saying there are errors when there are none
  • [MBS-3814] – Guess case broken in the RE
  • [MBS-3816] – Track parser displays "This is placeholder text" instead of instructions (or nothing)
  • [MBS-3831] – "There were one or more errors, please check the following fields: " after changing the capitalization of one letter
  • [MBS-3836] – Tracklist tab showing only the first track when importing a CD stub
  • [MBS-3849] – RE: Can’t import FreeDB releases through Add Disc
  • [MBS-3858] – Tracklist duplication with equal sub-second track durations
  • [MBS-3864] – importing a cd stub only gives you the first track in the track editor
  • [MBS-3908] – Edit numbers in edit notes should not be parsed when they are on different lines

Improvement

  • [MBS-89] – Subscribed artist e-mails should be optional
  • [MBS-2924] – timeline.js should use MB.html
  • [MBS-3239] – Subscription emails > Option to receive weekly
  • [MBS-3356] – Seeded add release (copied from existing release) doesnt link tracks to recordings of copied release
  • [MBS-3684] – Improve sorting of the "Releases with superfluous data tracks" report
  • [MBS-3743] – Move the trackparser (basic view) on the tracklist tab to the "Add Disc" dialog.
  • [MBS-3892] – Display artist disambiguation comments on mouse over

New Feature

  • [MBS-3899] – Report: Discogs pages attached to multiple release groups

Task

  • [MBS-3911] – Remove all "DeferredUpdate" code

Sub-task

  • [MBS-3733] – Add create/get session logging to release editor, to debug MBS-3379 / MBS-3590.