Splunk supports MusicBrainz!

As part of Google’s Summer of Code program, we accepted Dániel Bali to analyze our web server logs and mine them for interesting information about MusicBrainz and the people who use it. (See a preview of this project.)

To make that project a reality we had help from Splunk, the company that creates the fantastic data analysis tool by the same name. Splunk provided us with enterprise trial licenses during the summer and, going forward, has accepted us into their Splunk for Good program. This program provides a free 10GB/day license on a yearly basis, which allows us to import 10GB of data into our Splunk server per day.

We now count Splunk among our sponsors and we’re looking forward to rolling out Dániel’s work in October. Thank you Splunk and thank you to Joyce Morrell and Christy Wilson from Splunk for working with us to make this happen!

Summer of Code log analysis project: May we share our data with our GSoC student?

UPDATE: This is clearly going to be a major hassle, so we’ll spend the extra time coding a program that will sanitize the data before it goes into Splunk.

Last week Google’s Summer of Code program started, and my student Dániel Bali is ready to get busy combing through our massive logs to see what sorts of information he can mine from them.

We only have one minor problem: our logs contain the IP addresses of our users, and some requests contain the user name of the person making the request. Removing this private information from the logs before Dániel sees them is quite a pain to do well.

I would like to propose that we:

  1. Consider Dániel part of our core team for the summer and allow him to see IP addresses and all the requests in full.
  2. Have Dániel sign a short statement stating that he will not divulge any private information.
  3. Fail him in his GSoC project if he does divulge any private information.

If this is not acceptable to you, please speak up soon. I would like to make this happen early next week so Dániel can continue his GSoC work.

UPDATE: The final output of Dániel’s work will not contain any private information. If we end up using any private data as input, we will sanitize it and remove private information before we publish the output.
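To give a rough idea of what such a sanitizing step could look like, here is a minimal Python sketch. It is not the program we will actually run. It assumes access-log lines where IPv4 addresses appear as plain text and user names appear in URLs of the form /user/NAME; both the log layout and the URL shape are assumptions made for illustration, and the SECRET_KEY value and function names are hypothetical. Each private value is replaced with a keyed hash, so the same visitor still maps to the same token and can be counted, but the original value is not visible in the output. IPv6 addresses are not handled here.

```python
import hashlib
import hmac
import re

# Hypothetical secret used to key the hashes; it would be kept private
# and out of the logs, otherwise the tokens could be brute-forced.
SECRET_KEY = b"replace-with-a-private-random-value"

# Matches dotted-quad IPv4 addresses (no range check on each octet).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Assumed URL shape for user pages: /user/NAME followed by /, ?, or whitespace.
USER_PATH_RE = re.compile(r"(/user/)([^/\s?]+)")


def pseudonym(value: str) -> str:
    """Return a short, stable token for a private value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def sanitize_line(line: str) -> str:
    """Replace IPv4 addresses and /user/NAME path segments with tokens."""
    line = IPV4_RE.sub(lambda m: "ip-" + pseudonym(m.group(0)), line)
    line = USER_PATH_RE.sub(
        lambda m: m.group(1) + "user-" + pseudonym(m.group(2)), line
    )
    return line


if __name__ == "__main__":
    sample = (
        '203.0.113.7 - - [10/Jun/2012:12:00:00 +0000] '
        '"GET /user/alice/ratings HTTP/1.1" 200 512'
    )
    print(sanitize_line(sample))
```

Using a keyed hash rather than simply deleting the values is a design choice: it keeps per-visitor statistics possible while still hiding who the visitor is. If those statistics are not needed, replacing the values with a fixed placeholder would be simpler and safer.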

Looking for Language Liaisons

As some of you may know, this summer through Google Summer of Code I’m working on internationalization of musicbrainz-server. As outlined in my proposal, I’m currently looking to find what I call “language liaisons”: folks willing to be the go-to person about a given language for me and other developers.

Auf Deutsch!

What’s expected of liaisons:

  1. Willing to be pestered occasionally, by me or other developers, about language-specific concerns: when adding new features, and thus adding new strings, we’d like to be able to ensure nothing’s added that will need to be changed before it can be translated into a given language.
  2. Willing to file bugs for strings already in the database that are untranslatable, should you find them.
  3. Be on the musicbrainz-i18n mailing list; this will be the main venue for organization and communication about i18n issues.
  4. Ideally, to be an active translator for your language – but this isn’t a requirement, because I’d like to get the widest global coverage I can; even if a language doesn’t currently have a translation, we don’t want to unintentionally sabotage future translators with untranslatable strings!
musicbrainz-japanese
日本語

I’ll also be determining a (related) list of “target languages” for the summer, with the intention of releasing translations on musicbrainz.org in these languages at the end of the summer. I’ll consider for inclusion on this list languages that are both in active translation on Transifex and have language liaisons.

If you’re interested in being a language liaison, please contact me: ianmcorvidae (at) musicbrainz (dot) org, editor ianmcorvidae, or ianmcorvidae on IRC, and join the mailing list.

If you’re interested in i18n generally, please join the musicbrainz-i18n list. For more information on my project and musicbrainz-server i18n, see the server internationalization wiki page, my post on my personal blog, and my official proposal, or come ask about it on IRC or the mailing list!

(less useful languages)

MusicBrainz Android app now available in the Android Market

Jamie McDonald has continued his Summer of Code work and has submitted the first version of the MusicBrainz app to the Android Market! If you would like to be able to look up releases by barcode, search for artists and rate/tag data in MusicBrainz, this app is for you:

MusicBrainz App in the Android Market!

I’ve already used this application in a number of social situations where someone wanted to know some music info and I was able to look it up very quickly. It’s quite handy! Also, an iPhone version is still in the works.

Thanks very much for your continued work on this project, Jamie!

Accepted Summer of Code projects

First, many apologies to our Google Summer of Code students for not posting about which projects we had accepted for this summer. The NGS release took an amazing amount of time and I’m finally getting on top of the backlog of things to catch up on.

Now on to introduce the projects for this year:

Eliza Gebow (Batsy) has been accepted to hack on Embeddable Widgets that will allow anyone to embed a MusicBrainz widget into their site/blog/whatever. The widget will dynamically display MusicBrainz information about artists, releases, recordings and, if there is enough time, possibly works. You can follow Eliza’s progress on her blog.

Ian McEwen (ianmcorvidae) has already been madly hacking on improving our timeline and statistics pages. The goal of the project is to provide MusicBrainz users with a comprehensive tool for examining the growth of the MusicBrainz data and to understand how changes in MusicBrainz features and policy affect our database. Follow Ian’s progress on his blog.

Last, but not least, Michael Wiencek (bitmap) is spending his time this summer hacking on Picard. First, Michael is focusing on making Picard ready for the NGS site; as we were developing NGS, we didn’t have the resources to do this ourselves. Michael is fixing this and adding some new features to Picard as well. Michael has overcome his dislike for blogs, and you can follow his progress on his blog.

Most amazingly, both Ian and Michael have already shipped working code as part of their GSoC work. Ian’s bare bones timeline is now live on the NGS site and Michael has already released a new beta version of Picard. Amazing stuff — please keep up the good work!