In part 1 I laid the groundwork for how we arrived at our current situation, with me being overloaded. In part 2 I would like to start looking at long-term solutions to the issues from part 1; part 3 will look at more short-term solutions.
I know there are a number of people who do not like the fact that MusicBrainz is a business. But we have to realize that MusicBrainz and its legal parent, the MetaBrainz Foundation, are a non-profit business, and that it would not really be possible to grow such a project without it being a business. (Before we started the non-profit, everything was owned by me, which IMHO is a much less desirable arrangement.) We are still an open source project. We’re also an open data project.
Something is working: We have users, editors, peer reviewers and loads of traffic. We even have money flowing into the non-profit. The community is constantly asking for more features to expand the database and to improve the service.
However, something is not working: We’re dreadfully short on server developers. The number of people who have made serious contributions to the mb_server project can be counted on one hand (or pretty close to that). To work on mb_server you need a Linux box with free disk space and a fair amount of RAM, and you need the skills to deal with the long INSTALL process. Some people simply do not own a machine that is capable of doing this!
And even those people who can deal with the technical requirements are not necessarily up for the social issues that come with the job. The community can be very demanding and quite harsh at times; no one wants to work long hours for free only to have their work insulted. Compared to typical open source projects, mb_server is quite challenging to work on. No doubt about it, it’s a tough and harsh job that people have been doing for free.
Going forward I will be working very hard to raise funds in order to hire a full-time mb_server developer. We’ve passed the point where we can handle all of the mb_server development tasks with volunteer labor. Programming a project towards self-sustainability is quite different from programming to scratch an itch. We still have many itches that I hope volunteers will take on, but managing and coding towards sustainability will soon need to be handled by a full-time paid developer.
How do we accomplish such a thing? Ever since the last summit, where we hammered out the Next Generation Schema, I’ve been talking to potential sponsors about donating money towards the development of this next version of MusicBrainz. Finally, we have a concrete task to focus on, rather than just trying to make the basics work. Various companies have been receptive to this idea, and I will continue to look for sponsors for this task.
My goal is to raise $100k – $200k in hopes of being able to secure a salary for an engineer for one year. The hope is to sell enough data licenses in the course of that year to keep the developer on and pay for this person’s salary. Once this engineer is on staff, I would expect to see 2-4 releases over the course of that year to prepare for and roll out the Next Generation Schema.
I expect that my time will continue to be taken by running the business development aspects of MusicBrainz and interacting with the community and partners. However, please don’t take this as me no longer participating in the development process. I will still be involved, certainly to manage the process and also to scratch my own itches. I just don’t think it’s viable for me to be the official maintainer of any major pieces of code.
Please keep in mind that this is long term planning. I have no schedule for when this will happen. Please don’t mail me in two weeks asking to be hired or to have Next Generation Schema working.
13 thoughts on “Addressing MusicBrainz' growing problems: part 2”
Robert did not fire me; he took away my privileges as a developer while inviting me to stay involved in the project.
Since this literally said I wasn’t up to the task of handling the community (from my POV, I was unable to deal with the bickering of a handful of people while still doing valuable work), I could not accept Rob declaring his mistrust in me, and I publicly handed in my resignation.
Therefore, taking away my privileges was in fact firing me, which I assume was not entirely intended. Mistakes have been made by all parties involved, but this is not necessarily a sign of social inadequacy. Rather, it shows that the balance between reward and “hard work” was no longer what it had been before. That may help explain why some of the parties involved weren’t able to keep the lid on the pot as they used to.
I wish you all the best with taking the next steps.
Alright, time for an unpopular opinion:
MB isn’t ready for a Next Generation Schema. mb_server is difficult to maintain and still contains lots of more or less severe bugs (just have a look at the open bug reports), despite Stefan’s very positive impact on the code base. Major schema changes are the last thing MB needs in its current state, IMHO. What we need is a large-scale cleanup, such as separating the business logic from the display. At the moment, too much code is hidden in the Mason pages, where it is almost impossible to test properly in an automated fashion.
Also, don’t forget that we are currently in the process of migrating to the new web service. Developers are starting to work with it (it’s still buggy, but usable), and two implementations, pymb2 and libmb3, have been created. I, personally, am not really keen on working on yet another web service and changing existing libraries once more. I’m not sure anybody knows how much work went into this, or what NGS would mean for it.
Not to mention the social problems we’re having with the editing process (too few voters, etc.). With AR (Advanced Relationships), the complexity of MB has increased significantly for new users. With a new schema and new user interfaces, the learning curve will get steeper and things will get worse.
My suggestion is to spend at least a year concentrating on stabilizing MB in both the social and technological areas. Fix bugs on every level, lower the preconditions for getting into mb_server development, things like that. *Then* think about new features.
I think a lot of your points also show that mb_server was a project without a clear owner. There are a lot of bugs to fix and many other tasks to clean up.
I believe that the road to NGS must also be a clean-up process. I envision one house-cleaning release before actually starting on NGS work itself, and each NGS release will have to clean up more cruft as we go. Overall, I don’t think we can afford to take a year to clean things up before moving on to new features.
As for the web service — we’ve built a good basis that we can use to roll out NGS features. I don’t think that we need to start over with the web services when NGS comes along. But yes, adjusting the web services for NGS will be far from trivial.
How would you suggest that we lower the preconditions for mb_server development?
The question in my mind is this: is this a one-person task? Everything I’ve read here indicates that it’s really not. [Please understand that my engineering background is strictly aerospace, from the manufacturing side of things, and my idea of code in college was FORTRAN 77. ;)] If you hire a single developer, what will their role be: sole developer, or primary developer directing an open source effort with patches submitted by the community?
I’m quite confident that MuB/MeB will keep everything open, but in my mind, hiring a single developer sends a message that most open-source projects don’t send. [At least in my experience.] If this is done—and I think that it could be done and done well!—the job responsibilities need to be clear to the developer and the community. 🙂
This is clearly not a one-person job; you’re right about that. My hope is that the mb_server developer would act as the liaison between mb_server and the community. I foresee this person spending half their time coding and the other half interacting with and coordinating the efforts of others.
I hope that this person would then also rally other developers to help with coding NGS. We’re not moving away from our open source roots, where we rely on volunteer efforts. In the future we will still need lots of volunteer effort, but most of all, we will need one person who gets paid to keep everyone on the same page. One person who gets paid to take the abuse of the community. 🙂
Rob, I think you underestimate the amount of work needed for a cleanup release. Cleaning things up is bad enough, but on top of that there has to be a huge amount of testing. The project has to take that time, no matter what new features people want. Stop promising things; you’re putting pressure on yourself, and you’ll give yourself a heart attack.
Anyway, suggestions on lowering the mb_server preconditions: Make importing the DB use fewer Perl modules; most aren’t needed, but the code still depends on them. Make it easier to download the correct branch (trivial). Provide a partial database snapshot of a few megabytes so that people can play with it (not quite trivial).
Get rid of that /home/httpd/musicbrainz/mb_server/cgi-bin/perl hack; I doubt it’s needed anymore. Write a wrapper around Text::Unaccent so that patching the module is no longer necessary. Make things work (in a limited way) without Language::Guess, XML::RSS and other secondary stuff.
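Making secondary modules optional is the usual optional-dependency pattern. You try to load the module once and switch off only the feature that depends on it when the module is missing. In Perl this is typically written as `eval { require XML::RSS; 1 }`. The sketch below shows the same idea in Python purely to illustrate the pattern. `hypothetical_rss_module`, its `render` method, and `render_feed` are made-up names, not part of mb_server.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None


# Hypothetical optional dependency, standing in for something like XML::RSS.
rss_backend = optional_import("hypothetical_rss_module")


def render_feed(items):
    """Render an RSS feed if the optional backend is available.

    Returns None when the backend is missing, so callers can hide the
    feature instead of the whole site failing to start.
    """
    if rss_backend is None:
        return None
    # Assumed API of the hypothetical backend; adjust to the real module.
    return rss_backend.render(items)
```

With this structure, a developer without the optional module still gets a working site. Only the feed is missing.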
In general, update the INSTALL document, and check whether all those modules are still needed or whether additional ones are required. Let people without mb_server and Postgres knowledge test the installation process.
I’m sure there’s more that can be done.
As for the cleanup release — I think you and I have different views on what should be done in a cleanup release. I think moving to NGS means restructuring a lot of code and doing a lot of cleanup, which requires a lot of testing. Personally I don’t see the point of cleaning up things without actually getting new features.
Your other suggestions are spot on — I’ll see if I can tackle a few of them in the next release.
As for the “Get rid of that /home/httpd/musicbrainz/mb_server/cgi-bin/perl hack”: I agree there has to be a better way. But the main point is that the perl interpreter is not always in the same place, and users have pointed this out time and time again. By pinning the perl interpreter to a location inside the mb_server code, we solve this issue.
How would you solve this?
The perl location hack isn’t necessary for the vast majority of potential developers, who use mainstream Linux distributions. If someone has their perl interpreter in a different location, let *them* rewrite the scripts.
Regarding the definition of “cleanup”: our opinions couldn’t be more different on this point. Anyway, I think mb-devel would be a better place to discuss this.
I’m very likely missing something, but if the perl hack in question is putting a symlink to perl there, wouldn’t “#!/usr/bin/env perl” do the trick? It’s not necessarily slower either: it’ll be cached, and very nearly as fast as a specific path.
Passing the -w flag to perl doesn’t work with env; you end up needing a nasty hack to make it work. see
Interesting. I figured I was missing something, since it was too obvious to have been overlooked.
Turns out, FreeBSD’s /bin/env always did handle these arguments, but since 6.0 it has been changed not to do so by default. In the interests of portability, this now has to be enabled with a switch to env itself 🙂
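The -w problem comes from how the kernel splits the `#!` line. On Linux, everything after the interpreter path is passed to the interpreter as a single argument. With `#!/usr/bin/env perl -w`, env therefore receives the one argument `perl -w` and looks for a program with that literal name. The Python sketch below models that Linux rule. Other systems differ, for example the FreeBSD behavior described above. `split_shebang` and `exec_argv` are illustrative helpers, not real APIs.

```python
def split_shebang(first_line):
    """Split a '#!' line following the Linux kernel's rule.

    The first whitespace-delimited word is the interpreter; everything after
    it (trimmed) is passed on as ONE optional argument, never split further.
    """
    if not first_line.startswith("#!"):
        raise ValueError("not a shebang line")
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("no interpreter given")
    interpreter = parts[0]
    optional_arg = parts[1] if len(parts) > 1 else None
    return interpreter, optional_arg


def exec_argv(first_line, script_path):
    """Build the argv the interpreter receives when the script is run."""
    interpreter, optional_arg = split_shebang(first_line)
    argv = [interpreter]
    if optional_arg is not None:
        argv.append(optional_arg)
    argv.append(script_path)
    return argv


# With a direct path, perl receives -w as its own argument:
print(exec_argv("#!/usr/bin/perl -w", "script.pl"))
# → ['/usr/bin/perl', '-w', 'script.pl']

# With env, env receives "perl -w" as a single argument:
print(exec_argv("#!/usr/bin/env perl -w", "script.pl"))
# → ['/usr/bin/env', 'perl -w', 'script.pl']
```

In Perl, one way to sidestep the issue is to drop -w from the shebang and put `use warnings;` at the top of the script. That pragma enables warnings lexically, from inside the code, so no shebang argument is needed.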
You mention business at the start of this piece. Has anyone looked into opening the MB site to Google and whether the costs of increased traffic could be offset by advertising and affiliate kickbacks? It’s hardly a novel approach these days but I believe MB currently intentionally excludes itself from search results, so I’m assuming that someone made the calculation at one point and decided it wasn’t worth it. Is that still true though?
Would the income from this be enough to pay an actual employee, whose initial job could simply be to make more money by this route? By that I mean enhancing the Googlability, ensuring efficient caching, stripping down bulky HTML, linking to MB customers’ use of data, and generally lowering costs while adding value and increasing potential revenue.
Just a thought.
I have done the math on adding Google AdWords to MusicBrainz. My reasoning against it is:
1. We’d need to do work to support HTTP caching headers (If-Modified-Since), and that ends up being a schema change and a fair amount of work.
2. We’d be putting ads in front of people who help us. Doesn’t seem right.
3. We’d have to fork over more money for extra bandwidth for the Googlebot.
4. The income would not be enough to hire a person.
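Point 1 refers to HTTP conditional requests. The server sends a `Last-Modified` header, and the client (or crawler) sends that value back as `If-Modified-Since`. The server then answers `304 Not Modified` if nothing has changed since. This only works if every page can cheaply look up when its data last changed, which is why it implies a schema change. Below is a minimal Python sketch, assuming a hypothetical per-entity `last_changed` timestamp. The function name `conditional_get` is illustrative, not part of mb_server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_changed, if_modified_since=None):
    """Decide between 200 and 304 for a GET request.

    last_changed: timezone-aware datetime of the entity's last edit
        (a hypothetical column; storing it is the schema change).
    if_modified_since: raw If-Modified-Since header value, or None.
    Returns (status_code, headers).
    """
    # HTTP dates have one-second resolution and are expressed in GMT.
    last_changed = last_changed.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_changed, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # malformed header: ignore it
            since = None
        # Only compare against dates that carry a timezone.
        if since is not None and since.tzinfo is not None and last_changed <= since:
            return 304, headers  # the client's copy is still current

    return 200, headers


# Example: an entity last edited at 2006-11-01 12:00 UTC.
edited = datetime(2006, 11, 1, 12, 0, tzinfo=timezone.utc)
```

A 304 response carries no body, which is where the bandwidth saving for repeat visits by crawlers would come from.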
I hadn’t actually considered using env; that would’ve been a nice, clean solution. Alas. As it stands, I’ve standardized on /usr/bin/perl and moved the directions for changing the perl interpreter into a new INSTALL.advanced file to keep the main INSTALL file as simple as possible.