TRM Database Pruned

The TRM database has been pruned again, making the system much faster and more reliable.

At about 19:30 UTC on November 4th the TRM database was “pruned” again (see the previous prune for more information about this).
This time we removed all TRMs except those that were attached to MusicBrainz tracks and had been looked up at least once.
This is a slightly more aggressive prune than last time.
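To make the difference between the two prunes concrete, here is a minimal sketch of the selection rule. It assumes a hypothetical in-memory model where each TRM carries a lookup count and we already have the set of TRMs joined to MusicBrainz tracks; `TrmRecord` and `trms_to_keep` are illustrative names, not the real server code. With `min_lookups=1` it matches this prune; with `min_lookups=0` it matches the previous one, which kept every TRM joined to a track.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class TrmRecord:
    trm: str           # the TRM identifier (hypothetical representation)
    lookup_count: int  # how many times the Tagger has looked this TRM up


def trms_to_keep(records: Iterable[TrmRecord],
                 joined_to_tracks: Set[str],
                 min_lookups: int = 1) -> Set[str]:
    """Return the TRMs that survive a prune.

    A TRM is kept only if it is joined to at least one MusicBrainz track
    and has been looked up at least `min_lookups` times; everything else
    would be deleted.
    """
    return {
        r.trm
        for r in records
        if r.trm in joined_to_tracks and r.lookup_count >= min_lookups
    }
```

Everything not in the returned set is what gets removed, which is why the surviving database is small enough to fit in memory.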

The TRM database is now about 40% of the size it was before, which (like last time) means that it now fits easily into the server’s memory, so the server as a whole runs quickly and reliably.

Annotations – final testing?

I’ve made what I hope will be the final series of changes to the annotations work (at least, for its first release) – please see this posting on mb-users for details. Please test it, and let me know what you think. Thanks!

Duplicate Artist/Album Reports Updated (expect delays)

The raw data feeding into the duplicate artist/album reports has been updated. These are available under Edit The Data/Suggestions as the last two entries: “Albums that might need merging” and “More possibly duplicate artists”. The last time this report was generated (July) we had 1703 possible album duplicates and 1872 possible artist duplicates. We now have 2714 possible album duplicates and 2424 possible artist duplicates.

Note that there is a delay between when I upload the raw data and when it is reflected on the server. I think this was set up to happen once a day, but it may be only once a week.

As always, if anyone working through these finds a confirmed false positive, let me know and I will (a) make sure it doesn’t show up in the next report, and (b) see if I can improve the overall reporting. So far very few people have reported false positives.

Annotations – please try it out!

The annotations work I’ve been talking about over the last few days is now live on the test server. Please have a play and let me know what you think. Is it too limited? Does the page layout need tweaking? Maybe the way the moderations work isn’t quite right? All feedback gratefully received.

I did mention that Lucene rocks, right?

I decided that I wanted to put together a comprehensive test of Lucene so I could show how powerful, fast and accurate it is. This is just a simple Python script that is not integrated into the rest of MusicBrainz; it doesn’t even touch the Postgres DB!

My little test is hosted on the staging server, and the DNS should have propagated by now. Come check it out:

http://search.musicbrainz.org

Annotations, part II

Following on from the annotations work, I’ve now got it automatically handling artist/album merges and deletes, which was one of my main concerns before. I think this makes it just about ready to test. It’s a very simple feature, which is possibly one reason why I think it will work. I’m not sure it needs to be any more complex, at least not for the first release.

Update: see the Wiki for documentation.

Style Guidelines Work

Started work on moving the official style guidelines to the Wiki. Created a Wiki page mimicking the current Official Style Guidelines page; it’s still a bit rough and I need to work on the formatting, but it’s there. Also started work on the main site’s style pages, checked out of CVS, but realised I can’t get very far without a running version of the MB server, since viewing the raw HTML file in a browser doesn’t process the Mason code. Setting up the server is a job for tomorrow.

Annotations

This evening I’m taking another look at the “Annotations” work done by Matthias a while back. It’s pretty near complete – he did a good job. Now I’ve just got to polish off a couple of rough edges and merge it back into the CVS trunk. I think we should be able to throw this one open for testing pretty soon.

Server Updates

Better Disc ID support, UTF-8 support for FreeDB, several tweaks to the automoderator election system and the usual miscellaneous bunch of bug fixes and other changes.

Changes mainly of interest to MusicBrainz Users

Revamped Support for Disc IDs

Duplicate disc IDs are now allowed. As a pleasant side-effect you can now search for albums by FreeDB ID as well as by disc ID. You can also inspect the disc ID details much more closely than before.

FreeDB

FreeDB moderations (the mechanism whereby MusicBrainz automatically imports data from FreeDB with no human intervention) have been turned off, so there will be no more “FreeDB mods”. You can still do a manual FreeDB import if you like.

Also, MusicBrainz now uses “FreeDB protocol 6”, which means much better
Unicode support when importing albums.

Automoderator Elections

All the e-mails which the system sends to the mb-automods mailing list (when a candidate is nominated, when voting opens, etc.) are now CC’d to the candidate (assuming the candidate has entered an e-mail address). Previously it was possible, indeed probable, that the candidate had no idea they were being nominated, and indeed accepted, right up until they got the “Welcome to the mb-automods mailing list” e-mail.

At the end of an election, if the nominee was accepted, the system can now
make the successful candidate into an Automod without needing manual
intervention from the server administrators.  It can’t yet subscribe
the new auto-moderator to the “mb-automods” mailing list, however.

While voting is open, the tally of votes cast so far is now hidden from everyone except the proposer, the seconders and the candidate.
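The new visibility rule is small enough to state as code. This is a sketch only: `AutomodElection` and `may_see_running_tally` are hypothetical names, and the sketch models just the period while voting is open (what the page shows after voting closes is not covered here).

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AutomodElection:
    candidate: str
    proposer: str
    seconders: Set[str] = field(default_factory=set)


def may_see_running_tally(election: AutomodElection, viewer: Optional[str]) -> bool:
    """While voting is open, only the proposer, seconders and candidate
    may see the tally. `viewer` is None for a visitor who is not logged in."""
    if viewer is None:
        return False
    insiders = {election.candidate, election.proposer} | election.seconders
    return viewer in insiders
```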

If you are not logged in, the automod voting page now displays a more
helpful message than before.

Special “system” users (ModBot etc) can no longer be nominated for
auto-moderator status.

A small typo was fixed in the e-mails sent by The Returning Officer.

TRM Statistics

Ever since TRMs were introduced to MusicBrainz, we’ve kept a count of
how many times each TRM has been looked up by the Tagger (or similar apps). 
The problem was, we had no idea how often each TRM was then used to tag
each associated track.  So when looking at a TRM joined to several
tracks, working out which was the “most used” track was a matter of guesswork.

This release introduces the ability to count uses of TRMs (i.e. each time a TRM is used to tag a specific track), as well as lookups. You can see this on the track detail page, although the “use” counts will all start out blank (i.e. zero).
In fact we’ll be keeping month-by-month lookup counts and use counts for
each TRM, so in theory you’ll be able to see how the tagging “popularity”
for each song rises and falls over time (although all you can see on the web
is the running total).
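The idea of keeping month-by-month counts while showing only running totals can be sketched as follows. This is a hypothetical in-memory model, not the server’s actual schema: `TrmStats`, its methods and the `(year, month)` keys are illustrative names. It also shows the question that lookup counts alone could not answer: which of several tracks joined to one TRM is the “most used”.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Month = Tuple[int, int]  # (year, month), e.g. (2004, 11)


class TrmStats:
    """Hypothetical model of per-month TRM lookup and use counts."""

    def __init__(self) -> None:
        # (trm, month) -> number of lookups that month
        self.lookups: Dict[Tuple[str, Month], int] = defaultdict(int)
        # (trm, track_id, month) -> times the TRM was used to tag that track
        self.uses: Dict[Tuple[str, int, Month], int] = defaultdict(int)

    def record_lookup(self, trm: str, month: Month) -> None:
        self.lookups[(trm, month)] += 1

    def record_use(self, trm: str, track_id: int, month: Month) -> None:
        self.uses[(trm, track_id, month)] += 1

    def total_lookups(self, trm: str) -> int:
        # Running total across all months (what the web page would show).
        return sum(n for (t, _), n in self.lookups.items() if t == trm)

    def uses_by_month(self, trm: str, track_id: int) -> Dict[Month, int]:
        # Month-by-month history, oldest first, to see popularity rise and fall.
        return {m: n for (t, tr, m), n in sorted(self.uses.items())
                if t == trm and tr == track_id}

    def total_uses(self, trm: str, track_id: int) -> int:
        return sum(self.uses_by_month(trm, track_id).values())

    def most_used_track(self, trm: str) -> Optional[int]:
        # Sum uses per track across all months and pick the largest.
        totals: Dict[int, int] = defaultdict(int)
        for (t, track_id, _), n in self.uses.items():
            if t == trm:
                totals[track_id] += n
        return max(totals, key=totals.get) if totals else None
```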

Other Changes

A bug was causing the artist search index to become
corrupted.  The bug has been fixed and the index has been rebuilt.

When albums are imported from FreeDB, and the ModBot adds a note
giving the URL of the original FreeDB data, that note is no longer
mailed to the original moderator.

When adding an album, both album attributes now default
to “not known” instead of “Album, Official”.

You can now make a case-change edit on artist aliases
(previously it erroneously complained that there was a conflicting
alias).  The code which adds and renames aliases has been made more
robust.

The edit artist page now includes a “copy” button (copies the name
into the sortname field).

“Guess Case” has been tweaked again: it no longer adds a space after “.” if the next character is also “.” (because we don’t want “...” to become “. ..”). It doesn’t check for a sub-title split when inside parentheses (this fixes the “Album Title (Disc 1: Disc Name)” bug). It converts “reprise” to lower case when within parentheses. It strips spaces after “(” or “[”, and before “]” or “)”.
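Here is a minimal Python sketch of three of these Guess Case rules (the sub-title rule is left out). It is not the real Guess Case implementation: the function names are hypothetical, the parenthesis handling assumes no nesting, and exempting digits from the space-after-“.” rule (so “1.5” survives) is an assumption of this sketch.

```python
import re


def add_space_after_period(text: str) -> str:
    """Insert a space after '.' when it runs straight into the next
    character, but never when that character is another '.', so '...'
    stays intact. Assumption: digits are exempt too, to keep '1.5'."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if (ch == "." and nxt and not nxt.isspace()
                and nxt != "." and not nxt.isdigit()):
            out.append(" ")
    return "".join(out)


def lowercase_reprise_in_parens(text: str) -> str:
    """Lower-case the word 'reprise' only inside (non-nested) parentheses."""
    def fix(match: re.Match) -> str:
        inner = re.sub(r"\breprise\b", "reprise", match.group(1),
                       flags=re.IGNORECASE)
        return "(" + inner + ")"
    return re.sub(r"\(([^()]*)\)", fix, text)


def strip_bracket_spaces(text: str) -> str:
    """Remove spaces after '(' or '[' and before ']' or ')'."""
    text = re.sub(r"([(\[])\s+", r"\1", text)
    return re.sub(r"\s+([)\]])", r"\1", text)


def guess_case_tweaks(title: str) -> str:
    # Bracket spaces are stripped last, so a space the period rule adds
    # just before ')' or ']' is removed again.
    title = add_space_after_period(title)
    title = lowercase_reprise_in_parens(title)
    return strip_bracket_spaces(title)
```

For example, `guess_case_tweaks("Song ( REPRISE )")` gives `"Song (reprise)"`, while a bare `"Reprise"` outside parentheses is left alone.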

The MusicBrainz data dumps now include Amazon cover art URLs.

The tagger search page is no longer fooled by extra whitespace
around your search query.

New and Changed Documentation

Changes mainly of interest to MusicBrainz Server Programmers

The INSTALL file has been further updated,
describing the installation process more fully and more helpfully than ever
before 🙂

InitDb.pl now does a much better job
of creating the replication function.  You can use
--with-pending=FILE to tell it where to find pending.so.

Various scripts (MBImport.pl, ImportReplicationChanges,
LoadReplicationChanges) now check the status of completed sub-processes more
carefully.

FixLength.pl now runs every
night.  It has been made more robust too – it reports any errors it
encounters and keeps running, and it also shows the IDs of any albums it
can’t fix.

Only one instance of LoadReplicationChanges
is now allowed at a time.
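The post doesn’t say how LoadReplicationChanges enforces this, so the following is only a common way to get the same effect, sketched in Python: an exclusive lock file created with `O_CREAT | O_EXCL`, which fails atomically if another process already holds it. `SingleInstanceLock` and the lock path are hypothetical.

```python
import os
import tempfile
from typing import Optional

# Hypothetical lock location; not taken from the real script.
DEFAULT_LOCK_PATH = os.path.join(tempfile.gettempdir(),
                                 "LoadReplicationChanges.lock")


class SingleInstanceLock:
    """Allow at most one holder of the lock file at `path` at a time."""

    def __init__(self, path: str = DEFAULT_LOCK_PATH) -> None:
        self.path = path
        self.fd: Optional[int] = None

    def acquire(self) -> bool:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so two
            # processes can never both succeed.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self.fd, str(os.getpid()).encode())  # note who holds the lock
        return True

    def release(self) -> None:
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None
```

A script would call `acquire()` at start-up, exit if it returns `False`, and call `release()` in a `finally` block. One design caveat: if the process is killed, the lock file is left behind and has to be removed by hand; an OS-level advisory lock such as `fcntl.flock` avoids that on Unix, because the lock is dropped when the process dies.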

ProcessReplicationChanges
now deletes “pending” data as it is processed, which means that if the
script is stopped (or crashes) we can safely restart where we left
off.  Several options have also been added to help with
troubleshooting.
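Why deleting pending data as it is processed makes a restart safe can be shown with a small sketch, using Python’s built-in `sqlite3` as a stand-in for the real database. The `pending` table layout, its `seq`/`payload` columns and `process_pending` are hypothetical, and committing each change together with the deletion of its pending row is an assumption about how the real script achieves this.

```python
import sqlite3
from typing import Callable

ApplyFn = Callable[[sqlite3.Connection, str], None]


def process_pending(conn: sqlite3.Connection, apply_change: ApplyFn) -> int:
    """Apply queued changes in order, deleting each one as it is applied.

    Each change and the deletion of its pending row commit in a single
    transaction, so if the script stops or crashes part-way through,
    rerunning it resumes at the first change not yet applied.
    """
    processed = 0
    while True:
        row = conn.execute(
            "SELECT seq, payload FROM pending ORDER BY seq LIMIT 1"
        ).fetchone()
        if row is None:
            return processed
        seq, payload = row
        with conn:  # commits on success, rolls back if apply_change raises
            apply_change(conn, payload)
            conn.execute("DELETE FROM pending WHERE seq = ?", (seq,))
        processed += 1
```

Because nothing is marked “done” separately from being deleted, there is no window in which a change is applied but still looks pending, or deleted but never applied.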

Bugs and RFEs Closed

Dave Evans

TRM Database Pruned

The TRM database has been pruned, making the system much faster again.

At about 11pm (UK time) on August 9th the TRM database
was “pruned”, removing all TRMs apart from the ones attached to
MusicBrainz tracks.  This resulted in the database becoming about
one-third of the size it was before.

Because the TRM database is smaller, it now fits into the server’s
memory, reducing the need for disk activity (which means the rest of
MusicBrainz runs faster too).

Because TRM requests can now be served from memory instead of from disk, TRM responses are now much quicker, which in turn means that we don’t have to refuse as many TRM requests due to the server being too busy.

(Note that most of those linked graphs started in mid-March 2004, just before the TRM server’s memory was upgraded. This explains why several of the graphs start with a sudden change in behaviour.)