From the Portugal paper: "the test set adopted for the qualitative evaluation of the proposed method is the one presented in (Dalitz et al., 2008) and already described."
Dalitz raises and answers some questions in his 2008 paper about a dataset:
"How do we measure the distance of a given segmentation from a perfect 'ground truth' segmentation, and how do we obtain the ground truthing data?"
"Even though the labeling of the ground-truth data could be done manually, this is very time consuming and has the disadvantage of an ad-hoc classification of dubious pixels belonging both to a staffline and a crossing symbol. Therefore, we generate our music images from postscript images created with music typesetting software, which allows for “perfect” staff removal."
Dalitz's data set is available over here (along with another handwritten data set).
UCSD CSE 190: Projects in Vision and Learning
23.1.12
19.1.12
Acquiring Another Dataset
This week I focused on growing my dataset of images.
I looked around online for existing databases of music images and found a few collections of mostly historic music. The main problem with these collections is that they are set up for easy browsing, but downloading them in bulk is difficult. I emailed the people in charge of the Digital Scores and Libraries collection over at Harvard about the best way to download their collection, but I have not received a reply yet.
I ended up downloading a collection of public domain music from the Cantorion collection. I was lucky enough to find an open directory listing with a large number of PDFs from this collection. I then wrote a simple web scraper in Python which went through the files in the web directory and downloaded all the PDF files to my hard drive. This yielded a total of 681 PDF files, which (after splitting into individual page images with ImageMagick) should be a dataset of ample size for now.
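In case it's useful later, here is roughly what the scraper amounted to. This is a sketch rather than the original script: the directory URL is a placeholder, the original was presumably Python 2, and this version is written against Python 3's urllib.

import os
import re
import urllib.request

BASE_URL = "http://example.com/cantorion/pdfs/"   # placeholder for the open directory listing
OUT_DIR = "cantorion_pdfs"
os.makedirs(OUT_DIR, exist_ok=True)

# Fetch the directory listing and pull out every linked .pdf file.
listing = urllib.request.urlopen(BASE_URL).read().decode("utf-8", "ignore")
pdf_names = sorted(set(re.findall(r'href="([^"]+\.pdf)"', listing, re.IGNORECASE)))

for name in pdf_names:
    target = os.path.join(OUT_DIR, os.path.basename(name))
    urllib.request.urlretrieve(BASE_URL + name, target)
    print("saved", target)

The ImageMagick step is the usual PDF-to-PNG conversion, something along the lines of convert -density 150 score.pdf score-%03d.png run over each file.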
15.5.11
Pipeline for Processing
As of now, our process looks somewhat like this:
1. Get picture of score
2. Invert picture of score
3. Convert inverted picture to Float image in Gamera
4. Convolve inverted Float image with template in Gamera
5. Threshold result of convolution to get locations of template
6. Use locations of templates combined with locations of staff lines to determine locations of notes
We have implemented steps 1 through 5. We have been able to threshold some of the images manually, but have not yet settled on a good algorithm to do it automatically. Step 6 will be worked on after we figure out thresholding.
(Above: an image produced by manually thresholding)
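For reference, here is a minimal sketch of what steps 2 through 5 look like in Gamera. The plugin names (invert, to_float, convolve, threshold) are written from memory and the filenames and threshold value are placeholders, so treat this as an outline of the calls rather than our exact code.

from gamera.core import init_gamera, load_image
init_gamera()

score = load_image("score.png")            # step 1: picture of the score (placeholder filename)
template = load_image("quarternote.png")   # one template from the bootstrap set

score.invert()                             # step 2: invert so ink is bright
template.invert()

score_f = score.to_float()                 # step 3: convert to Float images
template_f = template.to_float()

response = score_f.convolve(template_f)    # step 4: convolve with the template

THRESH = 200                               # placeholder; we currently pick this by hand per image
hits = response.threshold(THRESH)          # step 5: threshold to get candidate locations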
9.5.11
Convolution
We used the Gamera function convolve with the bootstrap template set and got some very strange results initially. We used a piece of sheet music generated by MuseScore as a toy example:
Then using a quarternote template:
We don't have a clue what caused the visual Doppler effect, but we had the idea to invert both the template and the toy image so that the highest response would correspond to the correct points in the image. A non-inverted image has a pixel value of 255 for white and 0 for black, which would cause the correct points to have a low response. This was not ideal, hence the inversion.
Running the convolution function after the inversion yielded:
This made a lot more sense, and the areas where quarternotes actually were yielded very high values. Running a non-maximal suppression algorithm would be a good next step.
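Something like the following is what we have in mind for the non-maximal suppression step. It is a sketch in NumPy rather than Gamera, and the window size and score cutoff are made-up numbers that would need tuning against the actual response maps.

import numpy as np

def non_max_suppression(response, window=15, min_score=0.8):
    # Keep only pixels that are the maximum of their local window
    # and above a score cutoff; report them as (x, y, score) peaks.
    peaks = []
    h, w = response.shape
    r = window // 2
    for y in range(h):
        for x in range(w):
            v = response[y, x]
            if v < min_score:
                continue
            neighborhood = response[max(0, y - r):y + r + 1,
                                    max(0, x - r):x + r + 1]
            if v >= neighborhood.max():
                peaks.append((x, y, v))
    return peaks

Each surviving peak would then be a candidate symbol location.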
We then tried this again using a treble clef template:
Resulting in:
While it "somewhat" recognized the treble clef in some completely wrong spots, what matter is that it very strongly recognized it in the approximate spot. The great thing about clefs is that you don't need to spot them at an exact location, as long as they can be identified with a certain staff.
Then we tried the bass clef:
While the values were strong where the bass clef actually is, they were stronger where the TREBLE clef is. This sort of thing, along with other as-yet-untested symbols, will require more training and template averaging in the future.
OK, so convolution works for the most part. However, all the tests thus far used templates created by MuseScore on a toy example also created by MuseScore. So we went back to Twinkle, Twinkle Little Star, something we pulled off the 'net.
We ran the Fujinaga staff removal algorithm on Twinkle solely to obtain the staffline height, in pixels: 19. Our templates are based on a 22-pixel staffline height. So we scaled our Twinkle sheet music by 22/19 and ran the convolution algorithm to obtain:
So far the results aren't bad for quarternotes, but much more testing will be needed for other music scores and symbols.
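For the record, the rescaling itself is a one-liner once the staffline height is known. The snippet below is a sketch: the filename is a placeholder, staffline_height is the value reported by the staff-removal run, and the scale plugin's signature (a scale factor plus an interpolation type) is written from memory, so double-check it against the Gamera docs.

from gamera.core import init_gamera, load_image
init_gamera()

score = load_image("twinkle.png")       # placeholder filename

TEMPLATE_STAFFLINE_HEIGHT = 22          # the height our templates were built around
staffline_height = 19                   # what the Fujinaga run reported for Twinkle

factor = float(TEMPLATE_STAFFLINE_HEIGHT) / staffline_height   # roughly 1.16

# scale(factor, interp_type), with 1 meaning linear interpolation (from memory)
scaled_score = score.scale(factor, 1)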
MuseScore for bootstrap template creation
For the purposes of bootstrap template creation, MuseScore is pretty cool. Given that staff lines have to be accounted for, it would be maddening to search the internet for sheet music to build a comprehensive starting template set. A good suggestion from a previous session was to create sheet music myself to generate the desired templates: notes in various places on the staff, even in obscure circumstances.
I created a toy sheet of music and placed notes and rests in various places, low and high on the staff, to cover all the positions where symbols can be found in a sheet of music. The resulting template set is not exhaustive, but it will do for now, as it contains the most common notes, rests, and clefs.
After a quick slice and dice in Photoshop, involving some heavy eyeballing, I boxed out the templates, making the boxes as small as I possibly could.
I tried to make the template sizes consistent among symbol types, with quarter notes and half notes both being 32 x 22. However, there is more variety in size among the rest symbols. As you can see, a quarter rest (24 x 60 in the set) is larger than an eighth rest (22 x 40).
A problem I foresee with our set right now is that, at least with handwritten scores, if you ignore the stems (which I largely did for the quarter note and half note templates), half notes can look very much like whole notes. I left a little bit of the stem in the templates, but it might not be enough.
This bootstrap set is by no means final; I foresee a lot of resizing in the future. It depends on how our template matching algorithm goes.
24.4.11
Staff removal and template matching
Tim and I took a step back to see what is ahead and do a little bit of planning. We decided to use template matching to recognize the symbols. These are the major tasks we need to do next:
- Create a bootstrap template library (I am currently working on this - will henceforth refer to this as BTL: Bootstrap Template Library, and while we're on it, BTC : Bootstrap Template Creation)
- Create a template matcher (Tim is currently working on this)
- Create a training program which will add to the bootstrap template library
- ???
- Profit
Backtrack a little: so far we've dealt with staff removal, a task which may or may not help with symbol recognition. It does more harm than good if our staff removal algorithm mangles symbols that overlap with the staves. No staff removal algorithm is better than a bad one, BUT no staff removal algorithm means more tedious BTC. Creating and using a template matcher with a dataset that still has staff lines means we would have to account for all the ways staff lines could pass through a symbol: a near-exponential blowup in the number of templates.
Luckily for us, we found a really cool Gamera toolkit called musicStaves which contains functions that implement some of the current best staff removal algorithms. The thing is, Tim and I have heard of all these algorithms in various papers we read about staff removal. It's really cool to actually see them in action.
Here is an overview of the algorithms after some testing with the sample set, which consists of super-high-quality (I'm talking 10+ MB per file) PNGs created when Tim took photographs of music sheets with his fancy camera.
Linetracking - the worst. Makes sense, since the algorithm is a very simple one
Fujinaga - really good, staves were removed cleanly, for the most part
Carter - also really good, not that much different than Fujinaga
Roach-Tatem - this one actually caused the Gamera GUI to crash, resulting in a bunch of errors. Never using this again.
Skeleton - this one was also good, very similar to Fujinaga and Carter.
It seems like Fujinaga, Carter, and Skeleton are the best. I stored sample results with Fujinaga, just because the name is cooler.
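For anyone trying to reproduce this, invoking one of these algorithms through the toolkit looks roughly like the snippet below. The class name, the remove_staves arguments, and the save call are written from memory of the musicStaves documentation, so treat them as assumptions and check them against the toolkit itself.

from gamera.core import init_gamera, load_image
from gamera.toolkits.musicstaves import MusicStaves_rl_fujinaga

init_gamera()

# The staff-removal classes expect a onebit (binarized) image.
image = load_image("sample.png").to_onebit()

ms = MusicStaves_rl_fujinaga(image)
ms.remove_staves(crossing_symbols='all')   # try to preserve symbols that cross the staves

ms.image.save_PNG("sample_nostaves.png")   # the staff-less result lives in ms.image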
See for yourself in this sample result.
Also, another sample result.
As you can see, the results are pretty good, but the algorithm fails to remove staves in seemingly random locations. This might be due to Tim's lack of photography skills; nevertheless, we still need to account for the fact that the staff removal algorithms are not perfect and will fail in some, albeit rare, cases.
This presents an annoying problem for templating: a very good staff removal algorithm, apparently, is not good enough to control the size of our BTL. We can't neglect the potential areas of the music sheet where the staff line removal algorithm failed.
I guess BTC will have to include stafflines. That's something I personally don't look forward to. BTC is not difficult, just tedious and highly repetitive. It will look something like this:
mFile = open(musicfile)            # the sheet music image
while True:
    copyRegion(mFile)              # manually box out one symbol
    pasteRegion(ImageFile())       # paste it into a fresh template image
    cry()                          # repeat until out of tears
Now if only this function really existed.
16.4.11
Small Update, Staff Removal
Tested a toolkit for Gamera that does staff removal, called MusicStaves. I checked it out via CVS and compiled it on my computer. Before that, I was using a primitive run-length algorithm I coded myself to remove staves. Here is a comparison of it against one of MusicStaves' algorithms:
My Algorithm
MusicStaves
As you can see, my algorithm produces a lot more noise. I should probably rerun some of the previous tests to see whether they are affected.
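For context, my run-length approach was essentially the naive idea sketched below (not the exact code): scan each row and blank out any horizontal run of black pixels longer than some cutoff, on the theory that only staff lines produce runs that long. The cutoff here is a made-up number, and the all-or-nothing treatment of long runs is exactly why it leaves noise and chews into symbols.

import numpy as np

def remove_staves_runlength(binary, max_run=60):
    # binary: 2-D array with 1 for black (ink) and 0 for white.
    out = binary.copy()
    height, width = out.shape
    for y in range(height):
        run_start = None
        for x in range(width + 1):
            ink = x < width and out[y, x] == 1
            if ink and run_start is None:
                run_start = x                      # a black run begins
            elif not ink and run_start is not None:
                if x - run_start > max_run:
                    out[y, run_start:x] = 0        # erase the long (staff-like) run
                run_start = None
    return out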