As of now, our process looks somewhat like this:
1. Get picture of score
2. Invert picture of score
3. Convert inverted picture to Float image in Gamera
4. Convolve inverted Float image with template in Gamera
5. Threshold result of convolution to get locations of template
6. Use locations of templates combined with locations of staff lines to determine locations of notes
We have implemented steps 1 through 5. We have been able to threshold some of the images manually, but have not yet settled on a good algorithm for doing it automatically. We will work on step 6 once thresholding is figured out.
(Above: an image produced by manually thresholding)
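To make steps 2 through 5 concrete, here is a minimal sketch on a toy image. It uses plain Python lists rather than Gamera images, and the helper names (`invert`, `to_float`, `correlate`, `threshold`) are made up for illustration. The matching step is a plain sliding-window sum of products, which is our assumption about what the template match is meant to compute and not a description of Gamera's own `convolve`. The 0.9 cut-off is an arbitrary placeholder, not a value we have settled on.

```python
# Toy stand-in for steps 2-5 using plain Python lists, not Gamera images.
# Pixel convention assumed here: 0 = black ink, 255 = white paper.

def invert(image):
    """Step 2: flip greyscale values so ink becomes bright."""
    return [[255 - p for p in row] for row in image]

def to_float(image):
    """Step 3: convert pixel values to floats."""
    return [[float(p) for p in row] for row in image]

def correlate(image, template):
    """Step 4: slide the template over the image and sum the products.

    Only positions where the template fits entirely are computed, so the
    result is smaller than the input image.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    out = []
    for y in range(ih - th + 1):
        row = []
        for x in range(iw - tw + 1):
            total = 0.0
            for dy in range(th):
                for dx in range(tw):
                    total += image[y + dy][x + dx] * template[dy][dx]
            row.append(total)
        out.append(row)
    return out

def threshold(response, fraction=0.9):
    """Step 5: keep positions whose response is close to the global maximum."""
    peak = max(max(row) for row in response)
    return [(y, x)
            for y, row in enumerate(response)
            for x, value in enumerate(row)
            if value >= fraction * peak]

W, B = 255, 0
score = [
    [W, W, W, W, W, W],
    [W, B, B, W, W, W],
    [W, B, B, W, W, W],
    [W, W, W, W, B, B],
    [W, W, W, W, B, B],
]
template = [[B, B], [B, B]]  # a 2x2 blob standing in for a notehead

response = correlate(to_float(invert(score)), to_float(invert(template)))
hits = threshold(response)
print(hits)  # [(1, 1), (3, 4)]: the top-left corners of the two blobs
```

A threshold relative to the maximum only works if the symbol appears at least once in the image, which is one reason we have not committed to it.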
15.5.11
9.5.11
Convolution
We used the Gamera function convolve with the bootstrap template set and initially got some very strange results. We used a piece of sheet music generated by MuseScore as a toy example:
Then, using a quarter note template:
We don't have a clue what caused the visual doppler effect, but we had the idea to invert both the template and the toy image so that the highest response would correspond to the correct points in the image. A non-inverted image has (255,255) for white and (0,0) for black, so ink overlapping ink contributes nothing to the sum and the correct points end up with a low response. This was not ideal, hence the inversion.
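The effect of the inversion is easy to see on a one-dimensional toy row. The sketch below is plain Python, not Gamera, and `correlate_1d` is a made-up helper that computes a sliding sum of products. With the original polarity the true match only ties with blank paper; after inverting both row and template it becomes the single strongest response.

```python
def correlate_1d(signal, template):
    """Sliding sum of products at every position where the template fits."""
    n = len(signal) - len(template) + 1
    return [sum(signal[i + j] * template[j] for j in range(len(template)))
            for i in range(n)]

def invert(values):
    """Flip 0..255 greyscale values so that ink becomes bright."""
    return [255 - v for v in values]

W, B = 255, 0
row = [W, W, B, W, B, W, W, W]   # the pattern "ink, gap, ink" starts at index 2
template = [B, W, B]

plain = correlate_1d(row, template)
inverted = correlate_1d(invert(row), invert(template))

print(plain)     # [65025, 0, 65025, 0, 65025, 65025]
print(inverted)  # [65025, 0, 130050, 0, 65025, 0]
```

In the non-inverted case only white-on-white overlaps add to the sum, so blank paper scores as well as the real match. After inversion only ink-on-ink overlaps count.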
Running the convolution function after the inversion yielded:
This made a lot more sense: the areas where quarter notes actually were yielded very high values. Running a non-maximal suppression algorithm would be a good next step, so that each note is reported as a single peak rather than a cluster of high values.
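We have not implemented non-maximal suppression yet, but a minimal sketch of the idea looks like this. It works on a plain list-of-lists response map rather than a Gamera image, and the function name and its `radius` and `min_value` parameters are hypothetical.

```python
def non_max_suppression(response, radius=1, min_value=0.0):
    """Return (row, col) positions that are strictly greater than every
    neighbour within `radius` and also greater than `min_value`."""
    h, w = len(response), len(response[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            value = response[y][x]
            if value <= min_value:
                continue
            neighbours = [
                response[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
                if (ny, nx) != (y, x)
            ]
            if all(value > n for n in neighbours):
                peaks.append((y, x))
    return peaks

response = [
    [0, 1, 0, 0, 0],
    [1, 9, 2, 0, 0],
    [0, 2, 1, 3, 8],
    [0, 0, 0, 2, 3],
]
peaks = non_max_suppression(response)
print(peaks)  # [(1, 1), (2, 4)]
```

Because the comparison is strict, a flat plateau of equal values produces no peak at all; a real implementation would need a tie-breaking rule for that case.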
We then tried this again using a treble clef template:
Resulting in:
While it "somewhat" recognized the treble clef in some completely wrong spots, what matters is that it very strongly recognized it in approximately the right spot. The great thing about clefs is that you don't need to locate them exactly, as long as they can be associated with a particular staff.
Then we tried the bass clef:
While the values were strong where the bass clef actually is, they were even stronger where the TREBLE clef is. This sort of confusion, along with other symbols we have not tested yet, will require more training and template averaging in the future.
OK, so convolution works for the most part. However, all the tests thus far used templates created by MuseScore on a toy example that was also created by MuseScore. So we went back to Twinkle, Twinkle Little Star, something we pulled off the 'net.
We ran the Fujinaga staff removal algorithm on Twinkle solely to obtain the staffline height in pixels: 19. Our templates are based on a 22-pixel staffline height, so we scaled our Twinkle sheet music by 22/19 (about 1.16) and ran the convolution algorithm to obtain:
So far the results aren't bad for quarter notes, but much more testing will be needed on other scores and symbols.
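The scaling arithmetic, spelled out as a small sketch. This is plain Python with a made-up nearest-neighbour `resize_nearest` helper, not the resizing routine we actually used, and the 38 x 19 strip is an invented example size.

```python
def scale_factor(template_height, score_height):
    """Factor that brings the score to the height the templates assume."""
    return float(template_height) / score_height

def resize_nearest(image, factor):
    """Nearest-neighbour resize of a list-of-lists image by `factor`."""
    old_h, old_w = len(image), len(image[0])
    new_h = max(1, int(round(old_h * factor)))
    new_w = max(1, int(round(old_w * factor)))
    return [
        [image[min(old_h - 1, int(y / factor))][min(old_w - 1, int(x / factor))]
         for x in range(new_w)]
        for y in range(new_h)
    ]

factor = scale_factor(22, 19)           # templates: 22 px, Twinkle: 19 px
strip = [[0] * 38 for _ in range(19)]   # hypothetical 38 x 19 pixel strip
resized = resize_nearest(strip, factor)

print("scale %.4f -> %d x %d" % (factor, len(resized[0]), len(resized)))
```

A 19-pixel-tall strip comes out 22 pixels tall, which is the size the templates were cut for.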
MuseScore for bootstrap template creation
For the purposes of bootstrap template creation, MuseScore is pretty cool. Because the staff lines have to be taken into account, it would be maddening to search the internet for enough sheet music to build a comprehensive starting template set. A good suggestion from a previous session was to create sheet music myself and generate the desired templates from it: notes in various places on the staff, even under obscure circumstances.
I created a toy sheet of music and placed notes and rests in various positions, low and high on the staff, to cover the places where symbols can appear in a sheet of music. The resulting template set is not exhaustive, but it will do for now, as it contains the most common notes, rests, and clefs.
After a quick slice and dice in Photoshop, involving some heavy eyeballing, I boxed out the templates, making the boxes as small as I possibly could.
I tried to make the template sizes consistent within each symbol type, with quarter notes and half notes both being 32 x 22. However, there is more variety in size among our rest symbols. As you can see, a quarter rest (24 x 60 in the set) is larger than an eighth rest (22 x 40).
A problem I foresee with our set right now is that, at least with handwritten scores, half notes can look very much like whole notes once the stems are ignored (which I did for the quarter and half note templates). I left a little bit of the stem in the templates, but it might not be enough.
This bootstrap set is by no means final; I foresee a lot of resizing in the future, depending on how our template matching algorithm goes.