We ran Gamera's convolve function with the bootstrap template set and initially got some very strange results. As a toy example, we used a piece of sheet music generated by MuseScore:
Then, using a quarter-note template:
We don't know what caused the visual Doppler effect, but we had the idea to invert both the template and the toy image so that the highest response would correspond to the correct points in the image. A non-inverted image has (255,255) for white and (0,0) for black, so the black ink of a symbol contributes nothing to the sum and the correct points end up with a low response. This was not ideal, hence the inversion.
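To make the reasoning concrete, here is a minimal pure-Python sketch of the idea. It uses a plain sliding-window sum of products rather than Gamera's convolve, and the names invert, correlate, page and template are made up for illustration. On a tiny page with a single black pixel, the non-inverted response at the true match merely ties with blank paper, while the inverted response peaks only at the match.

```python
def invert(image, max_value=255):
    """Flip grey values so that ink becomes bright and paper becomes dark."""
    return [[max_value - p for p in row] for row in image]


def correlate(image, template):
    """Sum of products of the template with every fully overlapping window."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    out = []
    for y in range(ih - th + 1):
        row = []
        for x in range(iw - tw + 1):
            total = 0
            for dy in range(th):
                for dx in range(tw):
                    total += image[y + dy][x + dx] * template[dy][dx]
            row.append(total)
        out.append(row)
    return out


# Hypothetical toy data: a white 6x6 page with one black pixel at row 1, col 1,
# and a 3x3 template that is white with a black centre.
page = [[255] * 6 for _ in range(6)]
page[1][1] = 0
template = [[255, 255, 255],
            [255, 0, 255],
            [255, 255, 255]]

raw = correlate(page, template)
inv = correlate(invert(page), invert(template))

# Non-inverted: the true match (0, 0) scores the same as blank paper (3, 3).
print(raw[0][0], raw[3][3])
# Inverted: only the true match has a non-zero response.
print(inv[0][0], inv[3][3])
```

White-on-white dominates the non-inverted sum, so blank regions score as well as a real match; after inversion only ink-on-ink contributes.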
Running the convolution function after the inversion yielded:
This made a lot more sense: the areas where the quarter notes actually were yielded very high values. Running a non-maximum suppression algorithm would be a good next step.
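As a rough idea of what that step could look like, here is a small pure-Python sketch of non-maximum suppression on a response map. We haven't implemented this yet; the function name and the radius and threshold parameters are assumptions for illustration, and good values would have to be tuned per template.

```python
def non_max_suppression(response, radius, threshold):
    """Return (row, col, value) for every local peak at or above threshold.

    A point is kept when no neighbour within `radius` (square window) has a
    strictly larger value, so exact ties are all kept.
    """
    h, w = len(response), len(response[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            value = response[y][x]
            if value < threshold:
                continue
            is_peak = True
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if response[ny][nx] > value:
                        is_peak = False
            if is_peak:
                peaks.append((y, x, value))
    return peaks


# Hypothetical response map with two blobs of high values.
response = [
    [1, 2, 1, 0, 0],
    [2, 9, 2, 0, 0],
    [1, 2, 1, 0, 7],
    [0, 0, 0, 0, 0],
]
peaks = non_max_suppression(response, radius=1, threshold=5)
print(peaks)  # [(1, 1, 9), (2, 4, 7)]
```

Each blob of strong responses collapses to a single candidate location, which is what we would want before deciding where the notes are.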
We then tried this again using a treble clef template:
Resulting in:
While it "somewhat" recognized the treble clef in some completely wrong spots, what matters is that it very strongly recognized it in approximately the right spot. The great thing about clefs is that you don't need to locate them exactly, as long as each one can be associated with a particular staff.
Then we tried the bass clef:
While the values were strong where the bass clef actually is, they were even stronger where the treble clef is. This sort of thing, along with other thus-far-untested symbols, will require more training and template averaging in the future.
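By template averaging we mean something like the following sketch: take several same-sized samples of a symbol and average them pixel by pixel. This is only an assumption about how we might do it; the function name and the sample data are hypothetical, and real samples would first need to be aligned and scaled to a common size.

```python
def average_templates(samples):
    """Pixel-wise mean of several equally sized greyscale templates."""
    if not samples:
        raise ValueError("need at least one sample")
    h, w = len(samples[0]), len(samples[0][0])
    for s in samples:
        if len(s) != h or any(len(row) != w for row in s):
            raise ValueError("all samples must have the same size")
    n = len(samples)
    return [[sum(s[y][x] for s in samples) / n for x in range(w)]
            for y in range(h)]


# Two hypothetical 2x2 samples of the same symbol.
sample_a = [[0, 255],
            [255, 0]]
sample_b = [[0, 0],
            [255, 255]]
mean_template = average_templates([sample_a, sample_b])
print(mean_template)  # [[0.0, 127.5], [255.0, 127.5]]
```

Pixels that agree across samples keep their value, while pixels that vary are pulled toward grey and so carry less weight in the response.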
Ok, so convolution works for the most part. However, all the tests thus far used templates created by MuseScore on a toy example that was also created by MuseScore. So we went back to Twinkle, Twinkle Little Star, something we pulled off the 'net.
We ran the Fujinaga staff removal algorithm on Twinkle solely to obtain the staffline height: 19 pixels. Our templates are based on a 22-pixel staffline height, so we scaled our Twinkle sheet music by 22/19 (about 1.16) and ran the convolution algorithm to obtain:
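The scaling arithmetic can be sketched as below. Only the 22 and 19 come from our measurements; the function names, the nearest-neighbour resampling and the example page size are assumptions for illustration (in practice you would use the imaging library's own scaling with interpolation).

```python
from fractions import Fraction


def scale_factor(template_height, measured_height):
    """Factor that brings the scanned page to the template's scale."""
    return Fraction(template_height, measured_height)


def scaled_size(width, height, factor):
    """Target page size in pixels after scaling, rounded to whole pixels."""
    return (round(width * factor), round(height * factor))


def resize_nearest(image, factor):
    """Nearest-neighbour resize of a list-of-lists greyscale image."""
    h, w = len(image), len(image[0])
    new_w, new_h = scaled_size(w, h, factor)
    return [[image[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(new_w)]
            for y in range(new_h)]


factor = scale_factor(22, 19)
print(float(factor))                    # roughly 1.158
print(scaled_size(1900, 2850, factor))  # hypothetical page size -> (2200, 3300)

# A 19x19 block grows to 22x22 under the same factor.
block = [[0] * 19 for _ in range(19)]
resized = resize_nearest(block, factor)
print(len(resized), len(resized[0]))    # 22 22
```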
So far the results aren't bad for quarter notes, but much more testing will be needed on other scores and symbols.
You should try normalized cross correlation (NCC). This can be implemented using a couple passes of the convolution function. If you need a reference on it let me know.
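For reference, one common way to build NCC from plain correlation passes is sketched below in pure Python. This is an assumption about what is meant, not necessarily the exact formulation; the names are hypothetical, and this version uses three passes: the image against the zero-mean template, the image against a box of ones, and the squared image against a box of ones.

```python
import math


def correlate(image, kernel):
    """Sum of products of the kernel with every fully overlapping window."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(kh) for dx in range(kw))
             for x in range(iw - kw + 1)]
            for y in range(ih - kh + 1)]


def ncc(image, template):
    """Normalized cross-correlation; scores lie in [-1, 1]."""
    th, tw = len(template), len(template[0])
    n = th * tw
    t_mean = sum(map(sum, template)) / n
    t_zero = [[t - t_mean for t in row] for row in template]
    t_energy = sum(t * t for row in t_zero for t in row)
    ones = [[1] * tw for _ in range(th)]
    squared = [[p * p for p in row] for row in image]

    numer = correlate(image, t_zero)     # pass 1: image vs zero-mean template
    win_sum = correlate(image, ones)     # pass 2: local sums
    win_sq = correlate(squared, ones)    # pass 3: local sums of squares

    out = []
    for y in range(len(numer)):
        row = []
        for x in range(len(numer[0])):
            # Window variance times n: sum(I^2) - (sum I)^2 / n.
            energy = max(win_sq[y][x] - win_sum[y][x] ** 2 / n, 0.0)
            denom = math.sqrt(energy * t_energy)
            # Flat windows (blank paper) have no structure to match: score 0.
            row.append(numer[y][x] / denom if denom > 0 else 0.0)
        out.append(row)
    return out


# Hypothetical toy data: white 6x6 page, one black pixel at row 1, col 1.
page = [[255] * 6 for _ in range(6)]
page[1][1] = 0
template = [[255, 255, 255],
            [255, 0, 255],
            [255, 255, 255]]
scores = ncc(page, template)
print(round(scores[0][0], 6), scores[3][3])  # exact match vs blank paper
```

Because each window is normalized by its own brightness and contrast, blank paper no longer scores highly, and no inversion of the image or template is needed.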