UCSD CSE 190: Projects in Vision and Learning: 04.11

Tim and I took a step back to see what is ahead and do a little bit of planning. We decided to use template matching to recognize the symbols. These are the major tasks we need to do next:

- Create a bootstrap template library (I am currently working on this; I will henceforth refer to it as BTL: Bootstrap Template Library, and while we're at it, BTC: Bootstrap Template Creation)

- Create a template matcher (Tim is currently working on this)

- Create a training program which will add to the bootstrap template library

- ???

- Profit
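To make the template matcher step a bit more concrete, here is a minimal sketch of one common way to do template matching: slide each template over the image and score every position with normalized cross-correlation. This is not the matcher Tim is writing; the function names (`ncc`, `best_match`), the plain list-of-lists image format, and the tiny toy templates are all assumptions made up for illustration.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equal-sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) *
                    sum((b - mean_t) ** 2 for b in t))
    # A completely flat patch or template has no contrast to correlate with.
    return num / den if den else 0.0

def best_match(image, templates):
    """Slide every template over the image; return (score, label, row, col)."""
    best = (-1.0, None, -1, -1)
    for label, tpl in templates.items():
        th, tw = len(tpl), len(tpl[0])
        for r in range(len(image) - th + 1):
            for c in range(len(image[0]) - tw + 1):
                patch = [row[c:c + tw] for row in image[r:r + th]]
                score = ncc(patch, tpl)
                if score > best[0]:
                    best = (score, label, r, c)
    return best

# Hypothetical toy "template library": 1 = ink, 0 = background.
templates = {
    "plus": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    "bar":  [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
score, label, row, col = best_match(image, templates)
print(label, row, col)
```

On real scans the brute-force double loop would be far too slow, and a match threshold would be needed to reject regions that contain no symbol at all.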

Backtrack a little: so far we've dealt with staff removal, a task which may or may not help with character recognition. It does more harm than good if our staff removal algorithm mangles symbols that overlap with the staves. No staff removal algorithm is better than a bad one, BUT no staff removal algorithm means more tedious BTC. Creating and using a template matcher with a dataset that has staff lines means we would have to account for all the ways staff lines could pass through a symbol: a near-exponential blowup in the number of templates.

Luckily for us, we found a really cool Gamera library called musicStaves, which contains functions that implement some of the current best staff removal algorithms. The thing is, Tim and I had come across all of these algorithms in the various papers we read about staff removal, so it's really cool to actually see them in action.

Here is an overview of the algorithms after some testing with the sample set, which consists of super high quality (I'm talking 10+ MB per file) PNGs created from photographs Tim took of music sheets with his fancy camera.

Linetracking - the worst. Makes sense, since the algorithm is a very simple one.

Fujinaga - really good, staves were removed cleanly, for the most part

Carter - also really good, not that much different from Fujinaga

Roach-Tatem - this one actually caused the Gamera GUI to crash, resulting in a bunch of errors. Never using this again.

Skeleton - this one was also good, very similar to Fujinaga and Carter.

It seems like Fujinaga, Carter, and Skeleton are the best. I stored sample results with Fujinaga, just because the name is cooler.

See for yourself in this sample result.

Also, another sample result.

As you can see, the results are pretty good, but the algorithm fails to remove staves in seemingly random locations. This might be due to Tim's lack of photography skills; nevertheless, we still need to account for the fact that the staff removal algorithms are not perfect and will fail in some, albeit rare, cases.

This presents an annoying problem for templating: a very good staff removal algorithm, apparently, is not good enough to control the size of our BTL. We can't neglect the potential areas of the music sheet where the staff line removal algorithm failed.

I guess BTC will have to include staff lines. That's something I personally don't look forward to. BTC is not difficult, just tedious and highly repetitive. It will look something like this:

mFile = open(musicfile)
while (true) {
    copyRegion(mFile);
    pasteRegion(new ImageFile());
    cry();
}

Now if only this function really existed.

We have started building our training set, but we are running into some difficulties along the way. One of the main problems is that connected component analysis did not work as well as we had hoped: some note symbols were stuck together, while others were far too fragmented. Nevertheless, we labeled some of these symbols by hand and ran a k-NN algorithm on the remaining unlabeled symbols to see what kind of results we would get. The results were not too bad; they were not complete noise, but they were definitely not as accurate as they could be. To improve the results, one thing we can try is increasing the number of manually labeled symbols in the training data.
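For reference, the k-NN step can be sketched in a few lines. This is only an illustration of the general technique, not our actual pipeline: the two features per symbol (aspect ratio and ink density), the labels, and all the numbers below are made-up assumptions.

```python
import math
from collections import Counter

def knn_classify(sample, labeled, k=3):
    """Label `sample` by majority vote among its k nearest labeled neighbors.

    `labeled` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(labeled, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical hand-labeled symbols: (aspect ratio, ink density) -> label.
labeled = [
    ((1.20, 0.80), "notehead"),
    ((1.25, 0.75), "notehead"),
    ((1.10, 0.85), "notehead"),
    ((0.10, 0.30), "stem"),
    ((0.12, 0.35), "stem"),
    ((0.08, 0.25), "stem"),
]

print(knn_classify((1.15, 0.78), labeled))
print(knn_classify((0.10, 0.28), labeled))
```

With so few labeled examples, a single badly segmented symbol can swing the vote, which is one reason adding more manually labeled symbols should help.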
