24.4.11

Staff removal and template matching

Tim and I took a step back to see what is ahead and do a little bit of planning. We decided to use template matching to recognize the symbols. These are the major tasks we need to do next:

- Create a bootstrap template library (I am currently working on this - will henceforth refer to this as BTL: Bootstrap Template Library, and while we're on it, BTC : Bootstrap Template Creation)
- Create a template matcher (Tim is currently working on this)
- Create a training program which will add to the bootstrap template library
- ???
-Profit

Backtrack a little: so far we've dealt with staff removal, a task which may or may not help with character recognition. It does more harm than good if our staff removal algorithm mangles symbols that overlap with the staves. No staff removal algorithm is better than a bad one, BUT no staff removal algorithm means more tedious BTC. Creating and using template matcher with a dataset that has staff lines means we would have to account all ways staff lines could pass through a symbol: a near-exponential blowup of the number of templates.

Luckily for us, we found a really cool Gamera library called musicStaves which contains functions that implement some of the current best staff removal algorithms. That thing is Tim and I have heard of all these algorithms in various papers we read about staff removal. It's really cool to actually see them in action.

Here is overview of the algorithms after some testing with the sample set, which consists of super high quality (I'm talking 10+ MB per file) png's which were created when Tim took photographs of music sheets with his fancy camera.

Linetracking - the worst. Makes sense, since the algorithm is a very simple one
Fujinaga - really good, staves were removed cleanly, for the most part
Carter - also really good, not that much different than Fujinaga
Roach-Tatem - this one actually caused the Gamera GUI to crash, resulting in a bunch of errors. Never using this again.
Skeleton - this one was also good, very similar to Fujinaga and Carter.

It seems like Fujinaga, Carter, and Skeleton are the best. I stored sample results with Fujinaga, just because the name is cooler.

See for yourself in this sample result.

Also, another sample result.

As you can see, the results are pretty good, but fails to remove staves in seemingly random locations. This might be due to Tim's lack of photography skills; nevertheless, we still need to account for the fact that the staff removal algorithms are not perfect and will fail on some, albeit rare cases.

This presents an annoying problem for templating: a very good staff removal algorithm, apparently, is not good enough to control the size of our BTL. We can't neglect the potential areas of the music sheet where the staff line removal algorithm failed.

I guess BTC will have to include stafflines. That's something I personally don't look forward to. BTC is not difficult, just tedious and highly repetitive. It will look something like this:

mFile = open(musicfile)
while(true){
copyRegion(mFile);
pasteRegion(new ImageFile() );
cry();
}

Now if only this function really existed.

16.4.11

Small Update, Staff Removal

Tested a toolkit for Gamera that does stave removal called MusicStaves.  I checked it out using CVS and compiled it on my computer.  Before, I was using a primitive runlength algorithm coded by myself to remove staves.  Here is a comparison of it against one of MusicStaves' algorithms:

My Algorithm
MusicStaves

As you can see, my algorithm produces a lot more noise.  Probably should rerun some of the previous tests to see if they will be affected.

13.4.11

Classification and its difficulties

We have started building our training set.  However, we are running into some difficulties on the way.  One of the main problems is that the connected component analysis did not work as well as we hoped it would.  Some of the note symbols would be stuck together, while others would be much too segmented.  Nevertheless, we still tried labeling some of these symbols and running a k-NN algorithm on the rest of the unlabeled symbols and seeing what kind of results we would get.  The results were not too bad; it was not complete noise, but it was definitely not as accurate as it could be.  To improve the results, one thing we can try is increasing the amount of manually labeled symbols included in the training data. 

30.3.11

Go Gamera!

A discovery that changed everything for us was Gamera, a very powerful computer vision framework that works specifically well with documents (not to be confused with this). Its library functions are written in C++ but a Python wrapper is available. Perfect. We were seriously considering Open CV in the beginning due to its popularity, but Gamera seems to cater to our needs better since we will be dealing with music documents.

We also decided on this repository/version control system. I has two levels of committing: committing file(s) to a local directory and then pushing those changes to be stored on the website. We quickly learned the appropriate commands and tested the system.

Back to Gamera: So how exactly did Gamera "change everything"?

1) From the get-go, we wanted to code in Python, since we really liked its ease of use and terseness, and for this project, we could get away with its relatively slow speed. Our preference, in order, was Python, Java, C++. OpenCV was pretty easy to set up with C++ and Java. There was a wrapper for Python, but it was not easy to figure out. Gamera is not too difficult to set up and use (unless you're using Windows).

2) Gamera comes prepackaged with a pretty well-designed interpreter/GUI which does all the regular interpreter tasks but also has an intuitive and easy to use interface for viewing and storing image objects dynamically. The interpreter also boasts Eclipse-like features such as auto-suggest and pulling up brief function documentation on the fly.

2) It turns out that Gamera has pretty good documentation. The Gamera website not only has documentation for its library functions, but also tutorials for carrying out AI tasks. For example, there is a tutorial for building machine learning training sets using the GUI.

Coming out with some concrete results toward our goal was relatively painless after we learned how to use our resources. We wrote a cute little function that, more or less takes care the first step of our project: staff removal. Turns out Gamera has a function called filter_wide_runs which removes horizontal runs longer than a specified length. Go figure. I say more or less because this function does not address the common problem of staff removal: parts of symbols that overlap with the staff lines are also removed. For more complicated sheets of music this would be a major problem, but for something trivial like Twinkle,Twinkle, Little Star we think it would be sufficient. Using this function purely without any cleanup is a good first step. As you can see, the result isn't too badly mangled:



Image Hosted by ImageShack.us

Our next step would be segmenting and classifying symbols. Remember that training tutorial I mentioned earlier? That came in handy for this. Tim and I started playing around with the GUI's classifying tool to figure out how it can benefit us and I can certainly say that it is very powerful; it even has many options for the image-splitting algorithm used to get the training segments. This is the training in a nutshell: the user draws a bounding box around part of the image they want to train, and the GUI finds image segments that are part of the selection and chooses the symbol it think the user just selected. The user can then correct the GUI if it is wrong or it can create a new symbol name for the selection. For more details, refer to the link to the tutorial.

I can safely say we barely scratched the surface of what Gamera can offer, in terms of its library functions and also its GUI features. More research and testing needs to be done before we can come up with some results for the recognition component of our project, and also refinements for the staff removal.

So, until then, faithful blog readers.

28.3.11

Set up repository

We set up a Hg repository today for the project at bitbucket.org.  Michael and I looked around the web for Python wrappers for OpenCV, but the ones that we found were either deprecated, poorly documented, or both.  I installed a Java wrapper for OpenCV called JavaCV on my laptop.  It appears to work properly and that may be what we use.

We also looked a little bit into the algorithms.  For music OCR, one of the most important steps is staff recognition and removal.  We read about a few common algorithms used to accomplish this task, but we are still undecided on which one to use.  Another important step for music OCR is symbol segmentation.  This comes after staff recognition and removal; after reading a paper, we have picked the k-NN technique for its simplicity and effectiveness.

20.3.11

Hello World

This blog will track the progress of Tim Kang and Michael Perry's CSE 190 project.  Our project is on applying OCR to sheet music (project proposal here).  Our goal is to eventually be able to read handwritten or badly deformed printed sheets, but we will start with well-formed printed sheets in order to get everything working.  For this project, we are going to stick with sheet music written in normal western notation; none of that hokey stuff.

For the people with less experience, Music OCR consists of several steps:
-- preprocessing ->  staff recognition/removal -> symbol segmentation/classification
after of which the sheet music can be reconstructed in a format such as MusicXML or MIDI and read by a scorewriter program such as the commercial Finale or the open-source MuseScore.