UCSD CSE 190: Projects in Vision and Learning: 03.11

A discovery that changed everything for us was Gamera, a very powerful computer vision framework that works especially well with documents (not to be confused with this). Its library functions are written in C++, but a Python wrapper is available. Perfect. We were seriously considering OpenCV in the beginning due to its popularity, but Gamera seems to cater to our needs better since we will be dealing with music documents.

We also decided on this repository/version control system. It has two levels of committing: committing file(s) to a local directory and then pushing those changes to be stored on the website. We quickly learned the appropriate commands and tested the system.

Back to Gamera: So how exactly did Gamera "change everything"?

1) From the get-go, we wanted to code in Python, since we really liked its ease of use and terseness, and for this project, we could get away with its relatively slow speed. Our preference, in order, was Python, Java, C++. OpenCV was pretty easy to set up with C++ and Java. There was a wrapper for Python, but it was not easy to figure out. Gamera is not too difficult to set up and use (unless you're using Windows).

2) Gamera comes prepackaged with a pretty well-designed interpreter/GUI which does all the regular interpreter tasks but also has an intuitive and easy to use interface for viewing and storing image objects dynamically. The interpreter also boasts Eclipse-like features such as auto-suggest and pulling up brief function documentation on the fly.

3) It turns out that Gamera has pretty good documentation. The Gamera website not only has documentation for its library functions, but also tutorials for carrying out AI tasks. For example, there is a tutorial for building machine learning training sets using the GUI.

Coming out with some concrete results toward our goal was relatively painless after we learned how to use our resources. We wrote a cute little function that, more or less, takes care of the first step of our project: staff removal. It turns out Gamera has a function called filter_wide_runs, which removes horizontal runs longer than a specified length. Go figure. I say "more or less" because this function does not address the common problem of staff removal: parts of symbols that overlap with the staff lines are also removed. For more complicated sheets of music this would be a major problem, but for something trivial like Twinkle, Twinkle, Little Star we think it would be sufficient. Using this function on its own, without any cleanup, is a good first step. As you can see, the result isn't too badly mangled.
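For readers without Gamera installed, the run-filtering idea can be sketched in plain Python. This is only an illustration of the technique, not Gamera's code: the function name remove_wide_runs, the list-of-lists image format (1 = black, 0 = white), and the choice to erase runs strictly longer than max_width are all assumptions made for this sketch.

```python
def remove_wide_runs(image, max_width):
    """Return a copy of a binary image with long horizontal black runs erased.

    image: list of rows, each a list of 0 (white) or 1 (black).
    Any horizontal run of black pixels strictly longer than max_width
    is set to white (assumed threshold rule, for illustration only).
    """
    result = [row[:] for row in image]  # leave the input untouched
    for row in result:
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                # walk to the end of this black run
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start > max_width:
                    row[start:x] = [0] * (x - start)
            else:
                x += 1
    return result


# Toy example: a vertical stem (column 2) crossed by one staff line (row 1).
toy = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
cleaned = remove_wide_runs(toy, max_width=3)
for row in cleaned:
    print(row)
```

In the toy image the staff line disappears, but so does the stem pixel where the two cross, leaving a gap in the stem. That is the overlap problem described above in miniature.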

Our next step would be segmenting and classifying symbols. Remember that training tutorial I mentioned earlier? That came in handy for this. Tim and I started playing around with the GUI's classifying tool to figure out how it can benefit us, and I can certainly say that it is very powerful; it even has many options for the image-splitting algorithm used to get the training segments. This is the training in a nutshell: the user draws a bounding box around the part of the image they want to train on, and the GUI finds the image segments that are part of the selection and chooses the symbol it thinks the user just selected. The user can then correct the GUI if it is wrong, or create a new symbol name for the selection. For more details, refer to the link to the tutorial.
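To give a rough feel for what "finding image segments" means, here is a minimal connected-component sketch in plain Python. It is not one of Gamera's splitting algorithms and does no classification; the name find_symbols, the 0/1 list-of-lists image format, and the use of 8-connectivity are assumptions chosen for illustration.

```python
from collections import deque


def find_symbols(image):
    """Return bounding boxes (top, left, bottom, right) of black regions.

    image: list of rows, each a list of 0 (white) or 1 (black).
    Pixels are grouped by 8-connectivity using a breadth-first flood fill.
    Boxes are returned in the order their first pixel is met, scanning
    row by row from the top-left.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # start a new segment at this unvisited black pixel
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy page with two separate blobs: a 2x2 block and a short vertical bar.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
boxes = find_symbols(page)
print(boxes)
```

Each box could then be cropped out and handed to a classifier. A symbol broken apart by staff removal would show up here as several boxes, which is one reason the cleanup step matters.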

I can safely say we have barely scratched the surface of what Gamera can offer, in terms of both its library functions and its GUI features. More research and testing need to be done before we can come up with some results for the recognition component of our project, and also refinements for the staff removal.

So, until then, faithful blog readers.

This blog will track the progress of Tim Kang and Michael Perry's CSE 190 project. Our project is on applying OCR to sheet music (project proposal here). Our goal is to eventually be able to read handwritten or badly deformed printed sheets, but we will start with well-formed printed sheets in order to get everything working. For this project, we are going to stick with sheet music written in normal western notation; none of that hokey stuff.

For the people with less experience, Music OCR consists of several steps:
-- preprocessing -> staff recognition/removal -> symbol segmentation/classification
after which the sheet music can be reconstructed in a format such as MusicXML or MIDI and read by a scorewriter program such as the commercial Finale or the open-source MuseScore.

30.3.11

Go Gamera!

28.3.11

Set up repository

20.3.11

Hello World
