Today I spoke to my advisor about my final project. It's called POME (in case you don't remember, you can check my previous post).
On my campus there's a rule that students need to see their advisors at least 6 times per semester. I hadn't seen mine for a couple of months, so.. yeah, my final project also got stuck. By the way, I have two advisors, but I haven't had time to see the second one.
Before I saw my advisor I had prepared questions related to my project. We talked a bit, and here's the summary.
First, I misunderstood erosion and dilation. These two processes perform morphological changes on an image. At first, I thought the process expanded every pixel under the structuring element. But I was wrong. I rechecked the sample at HIPR2. Only the pixel under the center of the structuring element is affected.
Here's an example. Assume there's a 5x5 binary image like this (0 is background and 1 is foreground)
00000
00100
00000
00000
00100
and we have a 3x3 structuring element for dilation
111
111
111
At first I thought the image was going to look like this
11111
11111
11111
11111
11111
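Why? Because I thought every pixel covered by the 3x3 element kept changing its surroundings. But an output pixel only becomes foreground when the structuring element, centered on that pixel, overlaps a foreground pixel of the original image. So the right answer is this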
01110
01110
01110
01110
01110
I reported my finding to my advisor and he told me to rewrite the second chapter of my paper.
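The dilation example above can be checked with a few lines of plain Python. This is only a minimal sketch, assuming the origin of the structuring element is its center; the names `dilate`, `to_grid` and `to_lines` are made up for illustration and do not come from HIPR2 or any library.

```python
def to_grid(lines):
    """Turn rows like "00100" into lists of ints."""
    return [[int(ch) for ch in line] for line in lines]


def to_lines(grid):
    """Turn lists of ints back into rows like "00100"."""
    return ["".join(str(value) for value in row) for row in grid]


def dilate(image, selem):
    """Binary dilation; the origin of selem is assumed to be its center."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(selem), len(selem[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    # Write into a fresh output so that newly set pixels never feed back
    # into the computation (only the ORIGINAL image is read).
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            # Stamp the structuring element around this foreground pixel.
            for kr in range(k_rows):
                for kc in range(k_cols):
                    rr, cc = r + kr - r_off, c + kc - c_off
                    if selem[kr][kc] and 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out


IMAGE = to_grid(["00000", "00100", "00000", "00000", "00100"])
SELEM = to_grid(["111", "111", "111"])

dilated = to_lines(dilate(IMAGE, SELEM))
print("\n".join(dilated))
```

Reading only from the original image and writing into a separate output is the important design choice here: writing into the same image while scanning would let new foreground pixels spread further than they should.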
Second, I rethought the data decoding and need to learn scaling. For this project I first have to create a trained network for my classifier, so I need to do training. The training input is a set of correctly written letters. I thought it would be more convenient to decode the extracted training forms to XML data. I need to input the same number of pixels for every sample, but the size of each extracted letter can differ, so I have to scale them appropriately.
My advisor told me to use an available library for scaling. He said scaling can be quite complex. But... I don't know, I thought a simpler scaling would do because it's a binary image. I had read about bilinear interpolation and I think it's quite simple for a 2D image. The concept is that the original pixels are spread out over the new grid, then each new pixel's gray value is computed as a weighted average of the nearest original pixels.
Err.. that's convenient for shades of gray; maybe I'll find another, simpler method for binary images.
I'll try to learn it a little bit and do some experimenting with it. If it's quite complex (say, if it takes me 5 days to develop it, haha) then it's better for me to use an image processing library, even though that's rather overkill.
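One candidate for the "simpler method for binary images" is nearest-neighbor scaling, which copies existing pixels and therefore never produces in-between gray values. The sketch below is an assumption about what could work here, not something the advisor or HIPR2 recommended, and `scale_nearest` is a hypothetical helper name.

```python
def scale_nearest(image, new_rows, new_cols):
    """Resize a 2D list of 0/1 pixels using nearest-neighbor sampling."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(new_rows):
        # Map the output row back to a source row (integer arithmetic).
        src_r = r * rows // new_rows
        out.append([image[src_r][c * cols // new_cols] for c in range(new_cols)])
    return out


small = [[1, 0],
         [0, 1]]
big = scale_nearest(small, 4, 4)
for row in big:
    print("".join(str(value) for value in row))
```

Every extracted letter can be passed through the same call with one fixed target size, so the network always receives the same number of pixels. The price is blocky edges when enlarging and lost thin strokes when shrinking a lot, which is where a library's smarter filters may still be worth it.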
Third, I made up my finding pattern. I made my paper forms. They consist of white boxes for writing text and black boxes for my finding pattern algorithm. I made it up myself and have no reference. Then I thought it would be great to use the QR Code method to find the marks.
Fourth, feature extraction of text.
It's quite a long story.
At first, I thought the Sobel edge detector would be great for extracting the features of the text, but then I realized there's no need because I had already thresholded the image. The Sobel edge detector may be more useful for noisy images with many objects, unlike my paper forms, where it's quite easy to remove the background colors.
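To see what the Sobel operator would do to an already thresholded image, here is a small sketch in plain Python. It assumes the standard 3x3 Sobel kernels, approximates the gradient magnitude as |gx| + |gy|, and simply leaves border pixels at 0; `sobel_magnitude` is an illustrative name, not part of any library.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Approximate gradient magnitude |gx| + |gy| for interior pixels."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]  # borders stay 0 (assumption)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for kr in range(3):
                for kc in range(3):
                    pixel = image[r + kr - 1][c + kc - 1]
                    gx += SOBEL_X[kr][kc] * pixel
                    gy += SOBEL_Y[kr][kc] * pixel
            out[r][c] = abs(gx) + abs(gy)
    return out


# A thresholded image with one vertical boundary between 0s and 1s.
binary = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
edges = sobel_magnitude(binary)
for row in edges:
    print(row)
```

On a binary image the response is non-zero only next to the boundary between 0s and 1s, so it outlines shapes that the threshold has already separated from the background.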
Then I found thinning in a paper by a senior alumnus. His method is quite sophisticated. He detects Japanese characters. A letter's strokes can be 3 or 5 pixels wide. He then applies "thinning" to it, finds its intersections, and counts the strokes. Really cool, but complicated. I don't want to use the intersection and stroke counting, but I like the way he does thinning.
Then I tried the thinning method from HIPR2 (my result still looks bad, maybe my algorithm is wrong) and found that the image looks so strange. The letter N looks like a W after it's thinned. I'm not sure thinning is good, but maybe it's just because of my thinning algorithm.
There's also a paper from Farhad Soleimanian; he feeds the letter's pixels directly into a neural network.
So what should I use? Direct input to the neural network, the thinning method, or the Sobel edge detector? My advisor told me to STOP TINKERING WITH IMAGE PROCESSING FOR FEATURE EXTRACTION and do the simplest thing first. Use the "direct input" and do the training to see how accurate the classification is.
What's next? I think I have to do these:
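The "direct input" idea boils down to flattening each scaled letter into one list of numbers, and "classification correctness" to the fraction of letters the network gets right. A minimal sketch, with hypothetical helper names `to_input_vector` and `accuracy` and made-up sample labels:

```python
def to_input_vector(image):
    """Flatten a 2D binary image row by row into one list of floats."""
    return [float(pixel) for row in image for pixel in row]


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


letter = [[0, 1],
          [1, 0]]
vector = to_input_vector(letter)
print(vector)

# Made-up labels, only to show how the score is computed.
score = accuracy(["A", "B", "X", "D"], ["A", "B", "C", "D"])
print(score)
```

The same `accuracy` measure can be reused later to compare direct input against the thinning and Sobel variants on the same test letters.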
- Implement "Finding Pattern" method in extractor app
- Implement "segmentation" in extractor app
- Implement "export form to XML" in extractor app
- Implement "import form XML" in training app
- Implement "Neural Network Training" in training app
- Implement "export network XML" in training app
- Implement "import network XML" in testing app
- Implement "testing" in testing app
- Try each of "direct input", the Sobel edge detector, and the thinning method, then measure how accurate each one is
- Recheck the thinning method, because my algorithm's output differs from HIPR2's images
Long post, eh? That's my documentation of my paper's progress. It's becoming more and more exciting. Wish me luck then!
Bonus: Here's how I am making it
P.S: Excuse my Engrish though, it's part of my program to learn English. So grammar nazis out there, help yourself to tons of corrections.