Machine learning / vision fundis

lkpat

If I wanted to perform ICR on handwritten text within predictable regions on paper, e.g. a form, which techniques should I look at for:
1) Form region detection - i.e. identifying the (predictable) areas of the form where handwritten text will be found
2) ICR - for the most part predictable text from a limited set of options, depending on the section. E.g. one section may contain only numbers, another may contain a limited set of text responses, etc.

Thanks in advance
 
It really depends on what you're capturing. A page from a book might have no fixed layout, but a question/answer sheet will have rectangular areas where, for example, the answer to question 1 would be, etc.
 
I’ve played a bit with AWS Textract. It was really good straight out of the box.
 
It really depends on what you're capturing. A page from a book might have no fixed layout, but a question/answer sheet will have rectangular areas where, for example, the answer to question 1 would be, etc.
As per 1) it's going to be structured forms, so predictable to some degree.
 
I have a similar query to 2. I have a region of interest and have about 2000 items (supplier names). Using Tesseract gives me starting results but it is not perfect. I am trying to figure out how to match the results (supplier names) to an item in the list of 2000. Sort of spellcheck but for phrases.

I never did spend much time on regexp so that is a mystery to me.
 
Yes, I'm angling towards TensorFlow. Does anyone know which techniques to consider for recognising regions on a form?
Techniques? Do you mean what type of neural network to train? CNN would be a good one.
 
I have a similar query to 2. I have a region of interest and have about 2000 items (supplier names). Using Tesseract gives me starting results but it is not perfect. I am trying to figure out how to match the results (supplier names) to an item in the list of 2000. Sort of spellcheck but for phrases.

I never did spend much time on regexp so that is a mystery to me.
Levenshtein distance.
 
If you have the original blank form and a scanned, filled-in form, why not subtract one binary image from the other? It is not perfect, but if you are able to scale and align the two (using line detection and keyword OCR on a known region to work out the transform that corrects the skew), you should get a noisy difference image. The busy areas of that difference can be treated as the regions containing handwritten (or filled-in) data, and you can then run ICR on those regions of the original scan.

Easier said than done I guess. I would try something like this https://stackoverflow.com/questions...-in-an-image-while-keeping-text-programmatica
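To make the subtraction idea a bit more concrete, here is a rough sketch in plain Python. It assumes the two images are already aligned, the same size, and binarised into lists of 0/1 pixel rows; the function names, the tile size and the ink threshold are all made up for illustration. In practice you would more likely hold the images in NumPy or OpenCV arrays.

```python
def difference(blank, filled):
    """Pixels that are ink (1) in the filled scan but not in the blank form."""
    return [[1 if f and not b else 0 for b, f in zip(blank_row, filled_row)]
            for blank_row, filled_row in zip(blank, filled)]


def busy_cells(diff, cell=4, min_ink=3):
    """Split the difference image into cell x cell tiles and return the
    (tile_row, tile_col) of every tile holding at least min_ink new pixels."""
    hits = []
    for top in range(0, len(diff), cell):
        for left in range(0, len(diff[0]), cell):
            ink = sum(sum(row[left:left + cell]) for row in diff[top:top + cell])
            if ink >= min_ink:
                hits.append((top // cell, left // cell))
    return hits


# Toy 8x8 example: the blank form has one printed rule along the top row.
blank = [[0] * 8 for _ in range(8)]
blank[0] = [1] * 8
filled = [row[:] for row in blank]
for r, c in [(5, 5), (5, 6), (6, 5), (6, 6)]:
    filled[r][c] = 1  # simulated handwriting

print(busy_cells(difference(blank, filled)))  # [(1, 1)]
```

The printed rule cancels out and only the tile containing the simulated handwriting is reported. On real scans the threshold would need tuning, because imperfect alignment leaves speckle along printed lines.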
 
Why not hard-code the regions?
Because scanned documents can be skewed. So you’ll first need to apply some vision algorithm to determine how much to deskew it, and then apply the recognition to determine what’s written.

We did this many years ago with application forms; we kept training our model on the horrific handwriting we were receiving.
 
Because scanned documents can be skewed. So you’ll first need to apply some vision algorithm to determine how much to deskew it, and then apply the recognition to determine what’s written.

We did this many years ago with application forms; we kept training our model on the horrific handwriting we were receiving.

All scanned documents will come out differently with scale, skew, rotation, color etc. Step 1 is to normalize all of that.
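One simple way to picture the normalisation step: if you can locate two known anchor points on the scan (form corners or registration marks, say), rotation, uniform scale and shift can be recovered from those two point pairs, and hard-coded template regions can then be mapped onto the scan. The sketch below treats points as complex numbers. The function names and the anchor coordinates are hypothetical, and it does not handle perspective distortion or colour differences.

```python
import cmath
import math


def fit_similarity(template_pts, scan_pts):
    """Solve scan = a * template + b from two point pairs.
    a (complex) encodes rotation and scale, b (complex) encodes the shift."""
    t0, t1 = (complex(*p) for p in template_pts)
    s0, s1 = (complex(*p) for p in scan_pts)
    a = (s1 - s0) / (t1 - t0)
    b = s0 - a * t0
    return a, b


def to_scan(point, a, b):
    """Map a template coordinate onto the scanned image."""
    z = a * complex(*point) + b
    return z.real, z.imag


def to_template(point, a, b):
    """Map a scan coordinate back into template space (the normalising direction)."""
    z = (complex(*point) - b) / a
    return z.real, z.imag


def describe(a):
    """Rotation in degrees and scale factor encoded in a."""
    return math.degrees(cmath.phase(a)), abs(a)


# Hypothetical anchors: the scan is rotated 90 degrees, doubled in size and shifted.
a, b = fit_similarity([(0, 0), (100, 0)], [(10, 20), (10, 220)])
print(describe(a))             # (90.0, 2.0)
print(to_scan((50, 0), a, b))  # (10.0, 120.0)
```

Two points are the minimum. With more anchors you would fit the transform by least squares so that a single badly detected point does not throw everything off.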

I know there are techniques - visual cues to identify the boundaries of unique regions. I just need to find the name again. I need the app to be flexible.

You can mark regions inside the document itself, e.g. a block with a unique 2D barcode or whatever in the corner. But that's a lot more AI work than having pre-defined rectangular regions that you specify once off.
 
All scanned documents will come out differently with scale, skew, rotation, color etc. Step 1 is to normalize all of that.



You can mark regions inside the document itself, e.g. a block with a unique 2D barcode or whatever in the corner. But that's a lot more AI work than having pre-defined rectangular regions that you specify once off.
No worries with specifying the regions, I'd rather have that user defined "once off" so the app can be adapted.
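A user-defined, once-off region template could be as simple as a list of named rectangles, each with an optional set of permitted answers. The sketch below is only an illustration: the Region fields, the example field and its options are invented, and the image is assumed to be a list of pixel rows that has already been deskewed. It uses difflib from the standard library to snap a raw recognition result to the closest permitted answer.

```python
from dataclasses import dataclass
import difflib


@dataclass(frozen=True)
class Region:
    name: str
    left: int
    top: int
    width: int
    height: int
    allowed: tuple = ()  # permitted answers; empty means free text


def crop(image, region):
    """Cut the region out of an image stored as a list of pixel rows."""
    rows = image[region.top:region.top + region.height]
    return [row[region.left:region.left + region.width] for row in rows]


def snap(raw_text, region, cutoff=0.6):
    """Return the closest permitted answer, or None if nothing is close enough.
    Free-text regions are passed through unchanged."""
    if not region.allowed:
        return raw_text
    matches = difflib.get_close_matches(
        raw_text.strip().upper(), region.allowed, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical field with a limited set of responses.
status = Region("marital_status", 40, 120, 200, 30,
                allowed=("SINGLE", "MARRIED", "DIVORCED"))
print(snap("marr1ed", status))  # MARRIED
```

Keeping the template as plain data means it could just as easily be loaded from a JSON or YAML file that the user edits, which is what makes the app adaptable to new forms.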
 
Levenshtein distance.
I have only spent a few minutes looking at that. If I have a single-line OCR result like:
T oOLGATE-PALMOLIVE (PTY) LTE
and a dictionary of customer names that includes COLGATE-PALMOLIVE PTY LTD, should I compare the OCR result against each customer name, score the Levenshtein distance of each, and take the lowest-scoring entry as the match? I will investigate the Levenshtein distance further, but I am asking for confirmation that this is indeed the track I need to pursue, based on my example.
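For what it's worth, that approach can be sketched in a few lines of plain Python. The normalise step (upper-casing and dropping punctuation and spaces) is an extra assumption on my part, and every supplier name other than the Colgate one is made up for the example.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop a character of a
                               current[j - 1] + 1,       # add a character of b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalise(text):
    """Upper-case and keep only letters and digits."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def best_match(ocr_text, names):
    """Return (name, distance) for the dictionary entry closest to the OCR text."""
    target = normalise(ocr_text)
    distance, name = min((levenshtein(target, normalise(n)), n) for n in names)
    return name, distance


suppliers = [
    "COLGATE-PALMOLIVE PTY LTD",
    "UNILEVER SOUTH AFRICA PTY LTD",  # hypothetical entry
    "TIGER BRANDS LTD",               # hypothetical entry
]
print(best_match("T oOLGATE-PALMOLIVE (PTY) LTE", suppliers))
# ('COLGATE-PALMOLIVE PTY LTD', 3)
```

With 2000 names a brute-force scan like this is still quick. It is also worth rejecting matches whose distance is large relative to the length of the name, so that garbage OCR does not get forced onto some supplier.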

If my dictionary has n = 2000 entries, I could pick a couple of random three-letter sequences from the OCR result (for example PAL or GAT) and use regex to filter down the dictionary first. I note, though, that a poor OCR result may exclude the target dictionary phrase.
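A variation on that filtering idea, sketched below, swaps the regex for a trigram index and uses every three-letter sequence in the OCR result instead of a random couple, letting them vote. That way one misread character only costs a few votes and does not drop the right entry outright. The function names, the limit parameter and the two extra supplier names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def trigrams(text):
    """Set of 3-character sequences, ignoring case and non-alphanumerics."""
    cleaned = "".join(ch for ch in text.upper() if ch.isalnum())
    return {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}


def build_index(names):
    """Map each trigram to the set of dictionary entries containing it."""
    index = defaultdict(set)
    for name in names:
        for gram in trigrams(name):
            index[gram].add(name)
    return index


def candidates(ocr_text, index, limit=20):
    """Dictionary entries ranked by how many trigrams they share with the OCR text."""
    votes = Counter()
    for gram in trigrams(ocr_text):
        for name in index.get(gram, ()):
            votes[name] += 1
    return [name for name, _ in votes.most_common(limit)]


names = [
    "COLGATE-PALMOLIVE PTY LTD",
    "UNILEVER SOUTH AFRICA PTY LTD",  # hypothetical entry
    "TIGER BRANDS LTD",               # hypothetical entry
]
index = build_index(names)
print(candidates("T oOLGATE-PALMOLIVE (PTY) LTE", index))
```

The index is built once for the whole dictionary, and the shortlist it returns can then be scored more carefully, for example with an edit distance.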
 
See if you can get a license for Alteryx for education purposes. It has additional data science tools that you can use to achieve what you are trying to do.
 