Machine learning / vision fundis

lkpat · Sep 21, 2023

If I wanted to perform ICR on text within predictable regions on paper - e.g. a form, what techniques do I look at for:
1) Form region detection - i.e. the business of identifying the (predictable) areas of the form where handwritten text will be found
2) ICR - for the most part predictable text from a limited set of options depending on the section.... e.g. one section may contain only numbers, another may contain a limited set of text responses, etc.

Thanks in advance

animal531 · Sep 21, 2023

It really depends on what you're capturing. A page from a book might have no setup, but a question/answer sheet will have rectangle areas where for example the answer to question 1 would be, etc.

_kabal_ · Sep 21, 2023

I’ve played a bit with AWS Textract. It was really good straight out the box.

lkpat · Sep 22, 2023

One more requirement is this needs to run offline.

lkpat · Sep 22, 2023

animal531 said:
It really depends on what you're capturing. A page from a book might have no setup, but a question/answer sheet will have rectangle areas where for example the answer to question 1 would be, etc.

As per 1) it's going to be structured forms, so predictable to some degree.

Vis1/0N · Sep 22, 2023

I have a similar query to 2. I have a region of interest and have about 2000 items (supplier names). Using Tesseract gives me starting results but it is not perfect. I am trying to figure out how to match the results (supplier names) to an item in the list of 2000. Sort of spellcheck but for phrases.

I never did spend much time on regexp so that is a mystery to me.

semaphore · Sep 22, 2023

lkpat said:
One more requirement is this needs to run offline.

Tensorflow and build a training set.

lkpat · Sep 22, 2023

semaphore said:
Tensorflow and build a training set.

Yes, I'm angling towards tensorflow. Anyone know the techniques to consider for recognising regions on a form?

semaphore · Sep 22, 2023

lkpat said:
Yes, I'm angling towards tensorflow. Anyone know the techniques to consider for recognising regions on a form?

Techniques? Do you mean what type of neural network to train? CNN would be a good one.

semaphore · Sep 22, 2023

Vis1/0N said:
I have a similar query to 2. I have a region of interest and have about 2000 items (supplier names). Using Tesseract gives me starting results but it is not perfect. I am trying to figure out how to match the results (supplier names) to an item in the list of 2000. Sort of spellcheck but for phrases.

I never did spend much time on regexp so that is a mystery to me.

Levenshtein distance.
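For anyone following along, a minimal sketch of what that looks like in plain Python with no dependencies. The supplier names and the `normalise` and `best_match` helpers are made up for illustration and are not part of Tesseract or any library. Stripping punctuation and case before comparing is an assumption that tends to suit OCR noise, not a requirement of the algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalise(text: str) -> str:
    """Upper-case and keep only letters and digits (hypothetical helper)."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def best_match(ocr_text, names):
    """Return (name, distance) for the list entry closest to the OCR text."""
    target = normalise(ocr_text)
    scored = ((name, levenshtein(target, normalise(name))) for name in names)
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up supplier list and OCR output, purely for illustration.
    suppliers = ["ACME TRADING CC", "ACME TRAINING PTY LTD", "BRIGHT STAR SUPPLIES"]
    print(best_match("ACNE TRAD1NG C0", suppliers))
```

With around 2000 short names a brute-force scan like this should be quick enough. It is probably worth rejecting matches whose distance is large relative to the name length rather than always accepting the closest one.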

animal531 · Sep 22, 2023

lkpat said:
Yes, I'm angling towards tensorflow. Anyone know the techniques to consider for recognising regions on a form?

Why not hard-code the regions?

Vis1/0N · Sep 22, 2023

If you have the original form and a scanned form, subtract one (binary image) from the other? Not perfect, but if you are able to scale and align the two (using line detection and keyword OCR on a region to work out the transform that corrects the skew), you may be able to get a noisy difference, and the busy areas should be the regions with handwritten (or filled) data. Then ICR those detected regions on the original scan?

Easier said than done I guess. I would try something like this https://stackoverflow.com/questions...-in-an-image-while-keeping-text-programmatica
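A rough sketch of the subtraction idea, assuming the two images are already aligned, the same size, and binarised to 0 (blank) and 1 (ink). Images are plain nested lists here to keep it dependency-free; in practice you would likely use an image library. The function names, the cell size, and the ink threshold are all made up for illustration.

```python
def difference(template, scan):
    """Keep pixels that are ink in the scan but blank in the empty template."""
    return [[1 if s and not t else 0 for t, s in zip(t_row, s_row)]
            for t_row, s_row in zip(template, scan)]


def busy_cells(diff, cell=4, min_ink=3):
    """Tile the difference image into cell x cell blocks.

    Returns the (top, left) pixel origin of every block holding at least
    min_ink leftover pixels, i.e. the candidate filled-in areas.
    """
    hits = []
    for top in range(0, len(diff), cell):
        for left in range(0, len(diff[0]), cell):
            ink = sum(sum(row[left:left + cell]) for row in diff[top:top + cell])
            if ink >= min_ink:
                hits.append((top, left))
    return hits


if __name__ == "__main__":
    # 8x8 toy form: the template has a printed rule along the top row.
    template = [[1] * 8] + [[0] * 8 for _ in range(7)]
    scan = [row[:] for row in template]
    for r, c in [(5, 5), (5, 6), (6, 5), (6, 6)]:
        scan[r][c] = 1  # simulated handwriting
    print(busy_cells(difference(template, scan)))
```

The threshold does the work of ignoring speckle noise, so it would need tuning against real scans. Misalignment of even a pixel or two will leave ghost outlines of the printed lines in the difference.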

semaphore · Sep 22, 2023

animal531 said:
Why not hard-code the regions?

Because scanned documents can be skewed. So you’ll need to first apply some vision algo to determine how much to deskew it and then apply the recognition to determine what’s written.

We did this many years ago with application forms; we kept training our model on the horrific handwriting we were receiving.

lkpat · Sep 22, 2023

animal531 said:
Why not hard-code the regions?

I know there are techniques - visual cues to identify the boundaries of unique regions. I just need to find the name again. I need the app to be flexible.

animal531 · Sep 23, 2023

semaphore said:
Because scanned documents can be skewed. So you’ll need to first apply some vision algo to determine how much to deskew it and then apply the recognition to determine what’s written.

We did this many years ago with application forms; we kept training our model on the horrific handwriting we were receiving.

All scanned documents will come out differently with scale, skew, rotation, color etc. Step 1 is to normalize all of that.

lkpat said:
I know there are techniques - visual cues to identify the boundaries of unique regions. I just need to find the name again. I need the app to be flexible.

You can make regions inside the document, e.g. a block with a unique 2d barcode or whatever in the corner. But that's a lot more AI work than having pre-defined rectangle regions that you specify once off.
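To make the "normalize first, then use pre-defined rectangles" idea concrete, here is a small sketch, assuming you can already locate two known marks (say two corner markers) in both the blank template and the scan. How the marks get detected is left out entirely. From the two pairs of points you can recover rotation, uniform scale, and offset, then push each hard-coded region through the same transform. All names and coordinates are made up for illustration, and this does not handle perspective distortion or shear.

```python
import cmath
import math


def fit_similarity(template_a, template_b, scan_a, scan_b):
    """Fit z -> a*z + b (rotation + uniform scale + shift) from two point pairs.

    Points are (x, y) tuples treated as complex numbers x + yj.
    """
    p1, p2 = complex(*template_a), complex(*template_b)
    q1, q2 = complex(*scan_a), complex(*scan_b)
    a = (q2 - q1) / (p2 - p1)  # encodes rotation and scale
    b = q1 - a * p1            # encodes translation
    return a, b


def map_point(transform, point):
    """Map one template point into scan coordinates."""
    a, b = transform
    z = a * complex(*point) + b
    return (z.real, z.imag)


def map_region(transform, region):
    """Map an (x, y, width, height) template rectangle to its four scan corners."""
    x, y, w, h = region
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return [map_point(transform, corner) for corner in corners]


if __name__ == "__main__":
    # Hypothetical marker positions: template first, then where they landed in the scan.
    t = fit_similarity((0, 0), (100, 0), (10, 20), (10, 220))
    print("rotation (deg):", math.degrees(cmath.phase(t[0])))
    print("scale:", abs(t[0]))
    print(map_region(t, (50, 0, 20, 10)))
```

The mapped corners will generally not form an axis-aligned box, so in practice you would either crop the rotated quadrilateral or apply the inverse transform to the whole scan first and then crop with the original rectangles.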

ChakalakaChuckles · Sep 23, 2023

lkpat said:
I know there are techniques - visual cues to identify the boundaries of unique regions. I just need to find the name again. I need the app to be flexible.

Hough lines and contours?

lkpat · Sep 23, 2023

ChakalakaChuckles said:
Hough lines and contours?

Will take a look, thanks

lkpat · Sep 23, 2023

animal531 said:
All scanned documents will come out differently with scale, skew, rotation, color etc. Step 1 is to normalize all of that.

You can make regions inside the document, e.g. a block with a unique 2d barcode or whatever in the corner. But that's a lot more AI work than having pre-defined rectangle regions that you specify once off.

No worries with specifying the regions, I'd rather have that user defined "once off" so the app can be adapted.

Vis1/0N · Oct 11, 2023

semaphore said:
Levenshtein distance.

I have only spent a few minutes looking at that, if I have a single line OCR result like :
T oOLGATE-PALMOLIVE (PTY) LTE
and I have a dictionary of customer names, which include COLGATE-PALMOLIVE PTY LTD, I should run through each customer name against the OCR result and score the Levenshtein distance of each to match the correct entry? I will investigate the Levenshtein distance further, but am asking for a confirmation that this is indeed the track I need to pursue based on my example.

If my dictionary has a n of 2000, I could pick a couple of random 3 letter sequences from the OCR result ( example PAL, or GAT), and use regex to filter down the dictionary. Although I note a poor OCR may exclude the target dictionary phrase.
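A sketch of the filtering idea, with one change: rather than picking a couple of random 3-letter sequences, it compares every 3-letter sequence in the OCR line against each name and keeps the names that share enough of them. That way a few bad characters are less likely to knock out the right entry. This is a swapped-in variation on the regex approach, not a standard recipe; the helper names, the `min_shared` threshold, and every list entry except the Colgate one are made up for illustration.

```python
def trigrams(text):
    """Set of all 3-character sequences, after dropping case, spaces, and punctuation."""
    cleaned = "".join(ch for ch in text.upper() if ch.isalnum())
    return {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}


def shortlist(ocr_text, names, min_shared=4):
    """Return names sharing at least min_shared trigrams with the OCR text, best first."""
    wanted = trigrams(ocr_text)
    scored = [(len(wanted & trigrams(name)), name) for name in names]
    kept = [pair for pair in scored if pair[0] >= min_shared]
    kept.sort(key=lambda pair: -pair[0])  # most shared trigrams first
    return [name for _, name in kept]


if __name__ == "__main__":
    customers = [
        "BRIGHT STAR SUPPLIES",        # hypothetical
        "ACME TOOLS PTY LTD",          # hypothetical
        "COLGATE-PALMOLIVE PTY LTD",
        "PALM OIL TRADERS CC",         # hypothetical
    ]
    print(shortlist("T oOLGATE-PALMOLIVE (PTY) LTE", customers))
```

The shortlist can then be ranked by edit distance to pick the final match. With only 2000 names the prefilter is probably optional, since scoring every entry directly is cheap, but it becomes more useful as the list grows.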

JSRJJ · Oct 11, 2023

See if you can get a license for Alteryx for education purposes. It has additional data science tools that you can use to achieve what you're trying to do.
