OCR: what scanner should I get for scanning in bank statements?

j4ck455

Executive Member
Joined
Jan 2, 2006
Messages
7,503
I've been given a heap of ABSA bank statements to scan in and convert to text, so I need to buy a scanner that has really good OCR software. Which scanner do you recommend?

I'd prefer to get a dedicated flatbed scanner as opposed to a multifunction device (I have had problems in the past with a multifunction HP scanner that's lying about in a broken heap).

I would also like to use the scanner with Linux at a later stage; initially I will use it with Windows :(
 

nocilah

Banned
Joined
Sep 2, 2004
Messages
7,624
I am using a Canon LIDE20, which comes with fairly decent OCR support.

One nice thing about it is that it is powered over USB, so you don't need extra wires and such.
 

thisgeek

Expert Member
Joined
Apr 22, 2005
Messages
3,372
Yeah, I'd go with a Canon - I've got a LIDE35. It probably has the same software that nocilah has, and the OCR is quite good.
 

dabouncer

Expert Member
Joined
Jan 2, 2006
Messages
1,405
Epson has a pretty good range of photo-quality scanners with pretty decent OCR software.
 

Ap0c

ParcelNinja CEO Justin Drennan
Company Rep
Joined
Aug 15, 2005
Messages
761
How many is 'a heap'? The reason I ask is 'cos it may be worth your while getting somebody else to do it (somebody who has proper OCR software).

Also, how liable are you for mistakes? OCR isn't exactly perfect....
 

bekdik

Honorary Master
Joined
Dec 5, 2004
Messages
12,860
A better bet would be to purchase more robust OCR software, such as Readiris Pro or OmniPage. The 'free' OCR software is normally pretty lightweight and cannot be tuned to improve accuracy. Misread numbers will be a major risk, because the default way of checking OCR output, running a spell checker over it, doesn't apply to numbers.
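
To illustrate the point about numbers: a spell checker has nothing to compare "1,2S0.O0" against, but a pattern check can at least flag fields that don't look like amounts. A rough Python sketch, assuming amounts are printed with two decimals and optional thousands commas. That format, the lookalike table, and the name check_amount are all assumptions for illustration, not something taken from any OCR package:

```python
import re

# Assumed amount format: optional minus sign, digits with optional
# thousands commas, then exactly two decimals (e.g. "1,250.00" or "-80.00").
AMOUNT_RE = re.compile(r"-?(\d{1,3}(,\d{3})*|\d+)\.\d{2}")

# Letters that OCR commonly confuses with digits (illustrative, not exhaustive).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def check_amount(token):
    """Return (text, status), where status is 'ok', 'repaired' or 'reject'."""
    if AMOUNT_RE.fullmatch(token):
        return token, "ok"
    # Try swapping letter lookalikes for digits and test the pattern again.
    candidate = token.translate(LOOKALIKES)
    if AMOUNT_RE.fullmatch(candidate):
        return candidate, "repaired"
    # Still not a plausible amount: leave it for a human to look at.
    return token, "reject"
```

This won't catch one digit misread as another digit (an 8 read as a 3 still looks like a valid amount), so it only narrows the risk.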
 

albert123

Expert Member
Company Rep
Joined
Aug 30, 2005
Messages
1,938
In my experience, which SCANNER you use will have very little effect on the results; the software you use is more important. But ja, I agree 100% with Ap0c. You are scanning bank statements, which I presume means you can't afford even one mistake? So I think you will have to type it all in, or rather get it in electronic format from the bank. OCR will NEVER get it 100% right, even if you have the best scanner.
 

j4ck455

Executive Member
Joined
Jan 2, 2006
Messages
7,503
Ap0c said: "How many is 'a heap'? The reason I ask is 'cos it may be worth your while getting somebody else to do it (somebody who has proper OCR software). Also, how liable are you for mistakes? OCR isn't exactly perfect...."

Several years' worth, across several different accounts (all ABSA). I had intended to get another scanner at some point, but now that I've been tasked with this job, I am the person who needs proper OCR software.

albert123 said: "In my experience, which SCANNER you use will have very little effect on the results; the software you use is more important. But ja, I agree 100% with Ap0c. You are scanning bank statements, which I presume means you can't afford even one mistake? So I think you will have to type it all in, or rather get it in electronic format from the bank. OCR will NEVER get it 100% right, even if you have the best scanner."

I'm not too worried about inaccuracies in the OCR process. After the data has been converted to text, I'm going to feed it into a program that I've written, and all the data will be validated. Mistakes should be easy to weed out thanks to the account balance on each transaction line: if the after-transaction balance does not match, that line will have to be corrected manually, which is much less work than typing everything in by hand :) :) :)
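
The balance check described above could look roughly like the following. This is not the program mentioned in the post, only a sketch of the idea in Python, and it assumes each OCR'd line has already been split into a signed amount and the balance printed after it, both as plain decimal strings. The function name and the data layout are made up for illustration:

```python
from decimal import Decimal


def find_suspect_lines(opening_balance, transactions):
    """Return the indices of lines whose printed balance disagrees
    with the running total.

    transactions is a list of (amount, printed_balance) string pairs,
    with debits given as negative amounts.
    """
    suspects = []
    expected = Decimal(opening_balance)
    for index, (amount, printed_balance) in enumerate(transactions):
        expected += Decimal(amount)
        printed = Decimal(printed_balance)
        if expected != printed:
            suspects.append(index)
            # Resynchronise on the printed balance so that one misread
            # amount does not flag every line after it. If the balance
            # itself was misread, the next line gets flagged as well.
            expected = printed
    return suspects


lines = [
    ("-200.00", "800.00"),
    ("-50.00", "750.00"),
    ("-80.00", "640.00"),  # amount misread: the balance implies -110.00
    ("100.00", "740.00"),
]
print(find_suspect_lines("1000.00", lines))  # -> [2]
```

Decimal is used instead of float so that cents compare exactly. Only the flagged lines would need to be checked against the paper statement.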

bekdik said: "A better bet would be to purchase more robust OCR software, such as Readiris Pro or OmniPage. The 'free' OCR software is normally pretty lightweight and cannot be tuned to improve accuracy. Misread numbers will be a major risk, because the default way of checking OCR output, running a spell checker over it, doesn't apply to numbers."

AFAIK the HP scanners are boxed with some version of Readiris, while the Canons are boxed with OmniPage, but I take your point: these could be scaled-down versions of the proper software :(
 

albert123

Expert Member
Company Rep
Joined
Aug 30, 2005
Messages
1,938
Quite honestly, boet, in my experience, scanning, OCRing, and then checking for and fixing errors will take you longer than retyping it. But go for it; you will spend a fortune on the software and then retype everything anyway.
 

Ap0c

ParcelNinja CEO Justin Drennan
Company Rep
Joined
Aug 15, 2005
Messages
761
Well, I've had quite a bit of experience with this sort of stuff, so I might as well help out here ;). My wife is in the 'document management industry' and they do OCR/Scanning etc for quite a few of the major organizations in SA.

The scanner only really makes a difference when you are dealing with different types of paper. Some scanners will detect the thickness of the paper and use the optimum amount of light. This is needed so that text on the back of the document does not get scanned as well. The only other things you need to worry about are:

1) Speed of scan
2) Do you need to feed in the documents manually?

The biggest thing then is the actual OCR software. The more advanced stuff automatically:
1) De-skews images if they are scanned in at an angle.
2) Prompts for user input if text can't be read correctly.
3) Uses different OCR engines for different types of text in different places on the document. There is no single OCR engine that is best for all tasks; that's why they use a few for different types, fonts and sizes of text.
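
Points 2 and 3 can be combined in a simple way: run the same field through more than one engine and only ask a person when the engines disagree. A rough Python sketch of that idea follows. It is not how any particular commercial package works, and the reconcile function and its min_agree parameter are hypothetical names used only for illustration:

```python
from collections import Counter


def reconcile(readings, min_agree=2):
    """Pick the most common reading of one field across several engines.

    readings is a list of strings, one per OCR engine.
    Returns (value, needs_review); needs_review is True when fewer than
    min_agree engines produced the chosen value.
    """
    if not readings:
        return None, True
    value, votes = Counter(readings).most_common(1)[0]
    return value, votes < min_agree


# Two of three engines agree, so the value is accepted without review.
print(reconcile(["1,250.00", "1,250.00", "1,258.00"]))
# No two engines agree, so the field is sent to a person.
print(reconcile(["640.00", "840.00"]))
```

Real packages work with per-character confidence scores rather than a plain vote, but the principle of sending only the doubtful fields to a human is the same.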

If you need specific info on how this stuff works, I can put you in contact with somebody.

In my experience, it's best to just pay somebody to scan the docs in for you using proper software, and let them send you a database or XML file with the results. Sitting there all day scanning in documents, with stuff being skew etc, is just not worth it ;(
 

bekdik

Honorary Master
Joined
Dec 5, 2004
Messages
12,860
j4ck455 said: "AFAIK the HP scanners are boxed with some version of Readiris, while the Canons are boxed with OmniPage, but I take your point: these could be scaled-down versions of the proper software :("

The ones that come with the scanner are a light version, which is why I suggested the pro version :).

If budget isn't too much of a problem you could always look at the Kodak range, but you will be into serious money here - upwards of R10G
 