Convert a typewritten document to text

ReedRichards
Seasoned Pro
Posts: 4,927
Thanks: 145
Fixes: 25
Registered: ‎14-07-2009

Convert a typewritten document to text

I am looking for some software that can convert an old document written on a typewriter to a text file you can edit.  Typed documents have some unique features:

  • There is only one font
  • If a character is slightly wrong or misaligned, then every instance of that character is wrong or misaligned in exactly the same way.

So you might think this would be an easy task for OCR (Optical Character Recognition) software, and that you would get even better results if the software let you teach it how to recognise individual letters.  But no.  You end up with something ridiculously over-complicated, with copious errors, in a whole range of different fonts and peppered with other features that could not be achieved using a typewriter.  And I have not found any OCR software that is prepared to learn.
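To make "teaching it individual letters" concrete, here is a minimal sketch of single-font template matching. It is not how any particular OCR product works; the tiny 3x3 bitmaps and the names `TEMPLATES`, `hamming`, `classify` and `teach` are all made up for illustration.

```python
# Hypothetical 3x3 glyph bitmaps ("1" = ink); a real typewriter glyph
# would be a much larger bitmap cut out of the scan.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph, templates):
    """Return the label of the stored template closest to the glyph."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


def teach(templates, label, glyph):
    """Replace a template with a user-confirmed glyph from this typewriter."""
    templates[label] = tuple(glyph)


# A "T" whose stem is always cut short on this (imaginary) machine.
damaged_t = ("111", "010", "000")
print(classify(damaged_t, TEMPLATES))  # closest stored template wins
teach(TEMPLATES, "T", damaged_t)       # later copies now match exactly
```

Because a typewriter repeats the same flaw every time, storing the flawed glyph once as the template is enough in this toy model; with several fonts you would need more than one template per letter.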

Any suggestions?

  

15 REPLIES
Alex
Community Veteran
Posts: 5,500
Thanks: 921
Fixes: 13
Registered: ‎05-04-2007

Re: Convert a typewritten document to text

I think all you can do is try different OCR software.

I have not used any for years; someone may be able to suggest a good, up-to-date application.

It depends on the size of the document, but in my experience OCR is never 100% accurate, so you will have to go through the output correcting errors manually.

I suppose using good software means there are fewer of them to deal with.

Baldrick1
Moderator
Posts: 11,631
Thanks: 5,167
Fixes: 416
Registered: ‎30-06-2016

Re: Convert a typewritten document to text

You should be able to correct font variances by simply opening the document in a word processor, highlighting the complete document and selecting your preferred font.

With respect to OCR accuracy, have you tried enlarging the typed text by scanning a portrait page and printing it again in landscape?  It will need two pages to hold the complete portrait page, but it might give the OCR a bit more detail to go at.

Failing that, you may just have to suffer the pain of using a word processor, or seek professional help.  I guess whether that's practicable depends on the size of the document.

Moderator and Customer

shutter
Community Veteran
Posts: 22,206
Thanks: 3,769
Fixes: 65
Registered: ‎06-11-2007

Re: Convert a typewritten document to text

How big (number of pages?) is the document?  Is it "plain typing", as in a manuscript or novel, or is it tabulated etc.?

You may find it cheaper to employ a self-employed typist than to squander oodles of money on various OCR packages that you may never need to use again.

 

daveplus
Pro
Posts: 630
Thanks: 132
Fixes: 10
Registered: ‎25-08-2010

Re: Convert a typewritten document to text


And I have not found any OCR software that is prepared to learn.

Any suggestions?

  


Have you tried Adobe Acrobat? I'm not sure whether it can learn, but OmniPage can be taught. Both have time-limited trial versions.

ReedRichards
Seasoned Pro
Posts: 4,927
Thanks: 145
Fixes: 25
Registered: ‎14-07-2009

Re: Convert a typewritten document to text

So far I have tried Iris OCR and, since my original post, ABBYY FineReader.  The ABBYY software produced a much better result.  You can tell it the input is typewritten and it then produces output that is all in one font and with a basic and simple layout.  But it has no learning capability that I have found so far.  I'll look into the other suggestions.

daveplus
Pro
Posts: 630
Thanks: 132
Fixes: 10
Registered: ‎25-08-2010

Re: Convert a typewritten document to text

OmniPage Pro is definitely the one to use. See http://supportcontent.nuance.com/omnipage/18/doc/OP18Guide.pdf

ReedRichards
Seasoned Pro
Posts: 4,927
Thanks: 145
Fixes: 25
Registered: ‎14-07-2009

Re: Convert a typewritten document to text

I had a go with OmniPage but was disappointed in the result.  Some of the resulting text was laid out in document format but random sections were placed in text boxes.  It's another example of software trying to be too clever and thereby giving a stupid result.  Typed and printed documents typically have a very simple layout and the last thing you want is some OCR software that renders this more complicated than the original.  If there is a "Don't put text in boxes" setting or a "Don't create random lines" option then I failed to find them.

daveplus
Pro
Posts: 630
Thanks: 132
Fixes: 10
Registered: ‎25-08-2010

Re: Convert a typewritten document to text

If you go to Options, you can set the layout to be the simplest possible.

ReedRichards
Seasoned Pro
Posts: 4,927
Thanks: 145
Fixes: 25
Registered: ‎14-07-2009

Re: Convert a typewritten document to text

I was trying to deal with a document that had a double-line margin around, and quite close to, the text.  I wanted to ignore this but the Nuance software got fixated with it, seeing it either as characters or rendering it as bits of vertical line, which appeared even in .rtf format output.  I would have got a better result if I had printed out the pages, cut off these margins with a pair of scissors and re-scanned them.  So for me, OmniPage is in the "stupid result by trying to be too clever" category.

daveplus
Pro
Posts: 630
Thanks: 132
Fixes: 10
Registered: ‎25-08-2010

Re: Convert a typewritten document to text

Acrobat has a cropping feature and a redaction feature, so you could scan your document using Acrobat, crop out the lines or redact them, then save the result as a PDF. OmniPage can open a PDF and do the OCR.
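If you would rather script the cropping than do it page by page, the idea is simple enough to sketch. This assumes the scan is already loaded as a list of rows of grey values (0 = black, 255 = white) and that the ruled margin sits within a fixed number of pixels of each edge; `crop_border` is a made-up helper, not part of Acrobat or OmniPage.

```python
def crop_border(image, margin):
    """Drop `margin` pixels from every edge of a row-major greyscale image."""
    height = len(image)
    width = len(image[0]) if height else 0
    if margin < 0 or 2 * margin >= min(height, width):
        raise ValueError("margin must be non-negative and leave some image")
    return [row[margin:width - margin] for row in image[margin:height - margin]]


# Toy 5x5 "page": a black ruled border with one black text pixel inside.
page = [
    [0,   0,   0,   0,   0],
    [0, 255, 255, 255,   0],
    [0, 255,   0, 255,   0],
    [0, 255, 255, 255,   0],
    [0,   0,   0,   0,   0],
]
for row in crop_border(page, 1):
    print(row)
```

With a real scan you would more likely use an imaging library; Pillow's `Image.crop((left, upper, right, lower))` does the same job on an actual image file.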

Alex
Community Veteran
Posts: 5,500
Thanks: 921
Fixes: 13
Registered: ‎05-04-2007

Re: Convert a typewritten document to text

Yes I see what @daveplus means.

A basic cropping feature will do what you want, many programs have them.

It is more hassle than if you didn't need to, but less than your manual way.

VileReynard
Hero
Posts: 12,616
Thanks: 582
Fixes: 20
Registered: ‎01-09-2007

Re: Convert a typewritten document to text

Why do you need this document as text, anyway?

A print of an image is just as good as a typewritten document.

"In The Beginning Was The Word, And The Word Was Aardvark."

shutter
Community Veteran
Posts: 22,206
Thanks: 3,769
Fixes: 65
Registered: ‎06-11-2007

Re: Convert a typewritten document to text

@VileReynard  It seems you have jumped in with both feet... The first line of the original post gives a clue:

 

I am looking for some software that can convert an old document written on a typewriter to a text file you can edit.

VileReynard
Hero
Posts: 12,616
Thanks: 582
Fixes: 20
Registered: ‎01-09-2007

Re: Convert a typewritten document to text

Fair enough...

How about pre-processing the image scans?

E.g. increase the contrast so you have just black and white, with no greys, then remove speckles (dirt etc.) before asking the OCR to do its thing?
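Roughly what I mean, as a sketch only: the scan is assumed to be a list of rows of grey values (0 = black, 255 = white), the threshold of 128 is an arbitrary guess you would tune per scan, and `binarise` and `despeckle` are names I have invented here.

```python
def binarise(image, threshold=128):
    """Map every grey level to pure black (0) or pure white (255)."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def despeckle(image):
    """Whiten black pixels with no black pixel among their 8 neighbours."""
    if not image:
        return []
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 0:
                continue
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if 0 not in neighbours:
                cleaned[y][x] = 255  # isolated dot: treat as dirt
    return cleaned


# Toy scan: a two-pixel stroke of "ink" plus one isolated grey speck.
scan = [
    [250, 240, 255, 250],
    [245,  30,  60, 250],
    [250, 255, 250, 250],
    [250, 250, 250,  90],
]
for row in despeckle(binarise(scan)):
    print(row)
```

Only single isolated pixels are removed, so full stops and the dots on an "i" survive as long as they are more than one pixel across at your scan resolution.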

"In The Beginning Was The Word, And The Word Was Aardvark."