
00 Archives and Special Collections: Resources

Optical Character Recognition (OCR)

OCR is a technology that reads the characters and words on a scanned document and converts them into an editable, searchable format.

Software Name:  ABBYY FineReader
How we use it:  When digitizing text materials, we primarily use ABBYY FineReader OCR software to convert scanned files into fully searchable text. We use the standard languages, typically operate with automatic segmentation, and save the resulting OCR files as PDF. We supplement the internal dictionaries with local terminology, such as common names and locations from around Florida. Depending on the quality of the original and the consistency of the font, OCR on a single page may take three minutes or more.
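
As an illustration of the dictionary step, the sketch below cleans a list of local names into a plain-text word list, one term per line. The function names, the sample terms, and the one-term-per-line file layout are assumptions made for this example; check the ABBYY FineReader documentation for the dictionary import format supported by the version in use.

```python
from pathlib import Path


def build_word_list(terms):
    """Return cleaned, de-duplicated terms sorted case-insensitively."""
    seen = set()
    cleaned = []
    for term in terms:
        word = " ".join(term.split())  # trim and collapse stray whitespace
        key = word.casefold()          # compare without regard to case
        if word and key not in seen:
            seen.add(key)
            cleaned.append(word)
    return sorted(cleaned, key=str.casefold)


def write_word_list(terms, path):
    """Write one term per line as UTF-8 text (assumed import layout)."""
    words = build_word_list(terms)
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return words


# Hypothetical sample of Florida place names, including a duplicate and a blank.
local_terms = ["Okeechobee", "Apalachicola ", "okeechobee", "Kissimmee", "", "Withlacoochee"]
```

Calling `write_word_list(local_terms, "local_terms.txt")` would save the cleaned list to a file with that (hypothetical) name. Removing duplicates and blank entries first keeps the supplemental list short and easier to review.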

Although OCR is time-consuming, the resulting output file can be indexed and is fully searchable.
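
Because processing time adds up across a collection, a rough planning estimate can be made from the per-page figure given above. The sketch below is only an illustration: the function name and the 200-page example are hypothetical, and the default of three minutes per page is taken from the figure quoted above, so real throughput will vary with the quality of the originals.

```python
def estimate_ocr_hours(page_count, minutes_per_page=3.0):
    """Estimate total OCR time in hours for a batch of scanned pages."""
    if page_count < 0 or minutes_per_page < 0:
        raise ValueError("page_count and minutes_per_page must be non-negative")
    return page_count * minutes_per_page / 60


# Hypothetical example: a 200-page volume at three minutes per page.
hours = estimate_ocr_hours(200)
print(f"Estimated OCR time: {hours:.1f} hours")
```

At three minutes per page, the 200-page example comes to 600 minutes, or 10 hours of processing.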

Strategies to Preserve and Ensure Access to Information