A brief history of OCR

Optical Character Recognition (OCR) used to be seen as the unsung hero of the business world.

As you probably know, OCR is the process by which images of text are converted into machine-encoded text. Digitising text means it can be easily presented, edited, stored and searched, optimising key administrative tasks such as invoicing and sales processing.

But how was this technology developed? And in the era of digitisation, can OCR remain relevant?

The genius of Emanuel Goldberg

OCR traces its roots back to telegraphy. On the eve of the First World War, physicist Emanuel Goldberg invented a machine that could read characters and convert them into telegraph code. In the 1920s, he went a step further and created the first electronic document retrieval system.

At this time, businesses were microfilming financial records – great in principle, but quickly retrieving specific records from spools of film was nigh on impossible. To overcome this, Goldberg used a photoelectric cell to do pattern recognition with the help of a movie projector. By repurposing existing technologies, he took the first steps towards the automation of record keeping. The US patent for his "Statistical Machine" was later acquired by IBM.

Since then, OCR technology has proliferated, with businesses all over the world relying on it to help reduce overheads when extracting data from paper documents.

The problem with OCR

Early versions of OCR had to be trained with images of each character and were limited to recognising one font at a time. In the 1970s, inventor Ray Kurzweil commercialised “omni-font OCR”, which could process text printed in almost any font. In the early 2000s, OCR became available online as a cloud-based service, accessible via desktop and mobile applications.

Today, there’s a host of OCR service providers offering technology (often accessible via APIs) capable of recognising most characters and fonts to a high level of accuracy. Even though the technology continues to improve, there is always scope for errors. And that means costly human intervention to validate information and ensure it’s ready to be used by the wider organisation – not to mention the economic and environmental cost of persisting with paper.
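
To illustrate where that human intervention comes in, here is a minimal sketch in Python. It assumes an OCR service that returns a confidence score between 0 and 1 for each extracted field; the `OcrField` class, the `split_for_review` function, the sample values and the 0.95 threshold are all hypothetical and do not describe any particular provider's API.

```python
from dataclasses import dataclass


@dataclass
class OcrField:
    """One extracted value, as a hypothetical OCR service might report it."""
    name: str
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def split_for_review(fields, threshold=0.95):
    """Separate fields that can be accepted automatically from those a person should check."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


# Illustrative values only, not real OCR output.
fields = [
    OcrField("invoice_number", "INV-1042", 0.99),
    OcrField("total", "1,284.50", 0.81),
]
accepted, needs_review = split_for_review(fields)
print([f.name for f in needs_review])  # prints ['total']
```

Every field that falls below the threshold ends up in a manual queue, which is where the validation cost described above arises. A lower threshold means less manual work but more errors reaching downstream systems.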

What’s changed?

For decades, OCR was the only way to turn printouts into data that could be processed by computers. It remains the tool of choice (outside of EDI and invoice portals) for tasks such as converting paper invoices into extractable data that can be integrated into finance systems.

But e-document submission now offers businesses a far superior approach to areas such as invoicing and sales processing, cutting costs and freeing up staff to focus on higher-value tasks. Going forward, we expect advances in AI and machine learning to further accelerate the demise of paper-based data extraction.

The future of OCR

OCR will continue to be a valuable tool for filling the gaps where an application-generated electronic document is not available. Ultimately, the truly “paperless business” doesn’t (yet) exist and data extraction is still a useful tool that can augment e-document processing. That’s why CloudTrade has teamed up with partners to help clients who are still grappling with paper. These partnerships bridge the gap by bringing paper and electronic invoicing together on the same processing platform, enabling customers to target savings of up to 80%.

We accept that a complete mail-room service may well require OCR until the world becomes completely digitised. Recognising demand from businesses to go digital, CloudTrade was formed in 2009 to help accelerate a paradigm shift in the data acquisition process. We recognised that many solutions presented barriers to entry, often requiring IT projects that, once completed, still required human intervention in the process.

CloudTrade was formed to remove all barriers for the sending party. We make it easy for trading parties to transact, and we provide the receiving party with data that’s 100% accurate and meets their specific business rules. By doing this we negate the need for human intervention, saving businesses time and money.