JPG to Text / Image to Text

Converting images to text is known as OCR (Optical Character Recognition). It is most useful when you scan paper documents into your computer and then want to convert them to an editable format such as MS Word. OCR is often considered a 'solved problem' these days, at least for cleanly printed text, because current OCR engines are quite accurate and support a wide range of languages, alphabets and characters.

There are some freeware and open source applications which do a very good job of this. Sometimes you may have only an image or screenshot of a text document, and you need to manipulate the text. In such cases, an OCR application is indispensable.

The best Open Source (Free) software for Optical Character Recognition to date is Tesseract.
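As a rough illustration of the screenshot-to-text case described above, here is a minimal Python sketch that calls the Tesseract command-line program. It assumes the tesseract executable is installed and on your PATH, and that the English language data ("eng") is available. The function names build_tesseract_command and image_to_text are made up for this example and are not part of Tesseract.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def image_to_text(image_path, lang="eng"):
    """Run Tesseract on one image file and return the recognized text.

    Assumes the tesseract executable is installed; raises RuntimeError if not.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")

    with tempfile.TemporaryDirectory() as tmpdir:
        # Tesseract appends ".txt" to the output base name it is given.
        output_base = Path(tmpdir) / "ocr_result"
        command = build_tesseract_command(image_path, output_base, lang)
        subprocess.run(command, check=True, capture_output=True)
        return (Path(tmpdir) / "ocr_result.txt").read_text(encoding="utf-8")
```

Calling image_to_text("scan.jpg") on a hypothetical file scan.jpg would return the recognized text as a string, which can then be edited or saved like any other text. Accuracy depends heavily on image quality, so treat the result as a draft to proofread.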

About Tesseract

Tesseract is a free optical character recognition engine. It was originally developed as proprietary software at Hewlett-Packard between 1985 and 1995. For ten years it languished without any development, until Hewlett-Packard and UNLV released it as open source in 2005. Tesseract is now being actively developed by Google and is released under the Apache License, Version 2.0.

Listing of other OCR applications:

| Name | License | Operating systems | Notes |
|---|---|---|---|
| ExperVision TypeReader & OpenRTK | Commercial | Windows, Mac OS X, Unix, Linux, OS/2 | |
| ABBYY FineReader OCR | Commercial | Windows | For working with localized interfaces, corresponding language support is required. |
| OmniPage | Commercial (Nuance EULA) | Windows, Mac OS | Product of Nuance Communications |
| Readiris | Commercial | Windows, Mac OS | I.R.I.S. Group of Belgium. Asian and Middle Eastern editions. |
| SmartZone (formerly known as Zonal OCR) | Commercial | Windows | SmartZone is the process by which Optical Character Recognition (OCR) applications "read" specifically zoned text from a scanned image. |
| Computhink's ViewWise & AnyDoc | Commercial | Windows | Document management system |
| CuneiForm | BSD variant | Windows, Linux, BSD, Mac OS X | Enterprise-class system, multi-language, can save text formatting and recognizes complicated tables of any structure |
| CVISION Technologies, Inc. PdfCompressor and Maestro Recognition Server | Commercial | Windows | Fast, accurate, high-volume OCR |
| GOCR | GPL | Many (open source) | Early development |
| Microsoft Office Document Imaging | Commercial | Windows, Mac OS X | Microsoft Office has some OCR capabilities built in. |
| Microsoft Office OneNote 2007 | Commercial | Windows | $99.00 from Microsoft. |
| NovoDynamics VERUS | Commercial? | ? | Specializes in languages of the Middle East |
| Ocrad | GPL | Unix-like, OS/2 | Open source |
| Brainware | Commercial | Windows | Data extraction and processing of data from documents into any backend system; sample document types include invoices, remittance statements, bills of lading and POs |
| HOCR | GPL | Linux | Hebrew OCR |
| OCRopus | Apache | Linux | Pluggable framework which can use Tesseract. State-of-the-art OCR for Linux. |
| OOCR | Open Source (GPL) | Windows | Open OCR |
| ReadSoft | Commercial | Windows | Scan, capture and classify business documents such as forms, invoices and POs. |
| Alt-N Technologies' RelayFax Network Fax Manager | Commercial | Windows | Multi-language OCR plug-in is used to convert faxed pages into editable document formats (doc, pdf, etc.) in many different languages. |
| Scantron Cognition | Commercial | Windows | For working with localized interfaces, corresponding language support is required. |
| SimpleOCR | Freeware and commercial versions | Windows | Free! |
| SmartScore | Commercial | Windows, Mac OS | For musical scores |
| Tesseract | Apache | Windows, Mac OS X, Linux, OS/2 | HP initiative; now under development by Google |