OCR.NET 2.0 SDK Tesseract Engine

 

**NEW** 1D barcode reader engine included

The 2.0 engine now includes a fast barcode detector engine capable of interpreting 1D Barcodes on a page image.

The codes that the engine is able to interpret are:

  • Code 2 of 5 
  • Code Interleaved 2 of 5
  • Code 93
  • Code 39
  • Codabar
  • UPC A
  • EAN 13
  • Code 128
  • UPC E
  • EAN 8
  • RSS 14

**UPDATED** Windows Phone 8.1 demo app with full source code

The OCR SDK 2.0 will soon be available for download and contains the full source code of a demo app for Windows Phone 8.1. The app demonstrates techniques and components for speed and memory management in picture handling, such as a custom camera control with auto-detection of the optimum capture resolution, an image viewer with pinch and zoom, and image re-coding for low memory usage using the native WP8 DecodePixelHeight property, among others.

The SDK also includes a Windows Forms application with full C# source code that demonstrates the use of the OCR engine on Windows with the .NET Framework 4.5.

 

Introduction

DevScope OCR SDK is an Optical Character Recognition toolkit based on Google's open-source Tesseract OCR v3 engine. It lets you develop Microsoft .NET applications that accurately recognize characters in a scanned document image, without the need to track and pay for each desktop, server or mobile deployment.

 

It's 100% royalty free.

Available as a free trial download or a full-featured license. It is compatible with the Microsoft .NET Framework and is the first to support Windows Desktop and Server, Windows Phone 8.1 and Windows Store apps.

The Tesseract OCR engine was originally developed by Hewlett-Packard UK. It was one of the top three engines in the 1995 UNLV Accuracy test and is probably one of the most accurate open source OCR engines available. Since then it has been extensively revised with sponsorship from Google.

Quick Price List

  • OCR.NET SDK (Windows only): 1 developer license, no support. 99€
  • PROMOTION: OCR.NET SDK (Windows + WP8.1 + WINSTORE): 3 developer licenses + 1 year support. 199€ (regular price 399€)
  • OCR.NET SDK (Windows + WP8.1 + WINSTORE): 5 developer licenses, no support. 999€

Click here to view more pricing options

 

Licensing
  • Per-developer licenses: this license type entitles the specified number of developer/build machines at a single physical address to write software with access to the DevScope OCR SDK.

Main Features
  • New 1D Barcode Reader Engine
  • New image-processing actions for pre-processing an image before running the OCR engine
  • New image format load in Windows Store and WP8 versions - PNG, TIFF, JPEG and BMP
  • New dictionaries for MICR and OCR-A/B optimized for reading numbers
  • New dlls for Windows x86,x64
  • New dlls for WP8 ARM, x86
  • New dlls for Windows Store apps ARM,x86
  • New class reference and usage documentation
  • New ability to OCR an image directly from a WriteableBitmap raw buffer.
  • Full Unicode Support.
  • Multi-thread, Multi-Instance Support. Optimal for batch processing. 
  • Works as async task on mobile devices for keeping the UI responsive
  • Full-Featured C# demo application included.
  • Character recognition confidence retrieval.
  • Outputs a Document Object Model for easy navigation and extraction of the resulting OCR entities - block, paragraph, line, word and character location.
  • Outputs results in Searchable PDF, Text, HOCR and UNLV formats.
  • Outputs the optimized thresholded image used for OCR.
  • Included support for document Auto-Deskew and Auto-Orientation detection.
  • Stand-Alone document Auto-Orientation detection feature.
  • Included support for Local Adaptive Binarization for processing camera-captured documents.
  • Support for nearly 60 languages, including Arabic, Brazilian Portuguese, Bulgarian, Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish, Dutch, English, Finnish, French, German (standard and Fraktur script), Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak (standard and Fraktur script), Slovenian, Spanish, Swedish, Tagalog, Thai, Turkish, Ukrainian and Vietnamese.
  • Can recognize only digits, only alpha or only "white listed" characters.
  • Can skip "black listed" characters.
  • Multiple OCR engine context support. Allows the engine to process a document image as a single word, single character, text block, line, uniform block of text, vertical text, etc.
  • Highly Optimized for fast area processing.  
  • Available in 32 bit, 64 bit and ARM versions.
  • Includes an enhanced image viewer component with mouse-wheel zoom and region highlighting.

 

So why wouldn't I just use Tesseract? What are DevScope OCR benefits?
  • Stability. The original Tesseract is based around a command-line process, which means that it does not matter if it occasionally terminates, crashes or leaks memory. If you are running a modern in-process application you need safer behavior. DevScope OCR resolves these issues and presents you with a 100% stable platform.
  • Performance. DevScope OCR is highly optimized for fast code and for Windows-based operating systems. It also adds multithread support so you can spread load over multiple CPUs or cores, and you can use it safely from multithreaded APIs like ASP.NET.
  • Compatibility. Tesseract is a 32-bit process and cannot be used in 64-bit applications. This is a significant issue when so many operating systems are now based around a 64-bit address space. DevScope OCR eliminates this restriction and allows you to run in either x86 or x64 mode by just referencing the appropriate assembly.
  • Mobile. DevScope OCR is the first to run on ARM-based Windows Phone 8 and WinRT devices.
  • Simplicity. We provide a single library DLL, and all that is needed is to reference it in your project. It presents a clean, straightforward API and comes with a full-featured example so that you can start using it right away.

 

FAQ

Where can I use or evaluate the DevScope OCR SDK?
You can get the library and a 30 day trial license by clicking the get free trial version button. You can also purchase licenses here. The library will need to be unlocked with a supplied key, see "How can I unlock the DevScope OCR SDK?".

 

How can I unlock the DevScope OCR SDK?
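Just call the SetLicense() method, passing your license info and key as parameters. In the example below, companyNameEmail and suppliedKey are placeholders for the company name/email and the key supplied with your license:

TesseractOcrEngine.SetLicense(companyNameEmail, suppliedKey);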

 

Can I use DevScope OCR SDK for barcode recognition?

Tesseract itself only performs text recognition. From version 2.0, however, the SDK also includes a 1D barcode reader engine.

 

What image formats are supported?

The image formats that the OCR engine can process are:

  • bmp, jpg, png and tiff in the Windows version.
  • bmp, jpg and png in the WP8 and WinRT versions (no tiff support yet).

 

Is there a Minimum Text Size? (It won't read screen text!)
There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".
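
The arithmetic above can be sketched as a quick pre-check. This is only an illustration and is not part of the SDK: the class and method names are hypothetical, and the x-height ratio of 0.48 is an assumption back-derived from the "about 20 pixels at 10pt x 300dpi" figure (10pt at 300dpi is roughly 41.7 pixels tall, and 20 / 41.7 is about 0.48). Real fonts vary, so measure actual pixels when in doubt.

```csharp
using System;

// Hypothetical helper, not part of the DevScope OCR SDK.
public static class TextSizeCheck
{
    // Assumed ratio of x-height to point size; varies from font to font.
    public const double AssumedXHeightRatio = 0.48;

    // Estimates the x-height in pixels for a given point size and resolution.
    public static double EstimateXHeightPixels(double pointSize, double dpi)
    {
        // One typographic point is 1/72 of an inch.
        double emPixels = pointSize / 72.0 * dpi;
        return emPixels * AssumedXHeightRatio;
    }

    // Applies the 8 and 10 pixel thresholds quoted in the FAQ answer.
    public static string Classify(double xHeightPixels)
    {
        if (xHeightPixels < 8)
            return "too small: most text is likely to be removed as noise";
        if (xHeightPixels < 10)
            return "poor: very little chance of accurate results";
        return "usable";
    }

    public static void Main()
    {
        // A 300dpi scan of 10pt text versus 10pt text on a 96dpi screen.
        double scanned = EstimateXHeightPixels(10, 300); // about 20 pixels
        double onScreen = EstimateXHeightPixels(10, 96); // about 6.4 pixels

        Console.WriteLine("Scanned:   " + Classify(scanned));
        Console.WriteLine("On screen: " + Classify(onScreen));
    }
}
```

Under these assumptions, 10pt text captured from a 96dpi screen falls below the 8 pixel threshold, which is why screen text usually needs to be scaled up before recognition.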

 

I am getting poor recognition performance. 
DevScope OCR SDK is targeted at books and articles scanned on a flatbed scanner at 300-600dpi. It also works well on a variety of other printed materials, in multiple languages. It will not work on the following inputs:

  • Handwriting
  • Unprocessed digital camera-captured documents
  • Text in photographic images
  • CAPTCHAs

Where can I download all supported dictionary languages?
You can download each of the supported OCR languages by clicking on the following links.
Please note: you must unzip each download and put the dictionary files inside the tessdata folder.
Afrikaans, Albanian, Ancient Greek, Arabic, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Catalan, Cherokee, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Esperanto alternative, Estonian, Finnish, Frankish, French, Galician, German, Greek, Hebrew, Hebrew (community), Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Middle English (1100-1500), Middle French (ca. 1400-1600), Norwegian, Polish, Portuguese, Romanian, Russian, Serbian (Latin), Slovakian, Slovakian Fraktur, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese.

Additionally, you can download specific dictionaries that extend Tesseract's functionality, such as the Orientation & Script Detection module and the Math / equation detection module.

How do I perform OCR on a specific zone of an image? 

  1. Load the image.
  2. Define the zone (also called the region of interest) by setting the appropriate request property,
    e.g., request.ScanArea = new Rectangle(100, 100, 250, 50);
  3. Perform the OCR using the DoOCR() method.
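
When the zone comes from user input, it can help to clamp it to the image bounds before assigning it. The sketch below is an illustration only: it assumes that ScanArea takes a System.Drawing.Rectangle in (x, y, width, height) pixel coordinates, which this page does not confirm, and the helper name ClampToImage is hypothetical. It uses only the standard Rectangle.Intersect method and does not call the SDK.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of the DevScope OCR SDK.
public static class ScanAreaHelper
{
    // Returns the part of the requested zone that lies inside the image.
    // The result is Rectangle.Empty when the zone is entirely outside.
    public static Rectangle ClampToImage(Rectangle zone, int imageWidth, int imageHeight)
    {
        var imageBounds = new Rectangle(0, 0, imageWidth, imageHeight);
        return Rectangle.Intersect(zone, imageBounds);
    }

    public static void Main()
    {
        // The zone from the steps above, applied to a 300 x 120 pixel image.
        var zone = new Rectangle(100, 100, 250, 50);
        Rectangle clamped = ClampToImage(zone, 300, 120);

        // clamped is X=100, Y=100, Width=200, Height=20
        Console.WriteLine(clamped);

        if (clamped.IsEmpty)
        {
            Console.WriteLine("The zone does not overlap the image.");
        }
        // Otherwise assign it as in step 2 (assumed usage):
        // request.ScanArea = clamped;
    }
}
```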

Rules and advice
  • If you find a bug, please report it by contacting our support service. First make sure you can reproduce the problem with DevScope OCR SDK on the specific platform, and check our forums.
  • Use the latest official release (the problem may already be solved in a newer version).
  • Use the correct language dictionary files.
  • If you have a question, post it to the DevScope OCR SDK developer forum.
  • Do not ask for support in comments; such comments will be deleted.
  • Post example files. If your problem involves an input file, posting only the error message is not sufficient: the source of the problem is usually in the input file.
  • Do not post programs or libraries; post a link where they can be downloaded.
  • Try to find the optimal format for example images: a 20 MB image is not helpful. A multi-page TIFF is useful only if your problem is with multi-page functionality. For example, a 2-colour PNG provides the same information as a truecolour uncompressed TIFF (Tesseract will convert it to 2 colours anyway).
  • Copy the error message from the terminal/console/command-line window instead of sending a screenshot.
  • Read the FAQ and search the forum and the issues (including closed ones) before you post your issue or question. It may already have been solved.

OCR SDK 2.0 Requirements

  • Microsoft .NET 4.0/4.5 versions for Windows Desktop and Server
  • Windows Phone 8.1 SDK
  • Windows Store 8.1 SDK
  • Visual Studio 2013 x86/x64/ARM Runtime Components

 

New in version 2.0.0

  • The engine now uses the new Windows 8.1 and Windows Phone 8.1 SDKs and .NET 4.5
  • New multiple output to Searchable PDF, HOCR, Text, DOM
  • New 1D Barcode Reader Engine
  • New image-processing actions for pre-processing an image before running the OCR engine
  • New image format load in Windows Store and WP8 versions - PNG, TIFF, JPEG and BMP
  • New dictionaries for MICR and OCR-A/B optimized for reading numbers
  • New dlls for Windows x86,x64
  • New dlls for WP8 ARM, x86
  • New dlls for Windows Store apps ARM,x86
  • New class reference and usage documentation
  • New ability to OCR an image directly from a WriteableBitmap raw buffer.
  • New OCR engine control settings that allow disabling important features such as UseDictionaries, EnableOcrAdaptation, DisableWordChopping and DetectVerticalText
  • New ability to use multiple language dictionaries in a single pass
  • New image pre-processing filters: auto noise reduction, auto clean black borders, auto invert image, image scaling.
  • New multi-page image document viewer control for Windows included
  • Performance optimizations of the engine - Up to 30% faster
  • Memory fixes and optimizations
  • Improved ocr results object model
  • Improved demos
  • Fixed a number of bugs and made tweaks to the engine

 

New in version 1.5.1

  • Fixed a problem with OCR Initialize in x64
  • Fixed a wrong reference in the demo for Windows

 

New in version 1.5.0

  • Performance optimizations of the recognizer - Up to 40% faster
  • Memory optimizations - The memory usage was reduced 30%
  • New full-featured demo App for WP8
  • Improvements on the demo for Windows
  • Upgrade of the ocr core engine to version 3.03
  • Refactoring of the object model and engine interface
  • Image used by the ocr internally is now available
  • Added more events for control of the recognition steps
  • Ability to set Tesseract variables from the client side
  • Improvements on the image pre-processing
  • Added a noise reduction algorithm for better recognition
  • Fixed bug with characters to recognize
  • Fixed a number of bugs that were causing memory leaks and errors under special conditions

 

New in version 1.0.2

  • Enhancements to the Windows Phone 8 demo app
  • Performance improvements

 

New in version 1.0.1

  • Native support for bmp, jpeg and png in the WP8 version
  • Small bug fixes

 
