Arabic OCR

Kirtas Technologies provides the fastest high-resolution scanning machinery in the world and now the most comprehensive Arabic OCR solution.

Kirtas' proprietary Arabic OCR engine enables the management of Arabic information, allowing you to quickly locate critical content and accurately identify the key concepts required for information extraction, categorization, and summarization.

Kirtas' new Arabic software can identify both Arabic and English characters on the same page.

Users can also digitize large volumes of documents, save them as graphic files, and classify them for later recognition.

Capabilities

Our new Arabic OCR Engine can do batch OCR for large volumes of books and other documents.

It can handle 15 left-to-right languages (including, but not limited to, English, French, German, and Dutch), 5 right-to-left languages (Arabic, Farsi, Jawi, Pashto, and Urdu), and 3 bilingual combinations (Arabic/English, Arabic/French, and Farsi/English).

The supported Arabic font types include Naskh and Kufi. The engine can either skip or recognize diacritics, and images can be rotated during processing by 0, 90, 180, or 270 degrees, or rotated automatically.

OCR processing is fast, running at about 1 page per second.
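To put the quoted rate in perspective, the short Python sketch below estimates how long a batch would take at 1 page per second. The function names and the sample page counts are hypothetical and chosen only for illustration; actual throughput will depend on image size, hardware, and settings.

```python
import math

# Rate quoted in the text above; treat it as an approximation, not a guarantee.
PAGES_PER_SECOND = 1.0


def estimate_batch_seconds(page_counts, pages_per_second=PAGES_PER_SECOND):
    """Return the estimated OCR time, in whole seconds, for a batch of documents."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return math.ceil(sum(page_counts) / pages_per_second)


def format_duration(seconds):
    """Format a duration in seconds as 'Hh MMm SSs'."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    # Hypothetical batch: three books of 300, 450, and 250 pages.
    batch = [300, 450, 250]
    print(format_duration(estimate_batch_seconds(batch)))
```

At this rate, a batch of 1,000 pages would take roughly 17 minutes of OCR time.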

It supports OCR input in 3 file formats (JPG, TIFF, and BMP), and control output can be generated in 4 formats: ART, TEXT, Word, and XML.

Benefits

Kirtas has produced an Arabic OCR tool that is the first of its kind. Not only is it completely integrated into the BSE Software Suite, but the digital files it generates can also be output for Print on Demand, e-books, and audio books.

Now, important Middle Eastern documents can be searchable and accessible like never before, whether in databases or on the Internet.

Key Features

  • Batch OCR for large volumes
  • 15 left-to-right languages
  • 5 right-to-left languages
  • 3 bilingual combinations
  • Fast OCR processing rates at about 1 page per second
  • Supports OCR input in 3 file formats: JPG, TIFF, and BMP
  • Control output generated in 4 formats: ART, TEXT, Word, and XML