E-aksharayan

e-Aksharayan
Written in: C++
Operating system: Linux (32 & 64-bit), Windows (32-bit)
Available in (interface): English
Available in (recognition): Assamese, Bengali, Bodo, Devanagari, Kannada, Gujarati, Gurumukhi, Oriya, Malayalam, Meitei, Marathi, Tamil, Telugu, Tibetan and Urdu
Type: Optical character recognition
Website: ocr.tdil-dc.gov.in

e-Aksharayan is an optical character recognition engine for Indian languages. Some of the research work from e-Aksharayan has been published in various conferences and journals. [1] [2] [3] [4]


Related Research Articles

Devanagari: Writing script for many North Indian and Nepalese languages

Devanāgarī or Devanagari, also called Nāgari, is a left-to-right abugida, based on the ancient Brāhmi script, used in the northern Indian subcontinent. It is one of the official scripts of the Republic of India and Nepal. It was developed and in regular use by the 7th century CE and achieved its modern form by 1000 CE. The Devanāgari script, composed of 48 primary characters, including 14 vowels and 34 consonants, is the fourth most widely adopted writing system in the world, being used for over 120 languages.

Optical character recognition: Computer recognition of visual text

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.

Brahmic scripts: Family of abugida writing systems

The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India and are used by various languages in several language families in South, East and Southeast Asia: Indo-Aryan, Dravidian, Tibeto-Burman, Mongolic, Austroasiatic, Austronesian, and Tai. They were also the source of the dictionary order (gojūon) of Japanese kana.

The National Library at Kolkata romanisation is a widely used transliteration scheme in dictionaries and grammars of Indic languages. This transliteration scheme is also known as (American) Library of Congress and is nearly identical to one of the possible ISO 15919 variants. The scheme is an extension of the IAST scheme that is used for transliteration of Sanskrit.

Meitei language: Tibeto-Burman language of India

Meitei, officially known as Manipuri, is a Tibeto-Burman language of northeast India. It is the official language and the lingua franca of Manipur, as well as one of the 22 official languages of the Indian Republic, included in the 8th Schedule to the Indian Constitution. It is one of the advanced literary languages recognised by the Sahitya Akademi, India's National Academy of Letters. It serves as one of the recognised educational and literary languages in Assam and Tripura. Native to the Meitei people, it has around 3 million speakers in total: it is the first language (L1) of around 1.8 million people, predominantly in the state of Manipur, and a second language (L2) for various ethnic groups in different parts of India, Myanmar and Bangladesh. It was used as a court language in the historic Manipur Kingdom, in accordance with the Manipur State Constitution Act 1947.

Meitei script: Writing system used to write Meitei language

The Meitei script, also known as the Meetei script, is an abugida used for the Meitei language, the official language of Manipur state and one of the 22 official languages of India. It is one of the official scripts of the Indian Republic. It is also popularly known as the Kanglei script and the Kok Sam Lai script. The earliest known evidence of the script is on 6th-century AD coins engraved with Meitei letters, as verified by various publications of the National Sahitya Akademi. It was used until the 18th century, when it was replaced by the Bengali alphabet. A few manuscripts survive. In the 20th century, the script experienced a resurgence, and it is again in use. Since 2021, the Meitei script has been officially used by the Government of Manipur, along with the Bengali-Assamese script, to write the Meitei language, as per "The Manipur Official Language (Amendment) Act, 2021".

Tesseract (software): Free optical character recognition engine

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.

The Republic of India has many official names expressing linguistic diversity. Hindi is the official language of the Union Government of India as per Article 343 of the Constitution of India and there is no national language for the country. English has the status of a "subsidiary official language" according to clause 3 of Official Languages Act, 1963. Hindi romanisation uses Hunterian transliteration, which is the "national system of romanisation in India" and the one officially used by the Government of India. The Eighth Schedule of the Indian Constitution lists 22 languages, which have been referred to as scheduled languages and given recognition, status, and official encouragement.

OCRopus

OCRopus is a free document analysis and optical character recognition (OCR) system released under the Apache License v2.0 with a very modular design using command-line interfaces.

Bengali alphabet: Abugida script used in writing Bengali

The Bengali script or Bangla alphabet is the alphabet used to write the Bengali language based on the Bengali-Assamese script, and has historically been used to write Sanskrit within Bengal. It is one of the most widely adopted writing systems in the world. It is one of the official scripts of the Indian Republic. It is used as the official script of the Bengali language in Bangladesh, West Bengal, Tripura and Barak valley of Assam as well as the Meitei language in Manipur, two of the official languages of India.

OCRFeeder

OCRFeeder is an optical character recognition suite for GNOME, which also supports virtually any command-line OCR engine, such as CuneiForm, GOCR, Ocrad and Tesseract. It converts paper documents to digital document files and can serve to make them accessible to visually impaired users.

Indic OCR refers to the process of converting images of text written in Indic scripts into e-text using optical character recognition (OCR) techniques. Broadly, it can also refer to OCR systems for the Brahmic scripts used by languages of South Asia and Southeast Asia, not just the scripts of the Indian subcontinent, which are all written in an abugida-based writing system.

Meitei input methods

Meitei input methods are the methods that allow computer users to enter text in the Meitei script for the Meitei language.

Directorate of Language Planning and Implementation: Directorate of the Government of Manipur

The Directorate of Language Planning and Implementation (DLPI) is a directorate of the Government of Manipur in charge of language planning and the implementation of policies for the Meitei language as well as other indigenous vernaculars of Manipur.

Meitei linguistic purism movement

The Meitei linguistic purism movement is a social movement to attain linguistic purism in the Meitei language. It is advocated by literary, political and social associations and organisations, as well as notable individuals, in Bangladesh, Myanmar and Northeast India.

Bharati script: Proposed common script for Indian languages

Bharati script is a constructed script and an abugida, created by a research team led by V. Srinivasa Chakravarthy at IIT Madras. It is designed to serve as a common script or link script for Indian languages.

Official scripts of the Republic of India: Officially used writing systems of India

The official scripts of the 22 official languages of the Republic of India include abugidas (pseudo-alphabets), alphabetical writing systems and abjads.

Loiyumpa Silyel: Ancient Meitei Constitution

The Loiyumpa Silyel, also termed the Loyumpa Silyel, the Loiyumpa Shilyel, the Loyumpa Shilyel or the Loyamba Sinyen, is an 11th-12th century written constitution in the ancient Meitei language, regulated in Ancient Kangleipak during the rule of King Loiyumba. In 1110 CE, its format was finalised from a promulgation of the proto-constitution, drafted in 429 CE by King Naophangba. Historically, it is the first written constitution, and one of the well-recorded Ancient Meitei language texts of the kingdom. Its constitutionalism was replaced by the Manipur State Constitution Act 1947, which was in force until Manipur was merged into the Republic of India on October 15, 1949.

References

  1. Greedy Search for Active Learning of OCR
  2. Text graphic separation in Indian newspapers
  3. An OCR System for the Meetei Mayek Script
  4. Experiences of Integration and Performance Testing of Multilingual OCR for Printed Indian Scripts