Indic computing


Indic computing means computing in Indic, i.e., in Indian scripts and languages. It covers software development in Indic scripts and languages, input methods, localization of computer applications, web development, database management, spell checkers, speech-to-text and text-to-speech applications, and OCR in Indian languages.


Unicode standard version 15.0 specifies codes for 9 Indic scripts in Chapter 12 titled "South and Central Asia-I, Official Scripts of India". The 9 scripts are Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil and Telugu.
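As a rough illustration of what these code assignments make possible, the sketch below identifies which of the nine scripts a character belongs to by checking its code point against the block ranges of the Unicode Standard. The names `INDIC_BLOCKS`, `script_of` and `scripts_in` are hypothetical helpers introduced for this example only.

```python
# Code point ranges of the Unicode blocks for the nine scripts listed above.
INDIC_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Oriya": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def script_of(char):
    """Return the name of the Indic block containing char, or None."""
    code_point = ord(char)
    for name, (start, end) in INDIC_BLOCKS.items():
        if start <= code_point <= end:
            return name
    return None


def scripts_in(text):
    """Return the set of Indic scripts that occur anywhere in text."""
    return {name for name in map(script_of, text) if name is not None}
```

A check such as `scripts_in(user_input)` could be used, for instance, to pick a font or a spell checker for mixed-script text. It only looks at block membership and does not validate that the text is well formed.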

Many Indic computing projects are under way. They involve government-sector companies, volunteer groups and individuals.

Government sector

The Indian Union Government made it mandatory for mobile phone companies whose handsets are manufactured, stored, sold and distributed in India to support displaying and typing text using fonts for all 22 languages. [1] This move has led to a rise in the use of Indian languages by millions of users. [2]

TDIL

The Department of Electronics and Information Technology, India initiated the TDIL [3] (Technology Development for Indian Languages) with the objective of developing Information Processing Tools and Techniques to facilitate human-machine interaction without language barrier; creating and accessing multilingual knowledge resources; and integrating them to develop innovative user products and services.

In 2005, it started distributing language software tools developed by government, academic and private organisations on CD for non-commercial use.

Some of the outcomes of the TDIL programme are deployed at the Indian Language Technology Proliferation & Deployment Centre. This centre disseminates all the linguistic resources, tools and applications that have been developed under TDIL funding. The programme expanded rapidly under the leadership of Dr. Swaran Lata, who also gave it an international footprint. She has since retired.

C-DAC

C-DAC is an India-based government software company involved in developing language-related software. It is best known for developing the InScript keyboard, the standard keyboard for Indian languages. It has also developed many Indic language solutions, including word processors, typing tools, text-to-speech software and OCR for Indian languages.

BharateeyaOO.org

The work developed out of CDAC, Bangalore (earlier known as NCST, Bangalore) became BharateeyaOO. [4] OpenOffice 2.1 had support for over 10 Indian languages.

BOSS

BOSS Linux was developed by the Centre for Development of Advanced Computing (C-DAC) to promote the use of open-source software in India.

NGO and Volunteer groups

Indlinux

The Indlinux organisation helped organise the individual volunteers working on different Indic language versions of Linux and its applications.

Sarovar

Sarovar.org is India's first portal to host projects under Free/Open source licenses. It is located in Trivandrum, India and hosted at Asianet data center. Sarovar.org is customised, installed and maintained by Linuxense as part of their community services and sponsored by River Valley Technologies. Sarovar.org is built on Debian Etch and GForge and runs off METTLE.

Pinaak

Pinaak is a non-government charitable society devoted to Indic language computing. It works on software localization, developing language software, localizing open-source software and enriching online encyclopedias. In addition, Pinaak educates people about computing, ethical use of the Internet and the use of Indian languages on the Internet.

Ankur Group

Ankur Group is working toward supporting the Bengali language on the Linux operating system, including a localized Bengali GUI, a Live CD, an English-to-Bengali translator, Bengali OCR and a Bengali dictionary. [5]

BhashaIndia

SMC

SMC is a free software group working to bridge the language divide in Kerala on the technology front, and is today the biggest language computing community in India. [6]

Input methods

Reverie Keypad lists the supported Indian languages for typing, in Android

Full size keyboards

With the advent of Unicode, inputting Indic text on a computer has become very easy. A number of methods exist for this purpose; the main ones are:

InScript

InScript is the standard keyboard for Indian languages, developed by C-DAC and standardized by the Government of India. It is now built into all major operating systems, including Microsoft Windows (2000, XP, Vista, 7), Linux and Macintosh.

Phonetic transliteration

This is a typing method in which the user types text in an Indian language using Roman characters, and it is phonetically converted to the equivalent text in an Indian script in real time. This type of conversion is done by phonetic text editors, word processors and software plugins. Building on the idea, phonetic IME tools allow Indic text to be input in any application.
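The conversion step can be sketched for Devanagari as follows. This is a toy example under stated assumptions, not the scheme of any tool named in this article: the romanization tables are a small hypothetical subset, matching is greedy (two-letter keys before one-letter keys), and a consonant not followed by a vowel receives an explicit virama, so the inherent vowel must be typed as `a`.

```python
VIRAMA = "\u094d"  # suppresses the inherent vowel of the preceding consonant

# Hypothetical romanization subset; real tools cover far more letters.
CONSONANTS = {
    "kh": "ख", "gh": "घ", "ch": "च", "th": "थ", "dh": "ध",
    "ph": "फ", "bh": "भ", "sh": "श",
    "k": "क", "g": "ग", "j": "ज", "t": "त", "d": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "s": "स", "h": "ह",
}

# Each vowel maps to (independent letter, dependent sign).
# "a" is the inherent vowel, so it has no dependent sign.
VOWELS = {
    "aa": ("आ", "\u093e"), "ii": ("ई", "\u0940"), "uu": ("ऊ", "\u0942"),
    "ai": ("ऐ", "\u0948"), "au": ("औ", "\u094c"),
    "a": ("अ", ""), "i": ("इ", "\u093f"), "u": ("उ", "\u0941"),
    "e": ("ए", "\u0947"), "o": ("ओ", "\u094b"),
}


def match(table, text, pos):
    """Return the longest key of table found at text[pos:], or None."""
    for length in (2, 1):
        key = text[pos:pos + length]
        if key in table:
            return key
    return None


def transliterate(roman):
    """Convert romanized input to Devanagari using the tables above."""
    text = roman.lower()
    out = []
    pos = 0
    after_consonant = False  # True while a consonant still awaits its vowel
    while pos < len(text):
        key = match(VOWELS, text, pos)
        if key is not None:
            independent, sign = VOWELS[key]
            out.append(sign if after_consonant else independent)
            after_consonant = False
            pos += len(key)
            continue
        key = match(CONSONANTS, text, pos)
        if key is not None:
            if after_consonant:
                out.append(VIRAMA)  # join consecutive consonants into a conjunct
            out.append(CONSONANTS[key])
            after_consonant = True
            pos += len(key)
            continue
        if after_consonant:
            out.append(VIRAMA)
        out.append(text[pos])  # pass spaces, digits and punctuation through
        after_consonant = False
        pos += 1
    if after_consonant:
        out.append(VIRAMA)
    return "".join(out)
```

With these tables, `transliterate("namaste")` yields नमस्ते and `transliterate("hindii")` yields हिन्दी. An IME would run the same kind of conversion incrementally on each keystroke rather than on a finished string.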

Some examples of phonetic transliterators are Xlit, Google Indic Transliteration, BarahaIME, Indic IME, Rupantar, SMC's Indic Keyboard and Microsoft Indic Language Input Tool. SMC's Indic Keyboard has support for as many as 23 languages whereas Google Indic Keyboard only supports 11 Indian languages. [6]

They can be broadly classified as:

Remington (typewriter)

This layout was developed before computers were invented or deployed with Indic languages, when typewriters were the only means to type text in Indic scripts. Since typewriters were mechanical and could not include a script processor engine, each character had to be placed on the keyboard separately, which resulted in a very complex and difficult-to-learn keyboard layout.

With the advent of Unicode, the Remington layout was added to various typing tools for the sake of backward compatibility, so that experienced typists did not have to learn a new keyboard layout. Nowadays this layout is used only by typists who have become accustomed to it over several years of usage. One tool that includes the Remington layout is Indic IME. A font that is based on the Remington keyboard layout is Kruti Dev. Another online tool that very closely supports the old Remington keyboard layout using Kruti Dev is the Remington Typing tool.

Braille

IBus Sharada Braille, which supports seven Indian languages, was developed by SMC. [6]

Mobile phones with Numeric keyboards

Nokia 1108 Hindi

Basic mobile phone models have 12 keys, like a plain old telephone keypad. Each key is mapped to 3 or 4 English letters to facilitate data entry in English. There are two ways to input Indian languages with this kind of keypad: the multi-tap method, and methods that use visual help from the screen, such as Panini Keypad. The primary usage is SMS. The 140-byte message size, which holds 160 characters in the 7-bit encoding used for English and other Roman-script languages, accommodates only about 70 characters when Unicode is used. [7] Proprietary compression is sometimes used to increase the size of a single message for complex-script languages like Hindi. A research study [8] of the available methods, with recommendations for a proposed standard, was released by the Broadband Wireless Consortium of India (BWCI).
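The message-size arithmetic can be sketched as below. A single SMS carries 140 bytes, which is 160 characters at 7 bits each or 70 characters at 16 bits each. The sketch makes a simplifying assumption that is not part of the SMS standard: any pure-ASCII text is treated as fitting the 7-bit alphabet, and everything else is counted as 16-bit units. The function names are hypothetical.

```python
SMS_PAYLOAD_BYTES = 140  # user-data size of a single SMS


def needs_unicode(text):
    """Assumption: ASCII stands in for the 7-bit SMS alphabet."""
    return any(ord(char) > 127 for char in text)


def sms_capacity(text):
    """Return how many characters one SMS can hold for this text."""
    if needs_unicode(text):
        return SMS_PAYLOAD_BYTES // 2      # 16 bits per character -> 70
    return SMS_PAYLOAD_BYTES * 8 // 7      # 7 bits per character -> 160


def fits_in_one_sms(text):
    """Check whether text fits in a single, unconcatenated SMS."""
    if needs_unicode(text):
        # Count 16-bit code units, since that is what occupies the payload.
        length = len(text.encode("utf-16-be")) // 2
    else:
        length = len(text)
    return length <= sms_capacity(text)
```

Because every vowel sign and virama is a separate code point, a Hindi word such as नमस्ते occupies six of the 70 available units even though a reader perceives fewer letters, which is one reason Indic messages fill up quickly.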

Transliteration/Phonetic methods

English letters are used to type in Indian languages. Examples include QuillPad [9] and IndiSMS [10].

Native methods

In native methods, the letters of the language are displayed on the screen corresponding to the numeral keys based on the probabilities of those letters for that language. Additional letters can be accessed by using a special key. When a word is partially typed, options are presented from which the user can make a selection. [11]
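A minimal sketch of the two ideas in this paragraph, ranking the letters most likely to come next and offering completions for a partially typed word, is given below. It assumes a simple word-frequency model built from a word list; the function names and the model are hypothetical and are not taken from any particular keypad product.

```python
from collections import Counter


def build_model(words):
    """Count how often each word occurs in a sample word list."""
    return Counter(words)


def next_letters(model, prefix, limit=4):
    """Rank the characters most likely to follow prefix, most frequent first."""
    counts = Counter()
    for word, frequency in model.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            counts[word[len(prefix)]] += frequency
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [char for char, _ in ranked[:limit]]


def suggest(model, prefix, limit=3):
    """Return the most frequent words that start with prefix."""
    matches = [(word, n) for word, n in model.items() if word.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:limit]]
```

A keypad could show the result of `next_letters` on the numeral keys and the result of `suggest` as the selectable word options. Note that this sketch works on individual code points, so a "letter" may be a vowel sign; a real keypad would more likely work with whole syllables.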

Smart phones with Qwerty keyboards

Most smart phones have about 35 keys catering primarily to the English language. Numerals and some symbols are accessed with a special key called Alt. Indic input methods are yet to evolve for these types of phones, as Unicode rendering support is not widely available.

Smart phones with soft/virtual keyboards

InScript is being adopted for smart phone usage. For Android phones that can render Indic languages, Swalekh Multilingual Keypad [12] and the Multiling Keyboard app [13] [14] are available. Gboard offers support for several Indian languages. [15]

Localization

Localization means translating software, operating systems, websites and other applications into Indian languages. Various volunteer groups are working in this direction.

Mandrake Tamil Version

A notable example is the Tamil version of Mandrake Linux (defunct since 2011). Tamil speakers in Toronto, Canada, released a Tamil version of Mandrake, a Linux distribution. [16] All of its features can be accessed in Tamil, which removes the prerequisite of English knowledge for computer users who know Tamil.

IndLinux

IndLinux is a volunteer group aiming to translate the Linux operating system into Indian languages. By the efforts of this group, Linux has been localized almost completely in Hindi and other Indian languages.

Nipun

Nipun is an online translation system that aims to translate various applications into Hindi. It is part of the Akshargram Network.

Localising Websites

GoDaddy has localised its website in Hindi, Marathi and Tamil and also noted that 40% of the call volume for IVR is in Indian Languages. [17]

Indic blogging

Indic blogging refers to blogging in Indic languages. Various efforts have been made to promote blogging in Indian languages.

Social Networks

Some social networks have been started in Indian languages. [18]

Programming

Indic programming languages

Frameworks

Gherkin, a popular domain-specific language, has support for Gujarati, Hindi, Kannada, Punjabi, Tamil, Telugu and Urdu. [19]

Libraries

Natural language processing in Indian languages is on the rise. Several libraries, such as iNLTK and StanfordNLP, are available. [20]

Translation

Google offers an improved translation feature for Hindi, Bengali, Marathi, Tamil, Telugu, Gujarati, Punjabi, Malayalam and Kannada, [15] with offline support as well. [21] Microsoft also offers translation for some of these languages.

Software

Indic Language Stack

In a symposium jointly organized by FICCI and TDIL, Mr. Ajay Prakash Sawhney, Secretary, Ministry of Electronics and IT, Government of India said that India Language Stack can help overcome the barriers of communication. [22]

Spell Checkers

Transliteration tools

Transliteration tools allow users to read a text in a different script. As of now, Aksharamukha is the tool that supports the most Indian scripts; text in any of these scripts can be converted to any of the others. Google also offers Indic Transliteration. Google and Microsoft, by contrast, allow transliteration from Latin letters to Indic scripts.
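A naive form of script-to-script conversion can be sketched by exploiting the fact that the Unicode blocks of the major Indian scripts are laid out in parallel, so that corresponding letters sit at the same offset within each 128-code-point block. The sketch below is only an approximation under that assumption and is not how Aksharamukha or any other named tool works: some positions are unassigned in some scripts (Tamil in particular has fewer letters), and orthographic conventions differ between scripts. The names `BLOCK_START` and `convert_script` are hypothetical.

```python
# First code point of each script's Unicode block; every block spans 0x80.
BLOCK_START = {
    "Devanagari": 0x0900, "Bengali": 0x0980, "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80, "Oriya": 0x0B00, "Tamil": 0x0B80,
    "Telugu": 0x0C00, "Kannada": 0x0C80, "Malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80

# The danda and double danda (U+0964, U+0965) are shared by all these
# scripts, so the positions at these offsets are left untouched.
SHARED_OFFSETS = {0x64, 0x65}


def convert_script(text, source, target):
    """Shift characters of the source block to the same offset in target."""
    source_start = BLOCK_START[source]
    target_start = BLOCK_START[target]
    converted = []
    for char in text:
        offset = ord(char) - source_start
        if 0 <= offset < BLOCK_SIZE and offset not in SHARED_OFFSETS:
            converted.append(chr(target_start + offset))
        else:
            converted.append(char)  # leave other characters as they are
    return "".join(converted)
```

For example, `convert_script("भारत", "Devanagari", "Bengali")` gives ভারত and the same call with `"Telugu"` gives భారత. The function does not check that the resulting code point is actually assigned in the target script, so its output should be treated as a first approximation.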

Speech-to-Text

Voice Recognition

Apple Inc. added support for major Indian languages in Siri. [23] Amazon's Alexa has support for Hindi and recognises major Indian languages partially. [24] Google Assistant also has support for major Indian languages. [25]

Internationalized Domain Names

Operating Systems

Virtual Assistants

The AI-based virtual assistant Google Assistant supports various Indian languages.

Usage and Growth

According to GoDaddy, the Hindi, Marathi and Tamil languages accounted for 61% of India's internet traffic. [17] Less than 1% of online content is in Indian languages. Newly created top apps support multiple Indian languages and/or promote Indian language content. 61% of the Indian users of WhatsApp primarily use their native languages to communicate on it. [26] A recent study revealed that Internet adoption is highest among speakers of local languages such as Tamil, Hindi, Kannada, Bengali, Marathi, Telugu, Gujarati and Malayalam. It estimates that Marathi, Bengali, Tamil and Telugu will form 30% of the total local-language user base in the country. Currently, Tamil at 42% has the highest Internet adoption level, followed by Hindi at 39% and Kannada at 37%. [27] Intex also reported that 87% of its regional language usage came from Hindi, Bengali, Tamil, Gujarati and Marathi speakers. [2] Lava mobiles reported that Tamil and Malayalam are the most popular languages on their phones, more so than even Hindi. [2]

See also

References

  1. Deadline for making handsets Indic language capable extended till 1 October 2017
  2. Centre’s push for regional language support paying off – The Hindu Business Line
  3. "TDIL: Technology Development for Indian Languages Programme, India". Archived from the original on 23 March 2015. Retrieved 28 March 2015.
  4. "BharateeyaOOo". Retrieved 28 March 2015.
  5. "Archive of Ankur Home". Ankur group, Bengalinux.org group. Archived from the original on 29 May 2005. Retrieved 26 December 2015.
  6. Helping Malayalam Take the Digital Leap – The New Indian Express
  7. "Quillpad Mobile – FAQs". Archived from the original on 2 April 2015. Retrieved 28 March 2015.
  8. "SIG Report on Indian Language SMS, Nov 2010" (PDF). Archived from the original (PDF) on 2 April 2015. Retrieved 11 November 2011.
  9. "Quillpad Mobile – Hindi SMS application for your mobile phone". Archived from the original on 2 April 2015. Retrieved 28 March 2015.
  10. "Eterno Infotech". Archived from the original on 28 March 2015. Retrieved 28 March 2015.
  11. "Keypad for mobile-Keyboard for mobile-Keyboard for typing on mobile-Keypad for typing on mobile". Retrieved 28 March 2015.
  12. This Bengaluru-Based Startup Believes It Has Built a Smarter Indic Keyboard – NDTV
  13. Honso. "MultiLing Keyboard – Android Apps on Google Play". Archived from the original on 30 January 2012. Retrieved 28 March 2015.
  14. Honso. "Plugin Hindi हिन्दी – Android Apps on Google Play". Retrieved 28 March 2015.
  15. Google Translate for 9 Indian languages, 11 more get keyboard support – Live Mint
  16. Frederick Noronha. "Indian-language computing: The long road ahead – Features – Technology". Infochange India. Archived from the original on 23 May 2011. Retrieved 28 March 2015.
  17. GoDaddy launches services in Hindi, Marathi and Tamil – EconomicTimes.com
  18. Google and Facebook’s attention to India might speed up Indic computing – Live Mint
  19. Gherkin Language reference
  20. Top NLP Libraries & Datasets for Indian Languages – Analytics India Mag
  21. Google will let you translate to 7 Indian languages – Deccan Chronicle
  22. Sawhney, Shri. Ajay Prakash (30 July 2018). "India Language Stack needed to overcome the barriers of communication: IT Secretary". Press Information Bureau GoI. Retrieved 27 September 2020.
  23. Use multiple languages to speak to Siri in India
  24. Amazon India to rollout voice shopping in Hindi soon
  25. Change your language or use multiple languages
  26. We haven’t yet built the Indian internet!
  27. No English Only Vinglish: 90% New Internet Users Coming Online In India Are Non-English Speakers