Hindi-to-Punjabi Machine Translation System


The Hindi-to-Punjabi machine translation system, developed at Punjabi University, Patiala, by Gurpreet Singh Lehal [1] and Dr. Vishal Goyal [2], is designed to translate Hindi text into Punjabi text. It is based on the direct approach and consists of three modules: preprocessing (text normalization, replacing collocations, replacing proper nouns), a translation engine (identifying surnames, identifying titles, lexicon lookup, word-sense disambiguation, inflection analysis, transliteration), and postprocessing. The system has been available online. [3]
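The direct approach described above can be illustrated with a minimal word-by-word pipeline. The Python below is a sketch under stated assumptions, not the system's implementation: the lexicon, the collocation table, the chandrabindu-to-anusvara normalization rule, and all function names are hypothetical, and the surname/title identification, word-sense disambiguation, and inflection analysis steps are omitted. The transliteration fallback relies on the fact that Unicode lays out the Devanagari block (U+0900) and the Gurmukhi block (U+0A00) in parallel, so many characters map by adding 0x100 to the code point; characters with no Gurmukhi counterpart at that position (such as ष) are simply left unchanged here, which a real system would have to handle with its own rules.

```python
import re
import unicodedata

# Hypothetical normalization rule (an assumption, not the system's documented behaviour):
# treat chandrabindu (U+0901) and anusvara (U+0902) as the same nasal mark.
VARIANTS = {"\u0901": "\u0902"}


def normalize(text: str) -> str:
    """Preprocessing: canonical Unicode form, unified nasal marks, collapsed whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = "".join(VARIANTS.get(ch, ch) for ch in text)
    return " ".join(text.split())


# Toy resources (hypothetical); keys are normalized exactly like the input.
COLLOCATIONS = {tuple(normalize(w) for w in ("के", "लिए")): "ਲਈ"}  # "for"
LEXICON = {normalize(k): v for k, v in {
    "जाता": "ਜਾਂਦਾ",
    "हूँ": "ਹਾਂ",
    "और": "ਅਤੇ",
}.items()}


def transliterate(word: str) -> str:
    """Fallback: shift Devanagari code points into the parallel Gurmukhi block."""
    out = []
    for ch in word:
        cp = ord(ch)
        # Only U+0900-U+096F is laid out in parallel; the tails of the two blocks differ.
        if 0x0900 <= cp <= 0x096F:
            candidate = chr(cp + 0x0100)
            if unicodedata.name(candidate, "").startswith("GURMUKHI"):
                out.append(candidate)
                continue
        out.append(ch)  # no Gurmukhi counterpart (e.g. the shared danda): keep as-is
    return "".join(out)


def translate(hindi: str) -> str:
    # Tokenize, splitting off the danda so it is not glued to the last word.
    tokens = re.findall(r"[\u0964\u0965]|[^\s\u0964\u0965]+", normalize(hindi))
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COLLOCATIONS:  # multiword unit replaced as a whole
            out.append(COLLOCATIONS[pair])
            i += 2
            continue
        token = tokens[i]
        # Lexicon lookup first; unknown words (e.g. proper nouns) are transliterated.
        out.append(LEXICON.get(token) or transliterate(token))
        i += 1
    # Postprocessing: reattach sentence-final punctuation to the preceding word.
    return re.sub(r" ([\u0964\u0965])", r"\1", " ".join(out))
```

For example, `translate("मैं घर जाता हूँ।")` returns `"ਮੈਂ ਘਰ ਜਾਂਦਾ ਹਾਂ।"`: the cognates मैं and घर come out correctly through transliteration alone, while जाता and हूँ need lexicon entries because the Punjabi words differ.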

Related Research Articles

Gurmukhi: script used to write the Punjabi language

Gurmukhī is an abugida developed from the Sharada/Laṇḍā scripts, standardized and used by the second Sikh guru, Guru Angad (1504–1552). Commonly regarded as a Sikh script, Gurmukhi is used in Punjab, India, as the official script of the Punjabi language.

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.

Punjabi language: Indo-Aryan language spoken in India and Pakistan

Punjabi is an Indo-Aryan language spoken by the Punjabi people and native to the Punjab region of Pakistan and India. It is the ninth most widely spoken language in the world, the most widely spoken language in Pakistan, the 11th most widely spoken language in India, and the third most-spoken native language in the Indian subcontinent. It is the third most-spoken language in England and the fifth most-spoken native language in Canada. It also has a significant presence in the United Arab Emirates, the United States, France, Australia, New Zealand, Italy, and the Netherlands.

Devanāgarī is an Indian script used for many languages of India and Nepal, including Hindi, Marathi, Nepali and Sanskrit. There are several somewhat similar methods of transliteration from Devanāgarī to the Roman script, including the influential and lossless IAST notation.
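The IAST notation mentioned above can be illustrated with a small converter. This Python sketch is a simplification under stated assumptions: the function name `to_iast` is hypothetical, it covers only the basic vowels, consonants, vowel signs, anusvara, visarga and candrabindu, and it ignores nukta letters and conjunct-specific conventions. As IAST does for Sanskrit, it writes the inherent vowel "a" after every consonant that is not followed by a vowel sign or a virama.

```python
# Standard IAST values for a subset of Devanagari (nukta letters are not covered).
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ऋ": "ṛ", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
MATRAS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū",
          "ृ": "ṛ", "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
CONSONANTS = {"क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
              "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
              "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
              "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
              "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
              "य": "y", "र": "r", "ल": "l", "व": "v",
              "श": "ś", "ष": "ṣ", "स": "s", "ह": "h"}
SIGNS = {"ं": "ṃ", "ः": "ḥ", "ँ": "m\u0310"}  # anusvara, visarga, candrabindu
VIRAMA = "्"


def to_iast(text: str) -> str:
    out = []
    pending_a = False  # last consonant still carries its inherent "a"
    for ch in text:
        if ch in MATRAS:
            out.append(MATRAS[ch])  # a vowel sign replaces the inherent "a"
            pending_a = False
            continue
        if ch == VIRAMA:
            pending_a = False  # the virama suppresses the inherent "a"
            continue
        if pending_a:
            out.append("a")
            pending_a = False
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)  # spaces, punctuation and unmapped characters pass through
    if pending_a:
        out.append("a")
    return "".join(out)
```

For example, `to_iast("संस्कृत")` gives `"saṃskṛta"`. Because spoken Hindi usually drops the final inherent vowel, a Hindi-oriented romanization would write भारत as "bhārat" rather than the IAST "bhārata"; that schwa-deletion rule is deliberately left out here.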

Languages with official status in India: languages given official status by the Constitution of India

There is no national language in India. However, Article 343(1) of the Indian Constitution specifically states: "The official language of the Union shall be Hindi in Devanagari script. The form of numerals to be used for the official purposes of the Union shall be the international form of Indian numerals." Business in the Indian Parliament can be transacted only in Hindi or English. English is also permitted for official purposes such as parliamentary proceedings, the judiciary, and communications between the Central Government and a State Government. At the state and territory level, India has various official languages: states have the power to specify their own official language(s) through legislation. In addition to the official languages, the Constitution recognizes 22 regional languages, which include Hindi but not English, as scheduled languages.

Angami is a Naga language spoken in the Naga Hills in the northeastern part of India, in Kohima district, Nagaland. As of 2001, there were an estimated 125,000 first-language (L1) Angami speakers. Under UNESCO's Language Vitality and Endangerment framework, Angami is at the level of "vulnerable", meaning that it is still spoken by most children but "may be restricted to certain domains".

Mahajani is a Laṇḍā mercantile script that was historically used in northern India for writing accounts and financial records in Marwari, Hindi and Punjabi. It is a Brahmic script and is written left-to-right. The name Mahajani derives from the Hindi word for 'bankers'; the script is also known as 'sarrafi' or 'kothival' (merchant).

Google Translate: multilingual neural machine translation service

Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, and an API that helps developers build browser extensions and software applications. As of February 2022, Google Translate supported 109 languages at various levels. As of April 2016, it claimed over 500 million total users, with more than 100 billion words translated daily; in May 2013, the company had stated that it served over 200 million people daily.

Sahitya Akademi Award: literary honour awarded to authors of outstanding literary works in India

The Sahitya Akademi Award is a literary honour in India, which the Sahitya Akademi, India's National Academy of Letters, annually confers on writers of the most outstanding books of literary merit published in any of the 24 major Indian languages recognised by the Sahitya Akademi, New Delhi. These include English and the 22 languages listed in the Eighth Schedule of the Indian Constitution, such as Tamil, Bengali and Punjabi.

Indic computing means "computing in Indic", i.e., in Indian scripts and languages. It involves developing software in Indic scripts and languages, input methods, localization of computer applications, web development, database management, spell checkers, speech-to-text and text-to-speech applications, and OCR for Indian languages.

Google IME: set of typing tools by Google

Google IME, also known as Google Input Tools, is a set of input method editors by Google for 22 languages, including Amharic, Arabic, Bengali, Chinese, Greek, Gujarati, Hindi, Japanese, Kannada, Malayalam, Marathi, Nepali, Persian, Punjabi, Russian, Sanskrit, Serbian, Tamil, Telugu, Tigrinya, and Urdu. It provides a virtual keyboard that lets users type text in their local language directly in any application, without copying and pasting.

Anusaaraka is English-to-Hindi language accessing (translation) software that employs algorithms derived from Pāṇini's Ashtadhyayi. It is being developed by the Chinmaya International Foundation (CIF), the International Institute of Information Technology, Hyderabad (IIIT-H), and the University of Hyderabad. Anusaaraka is viewed as a fusion of traditional Indian shastras and advanced contemporary technologies.

Gurpreet Singh Lehal

Dr Gurpreet Singh Lehal is a professor in the Computer Science Department at Punjabi University, Patiala, and Director of the Advanced Centre for Technical Development of Punjabi Language Literature and Culture. He is noted for his work applying computer technology to the Punjabi language in both the Gurmukhi and Shahmukhi scripts.

Truecasing is the problem in natural language processing (NLP) of determining the proper capitalization of words where such information is unavailable. This commonly comes up due to the standard practice of automatically capitalizing the first word of a sentence. It can also arise in badly cased or noncased text.
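One common baseline for truecasing, though not the only approach (sequence models that use surrounding context are also used), assigns each word the casing it most often has in a cased training corpus. The Python below is a minimal sketch of that baseline; the function names and the toy corpus are hypothetical. It skips sentence-initial tokens during training for the reason given above: their capitalization is automatic and says nothing about the word itself.

```python
from collections import Counter, defaultdict


def train(corpus: list[str]) -> dict[str, str]:
    """Map each lowercased word to the casing seen most often in a cased corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for position, token in enumerate(sentence.split()):
            if position == 0:
                continue  # sentence-initial capitals are forced, so they carry no evidence
            counts[token.lower()][token] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(text: str, model: dict[str, str]) -> str:
    """Restore casing word by word; unseen words fall back to lowercase."""
    words = [model.get(token.lower(), token.lower()) for token in text.split()]
    if words:
        words[0] = words[0][:1].upper() + words[0][1:]  # re-apply the sentence-initial capital
    return " ".join(words)


# Toy cased corpus (hypothetical data, for illustration only).
CORPUS = [
    "They visited Patiala last year",
    "Students in Patiala study Punjabi",
    "We learn Punjabi at home",
]
MODEL = train(CORPUS)
print(truecase("students visited patiala", MODEL))  # Students visited Patiala
```

A word seen only at the start of sentences (here "students") gets no casing evidence and falls back to lowercase, which is the main weakness that context-aware models address.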

Bhai Jodh Singh was a Sikh theologian, author, mentor and social activist. He played an important role in the Singh Sabha movement. He was a recipient of the civilian honour of the Padma Bhushan.

The Eighth Schedule to the Constitution of India lists the official languages of the Republic of India. At the time when the Constitution was enacted, inclusion in this list meant that the language was entitled to representation on the Official Languages Commission, and that the language would be one of the bases that would be drawn upon to enrich Hindi and English, the official languages of the Union. The list has since, however, acquired further significance. The Government of India is now under an obligation to take measures for the development of these languages, such that "they grow rapidly in richness and become effective means of communicating modern knowledge." In addition, a candidate appearing in an examination conducted for public service is entitled to use any of these languages as the medium in which he or she answers the paper.

Yandex Translate: translation web service by Yandex

Yandex Translate is a web service provided by Yandex, intended for the translation of text or web pages into another language.

Google Neural Machine Translation (GNMT) is a neural machine translation (NMT) system, developed by Google and introduced in November 2016, that uses an artificial neural network to increase fluency and accuracy in Google Translate.

References

  1. "Archived copy" (PDF). Archived from the original (PDF) on 2016-12-31. Retrieved 2015-10-16.
  2. "Translate Hindi Text to Punjabi Text". H2p.learnpunjabi.org. Retrieved 2015-07-02.