Gurpreet Singh Lehal

Born: 6 February 1963
Alma mater: Panjab University
Occupation: Professor in the Computer Science Department, Punjabi University, Patiala

Gurpreet Singh Lehal (born 6 February 1963) is a professor in the Computer Science Department, Punjabi University, Patiala, and Director of the Advanced Centre for Technical Development of Punjabi Language, Literature and Culture. He is noted for his work on applying computer technology to the Punjabi language in both the Gurmukhi and Shahmukhi scripts.

He holds a postgraduate degree in Mathematics from Panjab University, a master's degree in Computer Science from Thapar Institute of Engineering and Technology, and a Ph.D. in Computer Science from Punjabi University, Patiala, for work on a Gurmukhi optical character recognition (OCR) system.

Background

Lehal's main contribution as a researcher has been the development of technologies for the computerization of the Punjabi language. [1] Prominent among these are the first Gurmukhi OCR, the first bilingual Gurmukhi/Roman OCR, the first Punjabi font identification and conversion system, the first multi-font Punjabi spell checker, the first high-accuracy Gurmukhi-Shahmukhi and Shahmukhi-Gurmukhi transliteration systems, and the first intelligent predictive Roman-Gurmukhi transliteration techniques for simplifying Punjabi typing. Lehal has published more than 100 research papers in national and international journals and conference proceedings. [2] He has handled research projects worth more than 43 million rupees, including three international projects awarded in open competition among contestants from more than 30 countries. As a software engineer, Lehal has developed more than 25 software systems, including the first commercial Punjabi word processor, Akhar. As an academician, he has taught postgraduate and doctoral students and supervised their research, guiding more than 100 postgraduate research scholars and 11 PhD students on topics related to the computerization of Punjabi, Hindi, Urdu and Sindhi. [3]

Work

Lehal has worked for more than fifteen years on projects related to the computerization of Punjabi, Hindi, Urdu and Sindhi and has pioneered technical solutions for these languages. Technologies he developed for the first time include intelligent predictive Roman-Gurmukhi transliteration techniques for simplifying Punjabi typing, a Punjabi spell checker, an intelligent Punjabi and Hindi font converter, a bilingual Gurmukhi/Roman OCR and Sindhi-Devanagari transliteration. Many other products for popularizing Punjabi and breaking script and language barriers have been developed under his leadership. Widely used examples include a multimedia website for teaching Punjabi, a Gurmukhi-Shahmukhi transliteration utility, Punjabi-Hindi translation software, Urdu-Hindi transliteration software, a Punjabi search engine, a Punjabi text-to-speech synthesis system, a Punjabi text summarization system and a Punjabi grammar checker. [4] [5] [6]

Language Software and Technologies developed

Related Research Articles

Gurmukhi: Script used to write the Punjabi language

Gurmukhī or Gurumukhī is an Indic script predominantly used in present-day Punjab, India. It is an abugida developed from the Laṇḍā scripts, standardized and used by the second Sikh guru, Guru Angad (1504–1552). It is commonly regarded as a Sikh script, used by Punjabi Sikhs to write the Punjabi language, and is one of the official scripts of the Indian Republic, while the Arabic-based Shahmukhi script is used in Punjab, Pakistan.

Punjabi language: Indo-Aryan language native to the Punjab

Punjabi, sometimes spelled Panjabi, is an Indo-Aryan language native to the Punjab region of Pakistan and India. It has approximately 113 million native speakers.

Optical character recognition: Computer recognition of visual text

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.

Devanagari is an Indic script used for many Indo-Aryan languages of North India and Nepal, including Hindi, Marathi and Nepali, which was the script used to write Classical Sanskrit. There are several somewhat similar methods of transliteration from Devanagari to the Roman script, including the influential and lossless IAST notation. Romanized Devanagari is also called Romanagari.

Indian Standard Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

Shahmukhi: Perso-Arabic script used to write the Punjabi language

Shahmukhi is an abjad developed from the Perso-Arabic script and used for the Punjabi language. It came into use in Punjabi Sufi literature from the 12th century onwards. It is generally written in the Nastaʿlīq calligraphic hand, which is also used for Urdu. Shahmukhi is the standard script for Punjabi in Pakistani Punjab. It is one of two scripts used for Punjabi, the other being Gurmukhi, used in Indian Punjab.

Kafi

Kafi is a classical form of Sufi music, mostly in the Siraiki, Punjabi and Sindhi languages, originating from the Punjab and Sindh regions of the Indian subcontinent. Some well-known Kafi poets are Baba Farid, Bulleh Shah, Shah Hussain, Shah Abdul Latif Bhittai, Sachal Sarmast and Khwaja Ghulam Farid. This poetry style has also lent itself to the Kafi genre of singing, popular throughout South Asia, especially Pakistan, Bangladesh and India. Over the years, both Kafi poetry and its rendition have gone through phases of rapid growth as poets and vocalists added their own influences, creating a rich and varied poetic form. Through it all, the form has remained centered on the dialogue between the Soul and the Creator, symbolized by the murid (disciple) and his Murshid (Master), and often by the lover and his Beloved.

The nuqta, also known as bindu, is a diacritic mark that was introduced in Devanagari and some other Indic scripts to represent sounds not present in the original scripts. It takes the form of a dot placed below a character. The idea is inspired by the Arabic script; for example, some letters in Urdu share the same basic shape but differ in the placement of dot(s), or nuqta(s), in the Perso-Arabic script: the letter ع ayn, with the addition of a nuqta on top, becomes the letter غ g͟hayn.
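As a rough illustration of the dot-below-a-character idea in Devanagari, the sketch below uses Python's standard `unicodedata` module. It assumes the Unicode code points U+0915 (KA), U+093C (the combining nuqta, which Unicode names NUKTA) and U+0958 (the precomposed QA). The helper names `add_nuqta` and `is_nuqta_form_of` are hypothetical and exist only for this example.

```python
import unicodedata

# Unicode's combining nuqta for Devanagari (named "DEVANAGARI SIGN NUKTA").
DEVANAGARI_NUQTA = "\u093c"


def add_nuqta(base: str) -> str:
    """Hypothetical helper: append the combining nuqta to a base letter."""
    return base + DEVANAGARI_NUQTA


def is_nuqta_form_of(candidate: str, base: str) -> bool:
    """Hypothetical helper: check whether `candidate` is canonically
    equivalent to `base` followed by a nuqta, using NFD decomposition."""
    return unicodedata.normalize("NFD", candidate) == add_nuqta(base)


ka = "\u0915"               # DEVANAGARI LETTER KA
qa = add_nuqta(ka)          # KA + nuqta, two code points
precomposed_qa = "\u0958"   # DEVANAGARI LETTER QA, one code point

# The precomposed letter decomposes to the base letter plus the nuqta.
print(is_nuqta_form_of(precomposed_qa, ka))
```

Comparing after normalization, rather than comparing raw strings, is one way for a spell checker or transliteration tool to treat the two encodings of the same letter as equal.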

Tesseract (software): Free optical character recognition engine

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.

Lahnda, also known as Lahndi or Western Punjabi, is a group of north-western Indo-Aryan language varieties spoken in parts of Pakistan and India. It is defined in the ISO 639 standard as a "macrolanguage" or as a "series of dialects" by other authors. Its validity as a genetic grouping is not certain. The terms "Lahnda" and "Western Punjabi" are exonyms employed by linguists, and are not used by the speakers themselves.

OCRopus

OCRopus is a free document analysis and optical character recognition (OCR) system released under the Apache License v2.0 with a very modular design using command-line interfaces.

Indic Computing means "computing in Indic", i.e., Indian Scripts and Languages. It involves developing software in Indic Scripts/languages, Input methods, Localization of computer applications, web development, Database Management, Spell checkers, Speech to Text and Text to Speech applications and OCR in Indian languages.

There are three writing systems for Saraiki:

Punjabi Wikipedia: Punjabi language edition of Wikipedia

The Punjabi Wikipedia is the Punjabi language edition of Wikipedia, the free encyclopedia. There are two Punjabi Wikipedia editions viz. Eastern Punjabi Wikipedia and Western Punjabi Wikipedia.

Indic OCR refers to the process of converting text images written in Indic scripts into e-text using Optical character recognition (OCR) techniques. Broadly, it can also refer to the OCR systems of Brahmic scripts for languages of South Asia and Southeast Asia, not just the scripts of the Indian subcontinent, which are all written in an abugida-based writing system.

Hindi–Urdu is the lingua franca of modern-day Northern India and Pakistan. Modern Standard Hindi is officially registered in the Indian Republic as a standard written in the Devanagari script, and Urdu is officially registered in Pakistan as a standard written in an extended Perso-Arabic script.

Sindhi is a language widely spoken by the people of the historical Sindh region of the Indian subcontinent. Modern Sindhi is written in an extended Perso-Arabic script in the Sindh province of Pakistan and (formally) in extended Devanagari by Sindhis in partitioned India. Historically, Sindhi was written in various forms of Landa scripts and various other Indic scripts.

Kashmiri transliteration refers to the conversion of the Kashmiri language between the different scripts used to write it in the Kashmir region of the Indian subcontinent. The official script for Kashmiri is an extended Perso-Arabic script in both Jammu-Kashmir and Azad-Kashmir, cutting across religious boundaries. Some sections of the Kashmiri Hindu community use an extended Devanagari script to write the language. Transliteration is hence essential to cross this script barrier imposed by religious affiliations and to convert texts so that they cater to all Kashmiri people.

Bharati script: Proposed common script for Indian languages

Bharati Script is a constructed abugida created by a research team led by V. Srinivasa Chakravarthy at IIT Madras. It is designed to serve as a common script, or link script, for Indian languages.

References

  1. Singh, Roopinder (21 August 2004). "Balle Balle software". Tribune India.
  2. Dhaliwal, Sarabjit (6 September 2004). "Software to convert Punjabi script to Shahmukhi script". Tribune India. Tribune News Service.
  3. "Software to melt India, Pakistan's Sindhi script barrier". 3 September 2014. Archived from the original on 23 September 2023. Retrieved 23 September 2023 via The Economic Times - The Times of India.
  4. "Breaking the script barrier". Archived from the original on 1 June 2009. Retrieved 23 September 2023.
  5. "Punjabi varsity develops 'text-to-speech' software for blind". 22 December 2012. Archived from the original on 23 September 2023. Retrieved 23 September 2023 via The Economic Times - The Times of India.
  6. "Patiala University's online Punjabi spellchecker hailed". Archived from the original on 31 August 2014. Retrieved 6 October 2014.