The Medical Intelligence and Language Engineering Laboratory, also known as the MILE lab, is a research laboratory in the Department of Electrical Engineering at the Indian Institute of Science, Bangalore. The lab is known for its work on image processing, online handwriting recognition, text-to-speech and optical character recognition [1] systems, all of which focus mainly on documents and speech in Indian languages. [2] The lab is headed by A. G. Ramakrishnan. [3]
One of the commitments of the MILE lab is to develop technology that lets people with visual impairment access knowledge from any available printed material in Indian languages. [4] Its work towards this goal has so far included: document mosaicing of coloured, camera-captured images; text extraction from complex colour images, including camera-captured images; document layout analysis; detection of broken and merged characters; OCR technology for Tamil and Kannada; [5] text-to-speech conversion in Tamil and Kannada; [6] pitch modification using the discrete cosine transform in the source domain; [7] automated part-of-speech tagging; phrase prediction and prosody modeling.
Mozhi Vallan, the Tamil OCR [8] product developed by MILE Lab, is being used by Worth Trust and Karna Vidya Technology Centre, Chennai [9] for the conversion of printed school and college books to Braille format. Sri Ramakrishna Math, Chennai [10] is using it to convert its printed philosophical books in Tamil to computer-readable text. Lipi Gnani, the Kannada OCR developed by MILE Lab, is being used by the Braille Transcription Centers of Mitrajyothi [11] and Canara Bank Relief & Welfare Society, [12] Bangalore for similar purposes. Thirukkural, [13] the Tamil TTS system [14] developed by MILE Lab, is being used by some school teachers in Singapore for assignments. Madhura, the Kannada TTS [15] developed by the lab, is being used, integrated with a screen reader, by two blind students to read aloud text recognized by Lipi Gnani from Kannada books. The lab is currently researching machine listening [16] and has proposed a novel temporal feature, the plosion index, which has been shown to be effective in detecting closure-burst transitions of stop consonants and affricates in continuous speech, even in noise. [17] Another proposed feature is DCTILPR, [18] a voice-source-based feature vector that improves the recognition performance of a speaker identification system.
In its early days, the lab carried out significant work in medical signal and image processing. An algorithm was proposed for ECG compression that treats each cardiac cycle as a vector and applies linear prediction to the discrete wavelet transform of this vector, after normalizing its period using interpolation based on multirate processing. [19] The maturity of the fetal lung was predicted using image texture features obtained from the liver and lung regions of ultrasound images of pregnant women. [20] An effective technique was proposed for lossless compression of 3D magnetic resonance images of the brain: each MRI slice was represented by a uniform or adaptive mesh, an affine transformation was applied between the corresponding mesh elements of adjacent slices, and context-based entropy coding was applied to the residues. [21]
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.
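The distinction between lossless and lossy compression can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as one example of a lossless encoder and decoder pair, and uses uniform quantization as a simple stand-in for a lossy step; the function names `lossless_round_trip` and `lossy_quantize` are illustrative only.

```python
import zlib


def lossless_round_trip(data: bytes):
    """Compress with zlib (encoder) and decompress again (decoder)."""
    compressed = zlib.compress(data, 9)    # 9 = strongest compression level
    restored = zlib.decompress(compressed)
    return compressed, restored


def lossy_quantize(samples, step):
    """Snap each sample to the nearest multiple of `step`.

    Fine detail smaller than `step` is discarded and cannot be recovered,
    which is what makes this step lossy.
    """
    return [step * round(s / step) for s in samples]


# Highly redundant input: the same three bytes repeated 100 times.
original = b"abc" * 100
compressed, restored = lossless_round_trip(original)

# Lossy example: three samples quantized to a grid of 0.25.
quantized = lossy_quantize([0.12, 0.49, 0.9], 0.25)
```

Here `restored` is byte-for-byte identical to `original`, while `quantized` only approximates its input samples.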
Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.
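As a rough illustration of the linear predictive model, the following sketch estimates predictor coefficients from a signal's autocorrelation using the Levinson-Durbin recursion, one common way of solving the LPC normal equations. The function names and the test signal are assumptions made for this example, and windowing, pre-emphasis and quantization of the coefficients are left out.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (coeffs, error), where coeffs[k-1] is a_k in the predictor
    x[n] ~ sum_k a_k * x[n-k], and error is the residual energy.
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Decaying exponential: each sample is 0.9 times the previous one,
# so a first-order predictor with a_1 close to 0.9 fits it well.
signal = [0.9 ** n for n in range(200)]
r = autocorrelation(signal, 2)
coeffs, residual = levinson_durbin(r, 2)
```

For this signal the first coefficient comes out close to 0.9 and the second close to zero, reflecting that one past sample is enough to predict the next.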
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.
A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images, digital video, digital audio, digital television, digital radio, and speech coding. DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.
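A minimal sketch of this idea is the orthonormal DCT-II and its inverse, written directly from the cosine-sum definition. The function names `dct2` and `idct2` are chosen for this example, and the direct O(N²) form is used for clarity; practical implementations use fast algorithms.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a finite sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


# A constant signal has all of its energy in the first (DC) coefficient.
flat = dct2([1.0, 1.0, 1.0, 1.0])

# Transforming and inverting recovers the samples up to rounding error.
samples = [3.0, -1.0, 4.0, 1.5, -9.0]
recovered = idct2(dct2(samples))
```

The concentration of a smooth signal's energy in a few low-frequency coefficients is what makes the transform useful for compression: the remaining coefficients can be coarsely quantized or dropped.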
A point cloud is a discrete set of data points in space. The points may represent a 3D shape or object. Each point position has its set of Cartesian coordinates. Points may contain data other than position such as RGB colors, normals, timestamps and others. Point clouds are generally produced by 3D scanners or by photogrammetry software, which measure many points on the external surfaces of objects around them. As the output of 3D scanning processes, point clouds are used for many purposes, including to create 3D computer-aided design (CAD) or geographic information systems (GIS) models for manufactured parts, for metrology and quality inspection, and for a multitude of visualizing, animating, rendering, and mass customization applications.
In image processing, a Gabor filter is a linear filter used for texture analysis, named after Dennis Gabor, who first proposed it as a 1D filter. It was first generalized to 2D by Gösta Granlund, by adding a reference direction. The filter essentially analyzes whether there is any specific frequency content in the image in specific directions in a localized region around the point or region of analysis. Frequency and orientation representations of Gabor filters are claimed by many contemporary vision scientists to be similar to those of the human visual system. They have been found to be particularly appropriate for texture representation and discrimination. In the spatial domain, a 2D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave.
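The spatial-domain description can be sketched as a small kernel generator. This example builds only the real (cosine) part of a 2D Gabor kernel; the function name `gabor_kernel` and the parameter names (`sigma` for the Gaussian width, `theta` for the orientation, `wavelength` for the sinusoid, `gamma` for the aspect ratio, `psi` for the phase offset) are assumptions for illustration and follow a commonly used parameterization.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Real part of a 2D Gabor kernel as a size x size list of rows.

    The kernel is a Gaussian envelope multiplied by a cosine plane wave
    that oscillates along the direction given by `theta` (in radians).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's reference direction.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# 5 x 5 kernel oriented along the x axis with a 4-pixel wavelength.
kernel = gabor_kernel(5, sigma=2.0, theta=0.0, wavelength=4.0)
```

Convolving an image with a bank of such kernels at several orientations and wavelengths yields responses that can serve as texture features.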
Abraham Lempel was an Israeli computer scientist and one of the fathers of the LZ family of lossless data compression algorithms.
Thomas Shi-Tao Huang was a Chinese-born American computer scientist, electrical engineer, and writer. He was a researcher and professor emeritus at the University of Illinois at Urbana-Champaign (UIUC). Huang was one of the leading figures in computer vision, pattern recognition and human computer interaction.
Mylswamy Annadurai is an Indian scientist who serves as vice president of the Tamil Nadu State Council for Science and Technology and as chairman of the Board of Governors of the National Design and Research Forum. He is often dubbed the "Moon Man of India".
The Centre for Development of Advanced Computing (C-DAC) is an Indian autonomous scientific society, operating under the Ministry of Electronics and Information Technology.
Indic computing means "computing in Indic", i.e., in Indian scripts and languages. It involves developing software in Indic scripts and languages, input methods, localization of computer applications, web development, database management, spell checkers, speech-to-text and text-to-speech applications, and OCR in Indian languages.
Nasir Ahmed is an Indian-American electrical engineer and computer scientist. He is Professor Emeritus of Electrical and Computer Engineering at the University of New Mexico (UNM). He is best known for inventing the discrete cosine transform (DCT) in the early 1970s. The DCT is the most widely used data compression transformation, the basis for most digital media standards, and commonly used in digital signal processing. He also described the discrete sine transform (DST), which is related to the DCT.
Optical braille recognition is technology to capture and process images of braille characters into natural language characters. It is used to convert braille documents for people who cannot read them into text, and for preservation and reproduction of the documents.
Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.
An audio coding format is a content representation format for storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to and from a specific audio coding format is called an audio codec; an example is LAME, one of several codecs that implement encoding and decoding of audio in the MP3 audio coding format in software.
Angarai Ganesan Ramakrishnan is a senior professor of electrical engineering and an associate faculty member of the Centre for Neuroscience, both at the Indian Institute of Science, Bangalore, India. He also holds an adjunct faculty position in the Department of Heritage Science and Technology at the Indian Institute of Technology, Hyderabad. He heads the Medical Intelligence and Language Engineering lab. He won the Manthan Award 2015 for his project, “Madhura - the gift of voice”, in the category of e-education, learning, and employment. He is one of the founder directors of RaGaVeRa Indic Technologies Private Limited, recognized by the Karnataka Government as one of the Elevate 2019 Startup winners. He is also one of the advisors of Bhashini AI Solutions Private Limited, likewise recognized by the Karnataka Government as one of the Elevate 2019 Startup winners. From January 2017 to June 2020, he was a Member of the Karnataka Knowledge Commission. He has been a Fellow of the Indian National Academy of Engineering (INAE) since November 2019. Since August 2022, he has also been the Advisor - Neuroscience to Feedfront Technologies Pvt. Ltd., a startup based in Bangalore.
Debasish Ghose is a professor in the Department of Aerospace Engineering, Indian Institute of Science. He is believed to have initiated work on cooperative control in India, having pioneered research on intelligent control and multi-agent systems. He founded the first mobile robotics lab in India, the Mobile Robotics Laboratory at IISc, in 2002. He is known for his early work in swarm intelligence, distributed computing and game theory. His primary research is in guidance and control of autonomous vehicles, although his current interest is in computational intelligence, in particular machine learning for aerial robotics.
Vaidyeswaran Rajaraman is an Indian engineer, academic and writer, known for his pioneering efforts in the field of computer science education in India. He is credited with the establishment of the first academic program in computer science in India, which he helped initiate at the Indian Institute of Technology, Kanpur in 1965. An elected fellow of all the Indian science academies, he is a recipient of the Shanti Swarup Bhatnagar Prize, the highest Indian award in the science and technology category for young scientists, and several other honors including the Om Prakash Bhasin Award and the Homi Bhabha Prize. The Government of India awarded him the Padma Bhushan, the third highest civilian honor, in 1998 for his contributions to science.
Scene text is text that appears in an image captured by a camera in an outdoor environment.