Medical intelligence and language engineering lab


The Medical Intelligence and Language Engineering Laboratory, also known as the MILE lab, is a research laboratory in the Department of Electrical Engineering at the Indian Institute of Science, Bangalore. The lab is known for its work on image processing, online handwriting recognition, text-to-speech (TTS) and optical character recognition (OCR) [1] systems, all of which focus mainly on documents and speech in Indian languages. [2] The lab is headed by A. G. Ramakrishnan. [3]


Research focus

One of the commitments of the MILE lab is to develop technology that lets people with visual impairments access knowledge from any available printed material in Indian languages. [4] Work towards this goal has so far included: mosaicing of coloured, camera-captured document images; text extraction from complex colour images, including camera-captured images; document layout analysis; detection of broken and merged characters; OCR technology for Tamil and Kannada; [5] text-to-speech conversion in Tamil and Kannada; [6] pitch modification using the discrete cosine transform in the source domain; [7] automated part-of-speech tagging; phrase prediction; and prosody modeling.

Mozhi Vallan, the Tamil OCR [8] product developed by MILE Lab, is used by Worth Trust and Karna Vidya Technology Centre, Chennai [9] to convert printed school and college books to Braille. Sri Ramakrishna Math, Chennai [10] uses it to convert its printed philosophical books in Tamil to computer-readable text. Lipi Gnani, the Kannada OCR developed by MILE Lab, is used for similar purposes by the Braille transcription centres of Mitrajyothi [11] and the Canara Bank Relief & Welfare Society, [12] Bangalore. Thirukkural, [13] the Tamil TTS system [14] developed by MILE Lab, is used by some school teachers in Singapore for assignments. Madhura, the Kannada TTS [15] developed by the lab, is used by two blind students together with a screen reader to read aloud text from Kannada books that has been converted with Lipi Gnani.

Currently, the lab is researching machine listening. [16] It has proposed a novel temporal feature, the plosion index, which has been shown to be highly effective in detecting the closure-burst transitions of stop consonants and affricates in continuous speech, even in noise. [17] Another proposed feature is DCTILPR, [18] a voice-source-based feature vector that improves the recognition performance of speaker identification systems.
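Roughly speaking, and as we understand it from [17], the plosion index compares the magnitude of a speech sample with the average magnitude of the samples shortly before it, so an abrupt burst after a quiet closure yields a large value. The minimal Python sketch below illustrates that kind of ratio. Treat the exact form as an assumption: the gap `m1`, the window length `m2`, the detection threshold and the function names `plosion_index` and `detect_bursts` are hypothetical and are not taken from [17].

```python
def plosion_index(s, n0, m1=2, m2=40, eps=1e-12):
    """|s[n0]| divided by the mean |s| over the m2 samples that end m1 samples
    before n0. m1 (gap) and m2 (window length) are illustrative defaults."""
    start, stop = n0 - m1 - m2, n0 - m1      # averaging window is s[start:stop]
    if start < 0:
        raise ValueError("not enough samples before n0")
    mean_abs = sum(abs(v) for v in s[start:stop]) / m2
    return abs(s[n0]) / (mean_abs + eps)     # eps avoids division by zero in silence


def detect_bursts(s, threshold=3.0, m1=2, m2=40):
    """Indices whose plosion index exceeds a hypothetical threshold."""
    return [n for n in range(m1 + m2, len(s))
            if plosion_index(s, n, m1, m2) > threshold]


if __name__ == "__main__":
    # Synthetic signal: a low-level "closure" followed by an abrupt "burst".
    signal = [0.01] * 100 + [1.0] + [0.01] * 20
    print(detect_bursts(signal))             # [100]
```

Because the feature is a ratio, it reacts to the sudden rise itself rather than to absolute loudness, which is one plausible reason such a measure can hold up in noise.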

In its early days, the lab carried out significant work in medical signal and image processing. It proposed a novel algorithm for ECG compression that treats each cardiac cycle as a vector, normalizes its period using multirate-processing-based interpolation, and then applies linear prediction to the discrete wavelet transform of this vector. [19] The maturity of the fetal lung was predicted using image texture features from the liver and lung regions of ultrasound images of pregnant women. [20] The lab also proposed an effective technique for the lossless compression of 3D magnetic resonance images of the brain: each MRI slice is represented by a uniform or adaptive mesh, an affine transformation is applied between corresponding mesh elements of adjacent slices, and the residues are encoded with context-based entropy coding. [21]
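The ECG scheme in [19] depends on making successive cardiac cycles comparable before one is predicted from another. The minimal sketch below illustrates only that idea, under simplifying assumptions: each beat is resampled to a fixed length with plain linear interpolation (a stand-in for the multirate interpolation used in [19]), the wavelet transform is omitted, and each normalized beat is predicted by the previous one with a one-tap predictor. The names `resample_linear` and `beat_residuals` and the length `n_out=64` are hypothetical.

```python
import math


def resample_linear(beat, n_out):
    """Resample one beat to n_out samples by linear interpolation
    (a simple stand-in for multirate interpolation)."""
    n_in = len(beat)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two samples in and out")
    out = []
    for i in range(n_out):
        t = i * (n_in - 1) / (n_out - 1)     # position on the original index grid
        j = min(int(t), n_in - 2)            # left neighbour, kept inside the beat
        frac = t - j
        out.append((1 - frac) * beat[j] + frac * beat[j + 1])
    return out


def beat_residuals(beats, n_out=64):
    """Period-normalize every beat, then predict each beat from the previous
    one (one-tap predictor) and return the residual vectors."""
    vectors = [resample_linear(b, n_out) for b in beats]
    residuals = [vectors[0]]                 # first beat has no predecessor
    for prev, cur in zip(vectors, vectors[1:]):
        residuals.append([c - p for c, p in zip(cur, prev)])
    return residuals


if __name__ == "__main__":
    # Two synthetic "beats": same shape, different durations (100 vs 120 samples).
    beat_a = [math.sin(math.pi * k / 99) for k in range(100)]
    beat_b = [math.sin(math.pi * k / 119) for k in range(120)]
    res = beat_residuals([beat_a, beat_b])
    print(max(abs(v) for v in res[1]))       # small, because the shapes match
```

Without period normalization, beats of different lengths could not be subtracted element by element. After normalization, the residual of a regular rhythm is small and therefore cheap to encode.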

Related Research Articles

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.
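Lossless compression can be demonstrated with Python's standard `zlib` module, which implements DEFLATE (LZ77 matching followed by Huffman coding). The round trip below is a small illustration rather than a benchmark: a highly redundant input shrinks substantially, and decompression restores it bit for bit. The helper name `roundtrip` is our own.

```python
import zlib


def roundtrip(data: bytes, level: int = 9) -> tuple[bytes, bytes]:
    """Compress with DEFLATE (via zlib), then decompress the result."""
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


if __name__ == "__main__":
    original = b"ABABABABAB" * 100           # highly redundant input
    packed, unpacked = roundtrip(original)
    print(len(original), "->", len(packed))
    assert unpacked == original              # lossless: exact reconstruction
```

A lossy scheme, such as quantizing audio samples before coding, would fail the final equality check by design.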

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.
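A common way to obtain the coefficients of a linear predictive model is the autocorrelation method combined with the Levinson-Durbin recursion. The pure-Python sketch below assumes the convention x[n] ≈ Σ a_k·x[n−k] for k = 1..p (some texts write the predictor with the opposite sign). It omits the windowing and pre-emphasis that practical speech coders usually apply, and it assumes r[0] > 0. The function names are our own.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a real-valued sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a (a[0] is a_1) with x[n] ~ sum_k a_k x[n-k],
    plus the final prediction-error energy. Assumes r[0] > 0."""
    a, err = [], r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                                  # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k                             # error shrinks each order
    return a, err


def lpc(x, order):
    """LPC by the autocorrelation method (no windowing or pre-emphasis)."""
    return levinson_durbin(autocorrelation(x, order), order)


if __name__ == "__main__":
    # A decaying sequence that obeys x[n] = 0.9 * x[n-1].
    x = [0.9 ** n for n in range(200)]
    coeffs, err = lpc(x, 2)
    print(coeffs)   # close to [0.9, 0.0]
```

The recursion solves the Toeplitz normal equations in O(p²) operations instead of O(p³). That efficiency is one reason it is the usual choice for estimating the spectral envelope frame by frame.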

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.

Optical character recognition: computer recognition of visual text

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images, digital video, digital audio, digital television, digital radio, and speech coding. DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.
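The definition above can be made concrete with the orthonormal DCT-II, the variant usually meant by "the DCT", and its inverse, the orthonormal DCT-III. This direct O(N²) evaluation is written for readability rather than speed; fast implementations exist in numerical libraries such as SciPy. The function names are our own.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2, i.e. the orthonormal DCT-III."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


if __name__ == "__main__":
    ramp = [float(i) for i in range(8)]
    coeffs = dct2(ramp)
    print([round(c, 3) for c in coeffs])     # energy concentrates at low k
```

For smooth inputs, most of the energy lands in the first few coefficients. This energy compaction is why image, video and audio codecs can coarsely quantize or discard the high-frequency coefficients.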

Point cloud: set of data points in three-dimensional space

A point cloud is a discrete set of data points in space. The points may represent a 3D shape or object. Each point's position is given by a set of Cartesian coordinates. Points may carry data other than position, such as RGB colors, normals and timestamps. Point clouds are generally produced by 3D scanners or by photogrammetry software, which measure many points on the external surfaces of objects around them. As the output of 3D scanning processes, point clouds are used for many purposes, including creating 3D computer-aided design (CAD) or geographic information systems (GIS) models for manufactured parts, metrology and quality inspection, and a multitude of visualization, animation, rendering and mass customization applications.

Gabor filter: linear filter used for texture analysis

In image processing, the Gabor filter is named after Dennis Gabor, who first proposed it as a 1D filter. It was first generalized to 2D by Gösta Granlund, who added a reference direction. The Gabor filter is a linear filter used for texture analysis: it analyzes whether the image contains any specific frequency content in specific directions within a localized region around the point or region of analysis. Many contemporary vision scientists claim that the frequency and orientation representations of Gabor filters are similar to those of the human visual system. Gabor filters have been found to be particularly appropriate for texture representation and discrimination. In the spatial domain, a 2D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave.
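The description of a 2D Gabor filter as a Gaussian kernel modulated by a sinusoidal plane wave translates directly into code. The sketch below builds the real (cosine) part of a Gabor kernel using one common parameterization: envelope width `sigma`, orientation `theta`, carrier `wavelength`, phase offset `psi` and spatial aspect ratio `gamma`. The function name, parameter names and defaults are our assumptions, and libraries may order or scale these parameters differently.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, psi=0.0, gamma=1.0):
    """Real part of a 2D Gabor filter on an odd size x size grid centred at 0."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier travels along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


if __name__ == "__main__":
    k = gabor_kernel(7, sigma=2.0, theta=math.pi / 4, wavelength=4.0)
    print([round(v, 3) for v in k[3]])       # middle row of the kernel
```

Convolving an image with a bank of such kernels at several orientations and wavelengths yields the localized, direction-specific frequency responses used as texture features.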

Abraham Lempel: Israeli computer scientist (1936–2023)

Abraham Lempel was an Israeli computer scientist and one of the fathers of the LZ family of lossless data compression algorithms.

Thomas Huang: Chinese-American engineer and computer scientist (1936–2020)

Thomas Shi-Tao Huang was a Chinese-born American computer scientist, electrical engineer, and writer. He was a researcher and professor emeritus at the University of Illinois at Urbana-Champaign (UIUC). Huang was one of the leading figures in computer vision, pattern recognition and human-computer interaction.

Mylswamy Annadurai: Indian scientist (born 1958)

Mylswamy Annadurai is an Indian scientist who serves as vice president of the Tamil Nadu State Council for Science and Technology and as chairman of the Board of Governors of the National Design and Research Forum. He is often dubbed the "Moon Man of India".

Centre for Development of Advanced Computing: autonomous scientific society

The Centre for Development of Advanced Computing (C-DAC) is an Indian autonomous scientific society, operating under the Ministry of Electronics and Information Technology.

Indic computing means computing in Indic scripts and languages, i.e., the scripts and languages of India. It involves developing software in Indic scripts and languages, input methods, localization of computer applications, web development, database management, spell checkers, speech-to-text and text-to-speech applications, and OCR in Indian languages.
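One practical detail behind Indic OCR and TTS is that a single visible syllable is often made of several Unicode code points: a base consonant followed by combining vowel signs or a virama. The sketch below lists the code points of the Tamil word for "Tamil" and groups them with a deliberately simple rule that attaches every combining mark to the preceding character. This rule is our own approximation, not Unicode's grapheme-cluster algorithm (UAX #29), which covers more cases, such as some conjuncts. The function name `rough_clusters` is hypothetical.

```python
import unicodedata


def rough_clusters(text):
    """Group each base character with the combining marks that follow it.

    A simplified approximation of user-perceived characters; full
    segmentation follows Unicode's UAX #29 rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch               # vowel sign or virama: attach to base
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # the Tamil word for "Tamil"
    for ch in word:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(len(word), "code points ->", len(rough_clusters(word)), "clusters")
```

An OCR engine that recognizes visible glyphs must emit these multi-code-point sequences in the correct order. A TTS front end, in turn, has to parse them back into syllables before it can choose pronunciations.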

Nasir Ahmed (engineer): Indian-American electrical engineer and computer scientist (born 1940)

Nasir Ahmed is an Indian-American electrical engineer and computer scientist. He is Professor Emeritus of Electrical and Computer Engineering at University of New Mexico (UNM). He is best known for inventing the discrete cosine transform (DCT) in the early 1970s. The DCT is the most widely used data compression transformation, the basis for most digital media standards and commonly used in digital signal processing. He also described the discrete sine transform (DST), which is related to the DCT.

Optical braille recognition: automated recognition of braille characters

Optical braille recognition is technology that captures and processes images of braille characters into natural-language characters. It is used to convert braille documents into text for people who cannot read braille, and to preserve and reproduce the documents.
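After the image-processing stage has located each braille cell and decided which of its dots are raised, mapping the result to text is essentially a table lookup. The sketch below assumes that an earlier step has already produced a set of dot numbers for each cell. It uses the Unicode braille patterns block, which starts at U+2800 and in which dot n sets bit n−1, and it covers only the uncontracted English letters a to j. The names `dots_to_unicode`, `transcribe`, `LETTERS` and `DOTS_TO_LETTER` are hypothetical.

```python
# Unicode braille patterns start at U+2800; dot n (1-8) sets bit n-1.
BRAILLE_BASE = 0x2800

# Uncontracted (grade 1) English letters a-j as sets of raised dots.
LETTERS = {
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5}, "e": {1, 5},
    "f": {1, 2, 4}, "g": {1, 2, 4, 5}, "h": {1, 2, 5}, "i": {2, 4}, "j": {2, 4, 5},
}
DOTS_TO_LETTER = {frozenset(d): letter for letter, d in LETTERS.items()}


def dots_to_unicode(dots):
    """Map a set of raised-dot numbers (1-8) to its Unicode braille pattern."""
    return chr(BRAILLE_BASE + sum(1 << (d - 1) for d in dots))


def transcribe(cells):
    """Turn recognized cells (sets of dot numbers) into text; '?' if unknown."""
    return "".join(DOTS_TO_LETTER.get(frozenset(c), "?") for c in cells)


if __name__ == "__main__":
    cells = [{1, 2}, {1}, {1, 4, 5}]         # as if detected in a scanned page
    print("".join(dots_to_unicode(c) for c in cells), transcribe(cells))
```

Keeping the Unicode braille characters alongside the decoded text makes it possible to reproduce the original document exactly, while the lookup result serves readers who do not know braille.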

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

Audio coding format: digitally coded format for audio signals

An audio coding format is a content representation format for the storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to and from a specific audio coding format is called an audio codec. An example of an audio codec is LAME, one of several codecs that implement encoding and decoding of audio in the MP3 format in software.

A. G. Ramakrishnan: Indian electrical engineering professor (born 1958)

Angarai Ganesan Ramakrishnan is a senior professor of electrical engineering and an associate faculty member of the Centre for Neuroscience, both at the Indian Institute of Science, Bangalore, India. He also holds an adjunct faculty position in the Department of Heritage Science and Technology at the Indian Institute of Technology, Hyderabad. He heads the Medical intelligence and language engineering lab. He won the Manthan Award 2015 for his project “Madhura - the gift of voice” in the category e-education, learning, and employment. He is one of the founder directors of RaGaVeRa Indic Technologies Private Limited, which the Karnataka Government recognized as one of the Elevate 2019 Startup winners. He is also one of the advisors of Bhashini AI Solutions Private Limited, likewise recognized by the Karnataka Government as an Elevate 2019 Startup winner. From January 2017 to June 2020, he was a Member of the Karnataka Knowledge Commission. He has been a Fellow of the Indian National Academy of Engineering (INAE) since November 2019. Since August 2022, he has also been the Advisor - Neuroscience to Feedfront Technologies Pvt. Ltd., a startup based in Bangalore.

Debasish Ghose is a professor in the Department of Aerospace Engineering, Indian Institute of Science. He is believed to have initiated work on cooperative control in India, having pioneered research on intelligent control and multi-agent systems. He founded the first mobile robotics lab in India, the Mobile Robotics Laboratory at IISc, in 2002. He is known for his early work in swarm intelligence, distributed computing and game theory. His primary research is in the guidance and control of autonomous vehicles, although his current interest is in computational intelligence, i.e., machine learning for aerial robotics.

Vaidyeswaran Rajaraman: Indian engineer, academic and writer (born 1933)

Vaidyeswaran Rajaraman is an Indian engineer, academic and writer, known for his pioneering efforts in computer science education in India. He is credited with establishing the first academic program in computer science in India, which he helped initiate at the Indian Institute of Technology, Kanpur in 1965. An elected fellow of all the Indian science academies, he is a recipient of the Shanti Swarup Bhatnagar Prize, the highest Indian award in the science and technology category for young scientists, and of several other honors, including the Om Prakash Bhasin Award and the Homi Bhabha Prize. The Government of India awarded him the Padma Bhushan, the third-highest civilian honor, in 1998 for his contributions to science.

Scene text: text captured as part of outdoor surroundings in a photograph

Scene text is text that appears in an image captured by a camera in an outdoor environment.

References

  1. "MILE Lab at IISc: Developing technologies to enable the specially abled".
  2. MILE Lab. "MILE Lab in news". Retrieved 28 April 2013.
  3. MILE Lab. "People". Archived from the original on 3 September 2014. Retrieved 28 April 2013.
  4. "Walking an extra MILE for the specially abled - Bangalore Mirror".
  5. Pati, Peeta Basa; Ramakrishnan, A.G. (2008). "Word level multiscript identification". Pattern Recognition Letters. 29 (9): 1218–1229. doi:10.1016/j.patrec.2008.01.027.
  6. Shiva Kumar, H. R.; Ashwini, J. K.; Rajaram, B. S. R.; Ramakrishnan, A. G. (2013). "MILE TTS for Tamil and Kannada for Blizzard Challenge 2013" (PDF). Proc. Blizzard Challenge Workshop, Barcelona, Spain, 3 September 2013.
  7. "Pitch synchronous pitch modification". Speech Communication. 42: 143–154. doi:10.1016/j.specom.2003.05.001.
  8. Subramanian, Karthik (17 January 2014). "Article in The Hindu on MILE Lab Tamil OCR". The Hindu.
  9. "Karna Vidya Technology Centre, Guindy, Chennai".
  10. "Sri Ramakrishna Math, Chennai".
  11. "Mitrajyothi Braille Transcription Centre, Bangalore". Archived from the original on 3 February 2011.
  12. "Braille Transcription Centre, Canara Bank Relief & Welfare Society, Bangalore".
  13. Jayavardhana Rama, G.L.; Ramakrishnan, A.G.; Muralishankar, R.; Prathibha, R. (2002). "A complete text-to-speech synthesis system in Tamil" (PDF). Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002. pp. 191–194. doi:10.1109/WSS.2002.1224406. ISBN 0-7803-7395-2. S2CID 13870581.
  14. "Blog in Tamil Manam on Thirukkural Tamil TTS".
  15. "Deccan Herald: IISc develops text-to-speech software for Kannada, Tamil". 26 June 2010.
  16. "MILE Lab research focus".
  17. Ananthapadmanabha, T. V.; Prathosh, A. P.; Ramakrishnan, A. G. (2014). "Plosion index, a temporal feature to detect bursts in stops and affricates". The Journal of the Acoustical Society of America. 135 (1): 460–71. doi:10.1121/1.4836055. PMID 24437786.
  18. Ramakrishnan, A. G.; Abhiram, B.; Prasanna, S. R. M. (2015). "Voice source characterization using pitch synchronous discrete cosine transform for speaker identification". The Journal of the Acoustical Society of America. 137 (6): EL469–75. doi:10.1121/1.4921679. PMID 26093457.
  19. Ramakrishnan, A. G.; Saha, S. (1997). "Cardiac cycle synchronized compression of ECG" (PDF). IEEE Transactions on Bio-Medical Engineering. 44 (12): 1253–61. doi:10.1109/10.649997. PMID 9401225. S2CID 8834327.
  20. Prakash, K. N.; Ramakrishnan, A. G.; Suresh, S.; Chow, T. W. (2002). "Predicting maturity of fetal lung from ultrasound image features" (PDF). IEEE Transactions on Information Technology in Biomedicine. 6 (1): 38–45. doi:10.1109/4233.992160. PMID 11936595. S2CID 14662967.
  21. Srikanth, R.; Ramakrishnan, A. G. (2005). "3D brain MRI compression using adaptive mesh and contextual encoding" (PDF). IEEE Transactions on Medical Imaging. 24 (9): 1199–206. doi:10.1109/TMI.2005.853638. PMID 16156357. S2CID 7523030.