Rita Singh

Born
India
Alma mater
Known for Artificial intelligence, deep learning, voice analysis, voice forensics
Scientific career
Fields Artificial intelligence, machine learning, cryptography
Institutions Carnegie Mellon University, University of Pittsburgh
Website mlsp.cs.cmu.edu/people/rsingh/index.html

Rita Singh is a computer scientist known for her work on the algorithmic aspects of voice recognition technologies and on the application of artificial intelligence to voice forensics. [1] She is a member of the research faculty at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. She has led discussions on voice technologies at the World Economic Forum. [2] [3] Singh is a founder and the technology director of the Center for Voice Intelligence and Security (CVIS), an organization focused on voice technology and its security implications. [4]

Biography

Early Education and Career

Rita Singh studied in India, where she received a Bachelor of Science (Hons.) degree in physics and a Master of Science degree in exploration geophysics, both from Banaras Hindu University. She then earned a PhD from the National Geophysical Research Institute of the Council of Scientific and Industrial Research, India, in 1996. [5]

Postdoctoral Fellowship and Initial Research

After completing her PhD, Singh joined the Tata Institute of Fundamental Research in India as a postdoctoral fellow from March 1996 to November 1997. During her fellowship, she was part of the Condensed Matter Physics and Computer Systems and Communications Groups. Her research focused on nonlinear dynamical systems and signal processing, building upon her doctoral work in nonlinear geodynamics and chaos. [5]

Transition to Carnegie Mellon University

In November 1997, Singh moved to Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania, USA, where she joined the research faculty of the Language Technologies Institute in the School of Computer Science. At CMU, she became affiliated with the Robust Speech Recognition and SPHINX groups. [6] In 2020, she founded the Center for Voice Intelligence and Security (CVIS), where she currently serves as technology director. [citation needed]

Research

Singh's research focuses primarily on machine learning, deep learning for computer voice recognition, and artificial intelligence applied to voice forensics. [7]

Contributions to Speech Recognition and Audio Processing

Singh's work in computer speech recognition and general audio processing began in 1997, and through 2014 her research covered a broad range of topics in this area. [8] She developed algorithms that helped make speech processing systems language-agnostic, automated the discovery and learning of information from speech, and enabled speech processing with minimal reliance on external human-generated knowledge. Her objectives were to increase automation, devise more effective search strategies, scale up learning algorithms for voice processing systems, and improve their accuracy in complex acoustic environments, including those with high levels of noise. [9]

Human Profiling through Voice Analysis

In December 2014, Singh began work on the science of profiling humans from their voice, a field that involves deducing a range of human parameters from voice analysis alone. [10] Singh posits that the human voice, like DNA and fingerprints, is unique to each individual and carries information about a person's physical, physiological, medical, psychological, sociological, and behavioral characteristics, among others. [11] [12] [13] Her approach is grounded in the quantitative analysis of voice signals and draws on the physics and biomechanics of human voice production, with the aim of estimating a 3D portrait of a person. [2] [14] A key feature of her methodology is that it is language-agnostic: it focuses on the voice signal rather than on semantic or pragmatic content. [15] [16] This property broadens the applicability of vocal biomarker technology in medical diagnostics, including the detection of neuromuscular conditions affecting the upper respiratory tract and of diseases such as COVID-19, [17] [18] Alzheimer's, Parkinson's, and coronary artery disease. [19] [20]

Current Endeavors and Future Aspirations

Singh's current research involves designing AI systems that extract the information carried in the human voice. These systems are being developed for purposes including genetic discovery, biomarker identification, and the study of a person's physical state and psyche, such as emotions and personality. In parallel with her work on human profiling, Singh is developing core designs for universal speech and audio processing AI systems. Her stated goal is a system capable of replicating the brain's response to multi-sensory inputs, a project that involves both advanced computing and the integration of aspects of mobility. [1]

Teaching

She teaches the CMU courses 11-785 Introduction to Deep Learning and 11-860 Quantum Computing, Cryptography, and Machine Learning Lab.

Previously, she taught:

Books

Singh contributed a chapter to "Techniques for Noise Robustness in Automatic Speech Recognition" (Wiley, 2012). [21] In 2019, she published "Profiling Humans from their Voice" (Springer, Singapore). [22]

References

  1. "Rita Singh". mlsp.cs.cmu.edu. Retrieved 2024-01-29.
  2. Brandon, Simon. "How to catch a criminal using only milliseconds of audio". World Economic Forum.
  3. "The Mind-Blowing Promise of AI-Driven Voice Profiling". secure.dashdigital.com. Retrieved 2024-02-06.
  4. "Center for Voice Intelligence and Security". cvis.cs.cmu.edu. Retrieved 2024-01-29.
  5. "Rita Singh". IEEE author profile. Retrieved 2024-01-29.
  6. "CMU Robust Speech Recognition Home Page". www.cs.cmu.edu. Retrieved 2024-01-29.
  7. "Snapshot: Voice Forensics Can Help the Coast Guard Catch Hoax Callers | Homeland Security". www.dhs.gov. Retrieved 2024-02-06.
  8. Lambert, Benjamin; Raj, Bhiksha; Singh, Rita (2013-08-25). "Discriminatively trained dependency language modeling for conversational speech recognition". Interspeech 2013. ISCA: 3414–3418. doi:10.21437/interspeech.2013-748. S2CID 13538544.
  9. Raj, Bhiksha; Virtanen, Tuomas; Singh, Rita (2012-10-05). "The Problem of Robustness in Automatic Speech Recognition". Techniques for Noise Robustness in Automatic Speech Recognition: 31–50. doi:10.1002/9781118392683.ch3. ISBN 978-1-119-97088-0.
  10. Singh, Rita; Raj, Bhiksha; Baker, James (2016). "Short-term analysis for estimating physical parameters of speakers". 2016 4th International Conference on Biometrics and Forensics (IWBF). IEEE. pp. 1–6. doi:10.1109/iwbf.2016.7449696. ISBN 978-1-4673-9448-2. S2CID 9531585.
  11. "The Mind-Blowing Promise of AI-Driven Voice Profiling". secure.dashdigital.com. Retrieved 2024-02-05.
  12. Singh, Rita; Keshet, Joseph; Gencaga, Deniz; Raj, Bhiksha (2016). "The relationship of voice onset time and Voice Offset Time to physical age". 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. pp. 5390–5394. doi:10.1109/icassp.2016.7472707. ISBN 978-1-4799-9988-0. S2CID 8227134.
  13. Burgess, Matt. "The Race to Hide Your Voice". Wired. ISSN 1059-1028. Retrieved 2024-02-03.
  14. "Computer Science Speaker Series Master Calendar". calendars.illinois.edu. Retrieved 2024-02-05.
  15. Singh, Rita; Raj, Bhiksha; Gencaga, Deniz (2016). "Forensic anthropometry from voice: An articulatory-phonetic approach". 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE. pp. 1375–1380. doi:10.1109/mipro.2016.7522354. ISBN 978-953-233-086-1. S2CID 15344343.
  16. "AI in Voice Forensics with Rita Singh". The Women in Tech Show. 2017-11-21. Retrieved 2024-01-29.
  17. AI and Coronavirus Japan NHK Documentary, NHK, Japan TV Documentary, Aired Saturday, 27 June 2020.
  18. Tobias, Marc Weber. "AI And Medical Diagnostics: Can A Smartphone App Detect Covid-19 From Speech Or A Cough?". Forbes. Retrieved 2024-02-05.
  19. Leonard, Ben (2022-01-12). "Talk to me: How AI can diagnose disease". POLITICO. Retrieved 2024-02-05.
  20. "How Voice Profiling Will Revolutionize Health Care, with Rita Singh". News Items Podcast with John Ellis. 2021-08-09. Retrieved 2024-02-05.
  21. Virtanen, Tuomas; Singh, Rita; Raj, Bhiksha (2012-11-02), Virtanen, Tuomas; Singh, Rita; Raj, Bhiksha (eds.), "Introduction", Techniques for Noise Robustness in Automatic Speech Recognition (1 ed.), Wiley, pp. 1–5, doi:10.1002/9781118392683.ch1, ISBN 978-1-119-97088-0, retrieved 2024-02-05
  22. Singh, Rita (2019). "Profiling Humans from their Voice". Springer. doi:10.1007/978-981-13-8403-5. ISBN 978-981-13-8402-8. S2CID 196188160.