Rupal Patel (scientist)

Rupal Patel
Alma mater University of Toronto
Scientific career
Fields speech science; synthetic speech
Institutions Northeastern University, USA
Thesis Identifying information-bearing prosodic parameters in severely dysarthric vocalizations  (2000)
Doctoral advisor Bernard O'Keefe

Rupal Patel is a professor at Northeastern University in Boston, USA, working in speech science, human-computer interaction and artificial intelligence. She is the director of the university's Communication Analysis and Design Laboratory.

Education

Patel gained her BSc degree in neuropsychology from the University of Calgary, graduating in 1993. She undertook further study at the University of Toronto, gaining her doctorate in speech acoustics in 2000, and then completed postdoctoral training at the Massachusetts Institute of Technology. [1]

Career

From 2001 to 2003 she was an assistant professor at Teachers College, Columbia University. In 2003 she was appointed as an assistant professor at Northeastern University, and was promoted to professor in 2014. [1] Her post is held jointly between the university's Bouvé College of Health Sciences and Khoury College of Computer Sciences, reflecting the range of her research. Her work has concentrated on the acquisition and impairment of speech, specifically prosody, in healthy speakers and in those with neuromotor disorders. It has led to two practical applications: speech enhancement technology that generates naturalistic synthetic voices for people with speech disorders by making use of their residual speaking ability, and learning technologies that help children read with natural inflection and rhythm. [2]

Since the mid-2000s she has led the development of computer systems that generate naturalistic synthetic voices, building on her work on speech analysis. People with speech disorders can often produce sound, but cannot shape it into speech with their mouths. Her research group developed a computer system that gives each individual a distinct synthetic voice based on their own natural sound. The pitch, loudness, breathiness and clarity of normal speech were generated by applying the system to a recorded sample of the sound the individual was able to produce. By 2013 she could produce synthetic voices in the laboratory. [3]

She founded the spin-out company VOCALiD in 2014 and has continued to develop the machine learning and speech blending used to generate the synthetic voices. [4] By the early 2020s the systems were able to reproduce existing voices as well as synthesise new ones. One use was to give voice actors an exact copy, or clone, of their voice to use in their work. [5]

Publications

Patel is the author or co-author of over 70 scientific publications or book chapters. In 2013 she was invited to present a TED talk titled "Synthetic voices, as unique as fingerprints". [6]

References

  1. "Rupal Patel CV" (PDF). Bouve Northeastern University website. Retrieved 27 July 2021.
  2. "Rupal Patel, PhD, CCC-SLP". Northeastern University Bouvé College of Health Sciences. Retrieved 27 July 2021.
  3. Spiegel, Alix. "New Voices For The Voiceless: Synthetic Speech Gets An Upgrade". NPR - National Public Radio. Retrieved 27 July 2021.
  4. "About us". VOCALiD. Retrieved 27 July 2021.
  5. Palmai, Kitti. "Voice cloning of growing interest to actors and cybercriminals". BBC News. Retrieved 27 July 2021.
  6. "Rupal Patel: Synthetic voices, as unique as fingerprints". YouTube. Retrieved 27 July 2021.