International Speech Communication Association

The International Speech Communication Association (ISCA)[1] is a non-profit organization and one of the two main professional associations for speech communication science and technology, the other being the IEEE Signal Processing Society.

Purpose

The purpose of the association is to promote the study and application of automatic speech processing in both directions, speech recognition and speech synthesis, together with sub-topics such as speaker recognition and speech compression. Its activities concern all aspects of speech processing: computational, linguistic, and theoretical.

Conferences

ISCA organizes the INTERSPEECH conference every year.

ISCA board

The current ISCA president is Sebastian Möller, the vice president is Odette Scharenborg, and the other board members are professionals in the field.[2]

History of ISCA

ISCA is the result of the merger of ESCA (the European Speech Communication Association, created in 1987 in Europe) and PC-ICSLP (the Permanent Council for the organization of the International Conference on Spoken Language Processing, created in 1986 in Japan). The first ISCA event was held in 2000 in Beijing, China.[3]

Related Research Articles

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics, and computer engineering fields. The reverse process is speech synthesis.

Gunnar Fant

Carl Gunnar Michael Fant was a leading researcher in speech science in general and speech synthesis in particular who spent most of his career as a professor at the Swedish Royal Institute of Technology (KTH) in Stockholm. He was a first cousin of the actors and directors George Fant and Kenne Fant.

Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from manual assessment by an instructor or proctor. Also called speech verification, pronunciation evaluation, and pronunciation scoring, its main application is computer-aided pronunciation teaching (CAPT) when combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction. Unlike general speech recognition, pronunciation assessment does not transcribe unknown speech; knowing the expected word(s) in advance, it attempts to verify the correctness of the learner's pronunciation and ideally their intelligibility to listeners, sometimes along with often-inconsequential prosody such as intonation, pitch, tempo, rhythm, and stress. Pronunciation assessment is also used in reading tutoring, for example in products such as Microsoft Teams and Amira Learning. It can also help diagnose and treat speech disorders such as apraxia.
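
As a toy illustration of the scoring idea, and not the method of any product named above: because the expected words, and hence their phonemes, are known in advance, a score can be obtained by aligning the recognizer's phoneme output against the expected sequence. The ARPABET-style labels and the similarity measure below are hypothetical, using only the Python standard library:

    from difflib import SequenceMatcher

    def pronunciation_score(expected, recognized):
        # Align the recognized phoneme sequence against the expected one
        # and return a similarity in [0, 1]; 1.0 means a perfect match.
        return SequenceMatcher(None, expected, recognized).ratio()

    # Hypothetical example: a learner says "hello" and the recognizer
    # hears the vowel as EH instead of AH (ARPABET-style labels).
    expected   = ["HH", "AH", "L", "OW"]
    recognized = ["HH", "EH", "L", "OW"]
    print(pronunciation_score(expected, recognized))  # 0.75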

TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time.
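
For illustration, a minimal reader for TIMIT's time-aligned transcription files, assuming the corpus's conventional one-segment-per-line layout (begin sample, end sample, label) and its 16 kHz sampling rate:

    def read_timit_segments(path, sample_rate=16000):
        # Each line holds: <begin sample> <end sample> <label>,
        # e.g. a .phn phone entry or a .wrd word entry.
        segments = []
        with open(path) as f:
            for line in f:
                begin, end, label = line.split()
                # Convert sample indices to seconds.
                segments.append((int(begin) / sample_rate,
                                 int(end) / sample_rate, label))
        return segments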

Philip Rubin

Philip E. Rubin is an American cognitive scientist, technologist, and science administrator known for raising the visibility of behavioral and cognitive science, neuroscience, and ethical issues related to science, technology, and medicine, at a national level. His research career is noted for his theoretical contributions and pioneering technological developments, starting in the 1970s, related to speech synthesis and speech production, including articulatory synthesis and sinewave synthesis, and their use in studying complex temporal events, particularly understanding the biological bases of speech and language.

Silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds; as such, it is a type of electronic lip reading. It works by having a computer identify the phonemes an individual pronounces from non-auditory sources of information about their speech movements, which are then used to recreate the speech with speech synthesis.

Roberto Pieraccini

Roberto Pieraccini is an Italian-American electrical engineer working in the fields of speech recognition, natural language understanding, and spoken dialog systems. He has been an active contributor to speech and language research and technology since 1981. He is currently the Chief Scientist of Uniphore, a conversational automation technology company.

Alex Waibel

Alexander Waibel is a professor of computer science at Carnegie Mellon University and the Karlsruhe Institute of Technology. His research interests focus on speech recognition and translation and on human communication signals and systems. He made pioneering contributions to speech translation systems, breaking down language barriers through cross-lingual speech communication. In fundamental research on machine learning, he is known for the Time Delay Neural Network (TDNN), the first convolutional neural network (CNN) trained by gradient descent using backpropagation, which he introduced in 1987 at ATR in Japan.
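
The time-delay idea amounts to applying the same weights across shifted windows of the input frames, which is a 1-D convolution over time. A minimal PyTorch sketch of that structure follows; the layer sizes, contexts, and phone count are illustrative, not Waibel's original 1987 architecture:

    import torch
    import torch.nn as nn

    # Minimal TDNN-style sketch: stacked 1-D convolutions over the
    # frame axis, producing per-frame phone scores.
    tdnn = nn.Sequential(
        nn.Conv1d(40, 64, kernel_size=5),              # +-2 frames of context
        nn.ReLU(),
        nn.Conv1d(64, 64, kernel_size=3, dilation=2),  # wider temporal context
        nn.ReLU(),
        nn.Conv1d(64, 30, kernel_size=1),              # 30 phone classes per frame
    )

    x = torch.randn(1, 40, 100)   # one utterance: 100 frames of 40-dim features
    scores = tdnn(x)              # shape (1, 30, 92): per-frame phone logits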

Text, Speech and Dialogue

Text, Speech and Dialogue (TSD) is an annual conference covering topics in natural language processing and computational linguistics. The meeting is held every September, alternating between Brno and Plzeň, Czech Republic.

Mariusz Ziółko

Mariusz Ziółko is a Polish professor of automatics and signal processing. He graduated in 1970 from AGH University of Science and Technology, where he also received his PhD in 1973 and his habilitation in 1990. In 2001 he received the title of professor of technical sciences from the President of Poland. He held a DAAD fellowship at the University of Wuppertal and was a visiting professor at Fachhochschule St. Pölten in Austria in 2005. He is a professor in the AGH Department of Electronics, where he leads the Digital Signal Processing Group. He has been a member of several scientific organisations, including IEEE, SIAM, EURASIP, and ISCA, and has chaired several conferences. He is a co-author of pioneering publications on mathematical optimization models in biology, and his work has also had an important impact on Polish speech recognition technologies.

Google Brain was a deep learning artificial intelligence research team under the umbrella of Google AI, a research division at Google dedicated to artificial intelligence. Formed in 2011, it combined open-ended machine learning research with information systems and large-scale computing resources. It created tools such as TensorFlow, which allow neural networks to be used by the public, and multiple internal AI research projects, and aimed to create research opportunities in machine learning and natural language processing. It was merged into former Google sister company DeepMind to form Google DeepMind in April 2023.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

The BABEL speech corpus is a corpus of recorded speech materials from five Central and Eastern European languages. Intended for use in speech technology applications, it was funded by a grant from the European Union and completed in 1998. It is distributed by the European Language Resources Association.

Shrikanth Narayanan

Shrikanth Narayanan is an Indian-American professor at the University of Southern California. He is an interdisciplinary engineer-scientist focused on human-centered signal processing and machine intelligence, with speech and spoken language processing at its core. A prolific award-winning researcher, educator, and inventor, with hundreds of publications and a number of acclaimed patents to his credit, he has pioneered several research areas, including computational speech science, speech and human language technologies, audio, music, and multimedia engineering, human sensing and imaging technologies, emotions research and affective computing, behavioral signal processing, and computational media intelligence. His technical contributions cover applications in defense, security, health, education, media, and the arts, and continue to impact domains from human health and national defense/intelligence to the media arts, including technologies that facilitate awareness and support of diversity and inclusion. His award-winning patents have contributed to the proliferation of speech technologies on the cloud and on mobile devices, and to enabling novel emotion-aware artificial intelligence technologies.

Joseph Mariani

Joseph Mariani is a French computer science researcher and pioneer in the field of speech processing.

Bayya Yegnanarayana is an INSA Senior Scientist at the International Institute of Information Technology (IIIT) Hyderabad, Telangana, India. He is an eminent professor known for his contributions in digital signal processing, speech signal processing, artificial neural networks, and related areas. He has guided about 39 PhD theses, 43 MS theses, and 65 MTech projects. He was the General Chair of the international conference INTERSPEECH 2018, held in Hyderabad. He also holds positions as a Distinguished Professor at IIT Hyderabad and an adjunct faculty member at IIT Tirupati.

Steve Young (born 1951)

Stephen John Young is a British researcher, Professor of Information Engineering at the University of Cambridge and an entrepreneur. He is one of the pioneers of automated speech recognition and statistical spoken dialogue systems. He served as the Senior Pro-Vice-Chancellor of the University of Cambridge from 2009 to 2015, responsible for planning and resources. From 2015 to 2019, he held a joint appointment between his professorship at Cambridge and Apple, where he was a senior member of the Siri development team.

John Makhoul

John Makhoul is a Lebanese-American computer scientist who works in the field of speech and language processing. Dr. Makhoul's work on linear predictive coding was used in the establishment of the Network Voice Protocol, which enabled the transmission of speech signals over the ARPANET. Makhoul is recognized in the field for his vital role in the areas of speech and language processing, including speech analysis, speech coding, speech recognition and speech understanding. He has made a number of significant contributions to the mathematical modeling of speech signals, including his work on linear prediction, and vector quantization. His patented work on the direct application of speech recognition techniques for accurate, language-independent optical character recognition (OCR) has had a dramatic impact on the ability to create OCR systems in multiple languages relatively quickly.
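
Linear prediction models each speech sample as a weighted combination of the preceding few samples, so a frame of speech can be transmitted as a handful of predictor coefficients (plus excitation parameters) rather than raw samples. A bare-bones NumPy sketch of the autocorrelation method, as an illustration rather than Makhoul's formulation:

    import numpy as np

    def lpc_coefficients(signal, order):
        # Autocorrelation method: solve the normal equations R a = r,
        # where R is the symmetric Toeplitz autocorrelation matrix.
        r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
        R = np.array([[r[abs(i - j)] for j in range(order)]
                      for i in range(order)])
        return np.linalg.solve(R, r[1:order + 1])

    # Example: a pure tone is almost perfectly predictable from two
    # past samples, so a 2nd-order predictor suffices.
    t = np.arange(400)
    tone = np.sin(2 * np.pi * 0.05 * t)
    print(lpc_coefficients(tone, order=2))  # approx [1.902, -1.0]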

An audio deepfake is a product of artificial intelligence used to create convincing speech that sounds like specific people saying things they did not say. The technology was initially developed for various applications intended to improve human life, for example producing audiobooks and helping people who have lost their voices regain them. Commercially, it has opened the door to several opportunities, such as more personalized digital assistants, natural-sounding text-to-speech, and speech translation services.

Chin-Hui Lee is an information scientist best known for his work in speech recognition, speaker recognition, and acoustic signal processing. He joined the Georgia Institute of Technology in 2002 as a professor in the School of Electrical and Computer Engineering.

References

  1. "Welcome to ISCA Web".
  2. "ISCA board". ISCA. Retrieved 16 March 2022.
  3. "About the Conference". Interspeech 2016. http://www.interspeech2016.org/About-the-Conference. Retrieved 23 January 2018.