International Speech Communication Association

The International Speech Communication Association (ISCA) [1] is a non-profit organization and one of the two main professional associations for speech communication science and technology, the other being the IEEE Signal Processing Society.

Purpose

The purpose of ISCA is to promote the study and application of automatic speech processing, including speech recognition and speech synthesis, as well as sub-topics such as speaker recognition and speech compression. The association's activities encompass all aspects of speech processing: computational, linguistic, and theoretical. [2]

ISCA also serves as a platform for researchers, academics, industry professionals, and students to exchange knowledge and foster interdisciplinary dialogue through its conferences, workshops, publications, and educational initiatives, and it promotes international collaboration and networking across the speech communication community.

Conferences

ISCA organizes the annual INTERSPEECH conference.

ISCA board

The current ISCA president is Odette Scharenborg, and the vice president is Bhuvana Ramabhadran; the remaining board members are professionals in the field. [3]

History of ISCA

ISCA is the result of the merger of ESCA (the European Speech Communication Association, founded in Europe in 1987) and PC-ICSLP (the Permanent Council for the organization of the International Conferences on Spoken Language Processing, founded in Japan in 1986). The first ISCA event was held in 2000 in Beijing, China. [4]

Related Research Articles

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
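
As a concrete illustration of the speech-to-text direction, the following minimal sketch uses the third-party Python SpeechRecognition package; the file name and the choice of recognition backend are illustrative assumptions rather than anything specified here.

```python
# Minimal speech-to-text sketch (assumes: pip install SpeechRecognition,
# and a placeholder audio file "sample.wav").
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:
    audio = recognizer.record(source)  # read the whole file into memory

# Send the audio to a web-based recognizer and print the transcript.
print(recognizer.recognize_google(audio))
```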

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.
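
As a sketch of the text-to-speech direction, the following uses the pyttsx3 package, which drives the platform's built-in synthesizer offline; the package choice and the sample sentence are assumptions made for illustration.

```python
# Minimal text-to-speech sketch (assumes: pip install pyttsx3).
import pyttsx3

engine = pyttsx3.init()          # select the platform's default synthesizer
engine.setProperty("rate", 150)  # speaking rate in words per minute
engine.say("Speech synthesis is the artificial production of human speech.")
engine.runAndWait()              # block until the utterance has been spoken
```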

Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from manual assessment by an instructor or proctor. Also called speech verification, pronunciation evaluation, and pronunciation scoring, its main application is computer-aided pronunciation teaching (CAPT) when combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction. Rather than transcribing unknown speech, pronunciation assessment knows the expected word(s) in advance and attempts to verify the correctness of the learner's pronunciation and, ideally, their intelligibility to listeners, sometimes along with prosodic features, often of lesser consequence, such as intonation, pitch, tempo, rhythm, and syllable and word stress. Pronunciation assessment is also used in reading tutoring, for example in products such as Microsoft Teams and from Amira Learning, and it can help diagnose and treat speech disorders such as apraxia.
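
The alignment step such systems build on can be sketched in toy form: given the expected prompt, mark which expected words appear in the recognizer's output. This shows only the word-identity check; real CAPT systems additionally score the acoustics (e.g. goodness-of-pronunciation measures), which is not modeled here.

```python
# Toy sketch of verifying expected words against a recognizer's output.
from difflib import SequenceMatcher

def check_expected_words(expected: str, recognized: str) -> list[tuple[str, bool]]:
    """Mark each expected word as matched or missed in the recognized text."""
    exp, rec = expected.lower().split(), recognized.lower().split()
    matched: set[int] = set()
    for block in SequenceMatcher(a=exp, b=rec).get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    return [(word, i in matched) for i, word in enumerate(exp)]

# Example: the learner's utterance was recognized without the word "sea".
print(check_expected_words("she sells sea shells", "she sells shells"))
# [('she', True), ('sells', True), ('sea', False), ('shells', True)]
```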

TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time.
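
TIMIT's time alignments are stored in simple text files; the phone-level .PHN files list one segment per line as start and end sample indices followed by the phone label. The sketch below parses such a file; the path is a placeholder, since TIMIT itself is distributed by the Linguistic Data Consortium.

```python
# Sketch of reading a TIMIT phone-level transcription (.PHN) file.
SAMPLE_RATE = 16_000  # TIMIT audio is sampled at 16 kHz

def read_phn(path: str) -> list[tuple[float, float, str]]:
    """Parse 'start_sample end_sample phone' lines into second-based spans."""
    segments = []
    with open(path) as f:
        for line in f:
            start, end, phone = line.split()
            segments.append((int(start) / SAMPLE_RATE,
                             int(end) / SAMPLE_RATE,
                             phone))
    return segments

for start, end, phone in read_phn("SA1.PHN"):  # placeholder file path
    print(f"{start:7.3f}s {end:7.3f}s  {phone}")
```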

Manfred R. Schroeder

Manfred Robert Schroeder was a German physicist, best known for his contributions to acoustics and computer graphics. He wrote three books and published over 150 articles in his field.

Roberto Pieraccini (Italian-American computer scientist)

Roberto Pieraccini is an Italian-American electrical engineer working in the fields of speech recognition, natural language understanding, and spoken dialog systems. He has been an active contributor to speech and language research and technology since 1981. He is currently the Chief Scientist of Uniphore, a conversational automation technology company.

Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, where the system only translates a fixed and finite set of phrases that have been manually entered into it. Speech translation technology enables speakers of different languages to communicate, and it is thus of tremendous value for humankind in terms of science, cross-cultural exchange, and global business.
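
The classic "cascade" architecture chains three stages: speech recognition, machine translation, and speech synthesis. The sketch below shows only that composition; the three stage functions are hypothetical stand-ins, where a real system would wrap trained models.

```python
# Schematic cascade: ASR -> machine translation -> TTS (all stubs).
def transcribe(audio: bytes) -> str:
    """Speech recognition stage (hypothetical placeholder)."""
    return "hello world"

def translate(text: str, target_lang: str) -> str:
    """Machine translation stage (toy lookup standing in for a model)."""
    toy_table = {"hello world": "hallo Welt"}  # pretend German model
    return toy_table.get(text, text) if target_lang == "de" else text

def synthesize(text: str) -> bytes:
    """Text-to-speech stage (placeholder 'waveform')."""
    return text.encode("utf-8")

def speech_to_speech(audio: bytes, target_lang: str = "de") -> bytes:
    return synthesize(translate(transcribe(audio), target_lang))

print(speech_to_speech(b"\x00\x01"))  # b'hallo Welt'
```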

Alex Waibel (American computer scientist)

Alexander Waibel is a professor of computer science at Carnegie Mellon University and the Karlsruhe Institute of Technology (KIT). Waibel's research focuses on automatic speech recognition, translation, and human-machine interaction. His work has introduced cross-lingual communication systems, such as consecutive and simultaneous interpreting systems on a variety of platforms. In fundamental machine learning research, he is known for the Time Delay Neural Network (TDNN), the first convolutional neural network (CNN) trained by gradient descent using backpropagation, which he introduced in 1987 at ATR in Japan.
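
A time-delay layer is, in modern terms, a 1-D convolution over the time axis of a feature sequence, sharing its weights across time. Below is a minimal PyTorch sketch, with all shapes and sizes chosen purely for illustration.

```python
# Minimal TDNN-style layer: 1-D convolution across time (illustrative sizes).
import torch
import torch.nn as nn

frames = torch.randn(1, 40, 100)  # (batch, feature dims, time frames)
tdnn = nn.Conv1d(in_channels=40,  # e.g. 40 mel filterbank coefficients
                 out_channels=512,
                 kernel_size=5)   # context window of +/- 2 frames
activations = torch.relu(tdnn(frames))
print(activations.shape)          # torch.Size([1, 512, 96])
```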

Text, Speech and Dialogue

Text, Speech and Dialogue (TSD) is an annual conference involving topics on natural language processing and computational linguistics. The meeting is held every September alternating in Brno and Plzeň, Czech Republic.

The outline of natural-language processing provides an overview of and topical guide to the field.

Mariusz Ziółko

Mariusz Ziółko is a Polish professor of automatics and signal processing. He graduated in 1970 from the AGH University of Science and Technology, where he also received his PhD in 1973 and his habilitation in 1990. In 2001 he received the title of professor of technical sciences from the President of Poland. He held a DAAD fellowship at the University of Wuppertal and was a visiting professor at the Fachhochschule St. Pölten in Austria in 2005. He is a professor in the AGH Department of Electronics, where he leads the Digital Signal Processing Group. He has been a member of several scientific organisations, including IEEE, SIAM, EURASIP, and ISCA, and has chaired several conferences. He is a co-author of pioneering publications on mathematical optimization models in biology, and his work has also had an important impact on Polish speech recognition technology.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

Shrikanth Narayanan

Shrikanth Narayanan is an Indian-American professor at the University of Southern California. He is an interdisciplinary engineer–scientist focused on human-centered signal processing and machine intelligence, with speech and spoken language processing at its core. A prolific award-winning researcher, educator, and inventor with hundreds of publications and a number of acclaimed patents, he has pioneered research areas including computational speech science, speech and human language technologies, audio, music, and multimedia engineering, human sensing and imaging technologies, emotions research and affective computing, behavioral signal processing, and computational media intelligence. His technical contributions span applications in defense, security, health, education, media, and the arts, including technologies that facilitate awareness and support of diversity and inclusion. His patented work has contributed to the proliferation of speech technologies on the cloud and on mobile devices and to enabling novel emotion-aware artificial intelligence technologies.

<span class="mw-page-title-main">Joseph Mariani</span>

Joseph Mariani is a French computer science researcher and pioneer in the field of speech processing.

John Makhoul (American computer scientist)

John Makhoul is a Lebanese-American computer scientist who works in the field of speech and language processing. Dr. Makhoul's work on linear predictive coding was used in the establishment of the Network Voice Protocol, which enabled the transmission of speech signals over the ARPANET. Makhoul is recognized in the field for his vital role in the areas of speech and language processing, including speech analysis, speech coding, speech recognition and speech understanding. He has made a number of significant contributions to the mathematical modeling of speech signals, including his work on linear prediction, and vector quantization. His patented work on the direct application of speech recognition techniques for accurate, language-independent optical character recognition (OCR) has had a dramatic impact on the ability to create OCR systems in multiple languages relatively quickly.
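
Linear prediction models each speech sample as a weighted sum of the p preceding samples, with the coefficients commonly estimated by the Levinson-Durbin recursion over the frame's autocorrelation. The sketch below is a textbook version of that recursion, not Makhoul's own method or code; the test frame and model order are arbitrary choices for illustration.

```python
# Textbook Levinson-Durbin recursion for LPC coefficients (illustrative).
import numpy as np

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # autocorr.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                # update predictor
        a[i] = k
        err *= 1.0 - k * k                                 # prediction error
    return a

# Toy usage: a Hamming-windowed 440 Hz sine frame at 16 kHz, order 10.
t = np.arange(400) / 16_000
print(lpc(np.sin(2 * np.pi * 440 * t) * np.hamming(400), order=10))
```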

openSMILE is source-available software for automatic extraction of features from audio signals and for classification of speech and music signals. "SMILE" stands for "Speech & Music Interpretation by Large-space Extraction". The software is mainly applied in the area of automatic emotion recognition and is widely used in the affective computing research community. The openSMILE project has existed since 2008 and has been maintained by the German company audEERING GmbH since 2013. openSMILE is provided free of charge for research purposes and personal use under a source-available license; for commercial use, audEERING offers custom license options.
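
For illustration, file-level feature extraction with openSMILE's official Python wrapper looks roughly like the sketch below; the chosen feature set and the audio path are assumptions.

```python
# Sketch of file-level feature extraction (assumes: pip install opensmile).
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # a standard acoustic set
    feature_level=opensmile.FeatureLevel.Functionals,  # one vector per file
)
features = smile.process_file("speech.wav")  # returns a pandas DataFrame
print(features.shape)                        # (1, 88) for this feature set
```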

Voice computing

Voice computing is the discipline that develops hardware or software to process voice inputs.

An audio deepfake is a product of artificial intelligence used to create convincing speech that sounds like specific people saying things they did not say. The technology was initially developed for applications intended to improve human life; for example, it can be used to produce audiobooks and to help people who have lost their voices regain them. Commercially, it has opened the door to several opportunities, such as more personalized digital assistants, natural-sounding text-to-speech, and speech translation services.

Lori Faith Lamel is a speech processing researcher known for her work with the TIMIT corpus of American English speech and for her work on voice activity detection, speaker recognition, and other non-linguistic inferences from speech signals. She works for the French National Centre for Scientific Research (CNRS) as a senior research scientist in the Spoken Language Processing Group of the Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur.

Chin-Hui Lee is an information scientist, best known for his work in speech recognition, speaker recognition, and acoustic signal processing. He joined the Georgia Institute of Technology in 2002 as a professor in the School of Electrical and Computer Engineering.

References

  1. "Welcome to ISCA Web".
  2. "ISCA - Objectives". isca-speech.org. Retrieved 2024-07-04.
  3. "ISCA board". ISCA. Retrieved 2024-07-24.
  4. Interspeech 2016. http://www.interspeech2016.org/About-the-Conference. Retrieved 23 January 2018.