International Speech Communication Association

The International Speech Communication Association (ISCA) [1] is a non-profit organization and one of the two main professional associations for speech communication science and technology, the other being the IEEE Signal Processing Society.

Purpose

The purpose of ISCA is to promote the study and application of automatic speech processing, including speech recognition and speech synthesis, as well as sub-topics such as speaker recognition and speech compression. The association's activities encompass all aspects of speech processing: computational, linguistic, and theoretical. [2]

ISCA also serves as a platform for researchers, academics, industry professionals, and students to exchange knowledge and foster interdisciplinary dialogue through its conferences, workshops, publications, and educational initiatives, and it promotes international collaboration and networking across the speech communication community.

Conferences

ISCA organizes the annual INTERSPEECH conference.

ISCA board

The current ISCA president is Odette Scharenborg, and the vice president is Bhuvana Ramabhadran; the remaining board members are professionals in the field. [3]

History of ISCA

ISCA is the result of the merger of ESCA (the European Speech Communication Association, founded in Europe in 1987) and PC-ICSLP (the Permanent Council for the organization of the International Conferences on Spoken Language Processing, founded in Japan in 1986). The first ISCA event was held in 2000 in Beijing, China. [4]

Related Research Articles

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
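
As a concrete illustration of the speech-to-text direction, the following minimal sketch uses the third-party Python SpeechRecognition package; the file name and the choice of recognition backend are illustrative assumptions rather than anything specified here.

```python
# Minimal speech-to-text sketch (assumes: pip install SpeechRecognition,
# and a placeholder audio file "sample.wav").
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:
    audio = recognizer.record(source)  # read the whole file into memory

# Send the audio to a web-based recognizer and print the transcript.
print(recognizer.recognize_google(audio))
```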

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.
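
As a sketch of the text-to-speech direction, the following uses the pyttsx3 package, which drives the platform's built-in synthesizer offline; the package choice and the sample sentence are assumptions made for illustration.

```python
# Minimal text-to-speech sketch (assumes: pip install pyttsx3).
import pyttsx3

engine = pyttsx3.init()          # select the platform's default synthesizer
engine.setProperty("rate", 150)  # speaking rate in words per minute
engine.say("Speech synthesis is the artificial production of human speech.")
engine.runAndWait()              # block until the utterance has been spoken
```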

Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from manual assessment by an instructor or proctor. Also called speech verification, pronunciation evaluation, and pronunciation scoring, its main application is computer-aided pronunciation teaching (CAPT) when combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction. Rather than transcribing unknown speech, pronunciation assessment knows the expected word(s) in advance and attempts to verify the correctness of the learner's pronunciation and, ideally, their intelligibility to listeners, sometimes along with prosodic features, often of lesser consequence, such as intonation, pitch, tempo, rhythm, and syllable and word stress. Pronunciation assessment is also used in reading tutoring, for example in products such as Microsoft Teams and from Amira Learning, and it can help diagnose and treat speech disorders such as apraxia.
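
The alignment step such systems build on can be sketched in toy form: given the expected prompt, mark which expected words appear in the recognizer's output. This shows only the word-identity check; real CAPT systems additionally score the acoustics (e.g. goodness-of-pronunciation measures), which is not modeled here.

```python
# Toy sketch of verifying expected words against a recognizer's output.
from difflib import SequenceMatcher

def check_expected_words(expected: str, recognized: str) -> list[tuple[str, bool]]:
    """Mark each expected word as matched or missed in the recognized text."""
    exp, rec = expected.lower().split(), recognized.lower().split()
    matched: set[int] = set()
    for block in SequenceMatcher(a=exp, b=rec).get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    return [(word, i in matched) for i, word in enumerate(exp)]

# Example: the learner's utterance was recognized without the word "sea".
print(check_expected_words("she sells sea shells", "she sells shells"))
# [('she', True), ('sells', True), ('sea', False), ('shells', True)]
```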

TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time.
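
TIMIT's time alignments are stored in simple text files; the phone-level .PHN files list one segment per line as start and end sample indices followed by the phone label. The sketch below parses such a file; the path is a placeholder, since TIMIT itself is distributed by the Linguistic Data Consortium.

```python
# Sketch of reading a TIMIT phone-level transcription (.PHN) file.
SAMPLE_RATE = 16_000  # TIMIT audio is sampled at 16 kHz

def read_phn(path: str) -> list[tuple[float, float, str]]:
    """Parse 'start_sample end_sample phone' lines into second-based spans."""
    segments = []
    with open(path) as f:
        for line in f:
            start, end, phone = line.split()
            segments.append((int(start) / SAMPLE_RATE,
                             int(end) / SAMPLE_RATE,
                             phone))
    return segments

for start, end, phone in read_phn("SA1.PHN"):  # placeholder file path
    print(f"{start:7.3f}s {end:7.3f}s  {phone}")
```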

Manfred R. Schroeder

Manfred Robert Schroeder was a German physicist, best known for his contributions to acoustics and computer graphics. He wrote three books and published over 150 articles in his field.

Roberto Pieraccini (Italian-American computer scientist)

Roberto Pieraccini is an Italian-American electrical engineer working in the fields of speech recognition, natural language understanding, and spoken dialog systems. He has been an active contributor to speech and language research and technology since 1981. He is currently the Chief Scientist of Uniphore, a conversational automation technology company.

Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, where the system only translates a fixed and finite set of phrases that have been manually entered into it. Speech translation technology enables speakers of different languages to communicate, and it is thus of tremendous value for humankind in terms of science, cross-cultural exchange, and global business.
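
The classic "cascade" architecture chains three stages: speech recognition, machine translation, and speech synthesis. The sketch below shows only that composition; the three stage functions are hypothetical stand-ins, where a real system would wrap trained models.

```python
# Schematic cascade: ASR -> machine translation -> TTS (all stubs).
def transcribe(audio: bytes) -> str:
    """Speech recognition stage (hypothetical placeholder)."""
    return "hello world"

def translate(text: str, target_lang: str) -> str:
    """Machine translation stage (toy lookup standing in for a model)."""
    toy_table = {"hello world": "hallo Welt"}  # pretend German model
    return toy_table.get(text, text) if target_lang == "de" else text

def synthesize(text: str) -> bytes:
    """Text-to-speech stage (placeholder 'waveform')."""
    return text.encode("utf-8")

def speech_to_speech(audio: bytes, target_lang: str = "de") -> bytes:
    return synthesize(translate(transcribe(audio), target_lang))

print(speech_to_speech(b"\x00\x01"))  # b'hallo Welt'
```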

Alex Waibel (American computer scientist)

Alexander Waibel is a professor of computer science at Carnegie Mellon University and the Karlsruhe Institute of Technology (KIT). Waibel's research focuses on automatic speech recognition, translation, and human-machine interaction. His work has introduced cross-lingual communication systems, such as consecutive and simultaneous interpreting systems on a variety of platforms. In fundamental machine learning research, he is known for the Time Delay Neural Network (TDNN), the first convolutional neural network (CNN) trained by gradient descent using backpropagation, which he introduced in 1987 at ATR in Japan.
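
A time-delay layer is, in modern terms, a 1-D convolution over the time axis of a feature sequence, sharing its weights across time. Below is a minimal PyTorch sketch, with all shapes and sizes chosen purely for illustration.

```python
# Minimal TDNN-style layer: 1-D convolution across time (illustrative sizes).
import torch
import torch.nn as nn

frames = torch.randn(1, 40, 100)  # (batch, feature dims, time frames)
tdnn = nn.Conv1d(in_channels=40,  # e.g. 40 mel filterbank coefficients
                 out_channels=512,
                 kernel_size=5)   # context window of +/- 2 frames
activations = torch.relu(tdnn(frames))
print(activations.shape)          # torch.Size([1, 512, 96])
```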

Text, Speech and Dialogue

Text, Speech and Dialogue (TSD) is an annual conference involving topics on natural language processing and computational linguistics. The meeting is held every September alternating in Brno and Plzeň, Czech Republic.

The outline of natural-language processing provides an overview of and topical guide to the field.

Mariusz Ziółko

Mariusz Ziółko is a Polish professor of automatics and signal processing. He graduated in 1970 from the AGH University of Science and Technology, where he also received his PhD in 1973 and his habilitation in 1990. In 2001 he received the title of professor of technical sciences from the President of Poland. He held a DAAD fellowship at the University of Wuppertal and was a visiting professor at the Fachhochschule St. Pölten in Austria in 2005. He is a professor in the AGH Department of Electronics, where he leads the Digital Signal Processing Group. He has been a member of several scientific organisations, including IEEE, SIAM, EURASIP, and ISCA, and has chaired several conferences. He is a co-author of pioneering publications on mathematical optimization models in biology, and his work has also had an important impact on Polish speech recognition technology.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

Shrikanth Narayanan

Shrikanth Narayanan is an Indian-American professor at the University of Southern California. He is an interdisciplinary engineer–scientist focused on human-centered signal processing and machine intelligence, with speech and spoken language processing at its core. A prolific award-winning researcher, educator, and inventor with hundreds of publications and a number of acclaimed patents, he has pioneered research areas including computational speech science, speech and human language technologies, audio, music, and multimedia engineering, human sensing and imaging technologies, emotions research and affective computing, behavioral signal processing, and computational media intelligence. His technical contributions span applications in defense, security, health, education, media, and the arts, including technologies that facilitate awareness and support of diversity and inclusion. His patented work has contributed to the proliferation of speech technologies on the cloud and on mobile devices and to enabling novel emotion-aware artificial intelligence technologies.

<span class="mw-page-title-main">Joseph Mariani</span>

Joseph Mariani is a French computer science researcher and pioneer in the field of speech processing.

John Makhoul (American computer scientist)

John Makhoul is a Lebanese-American computer scientist who works in the field of speech and language processing. Dr. Makhoul's work on linear predictive coding was used in the establishment of the Network Voice Protocol, which enabled the transmission of speech signals over the ARPANET. Makhoul is recognized in the field for his vital role in the areas of speech and language processing, including speech analysis, speech coding, speech recognition and speech understanding. He has made a number of significant contributions to the mathematical modeling of speech signals, including his work on linear prediction, and vector quantization. His patented work on the direct application of speech recognition techniques for accurate, language-independent optical character recognition (OCR) has had a dramatic impact on the ability to create OCR systems in multiple languages relatively quickly.
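
Linear prediction models each speech sample as a weighted sum of the p preceding samples, with the coefficients commonly estimated by the Levinson-Durbin recursion over the frame's autocorrelation. The sketch below is a textbook version of that recursion, not Makhoul's own method or code; the test frame and model order are arbitrary choices for illustration.

```python
# Textbook Levinson-Durbin recursion for LPC coefficients (illustrative).
import numpy as np

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # autocorr.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                # update predictor
        a[i] = k
        err *= 1.0 - k * k                                 # prediction error
    return a

# Toy usage: a Hamming-windowed 440 Hz sine frame at 16 kHz, order 10.
t = np.arange(400) / 16_000
print(lpc(np.sin(2 * np.pi * 440 * t) * np.hamming(400), order=10))
```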

openSMILE is source-available software for automatic extraction of features from audio signals and for classification of speech and music signals. "SMILE" stands for "Speech & Music Interpretation by Large-space Extraction". The software is mainly applied in the area of automatic emotion recognition and is widely used in the affective computing research community. The openSMILE project has existed since 2008 and has been maintained by the German company audEERING GmbH since 2013. openSMILE is provided free of charge for research purposes and personal use under a source-available license; for commercial use, audEERING offers custom license options.
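
For illustration, file-level feature extraction with openSMILE's official Python wrapper looks roughly like the sketch below; the chosen feature set and the audio path are assumptions.

```python
# Sketch of file-level feature extraction (assumes: pip install opensmile).
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # a standard acoustic set
    feature_level=opensmile.FeatureLevel.Functionals,  # one vector per file
)
features = smile.process_file("speech.wav")  # returns a pandas DataFrame
print(features.shape)                        # (1, 88) for this feature set
```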

Voice computing

Voice computing is the discipline that develops hardware or software to process voice inputs.

An audio deepfake is a product of artificial intelligence used to create convincing speech that sounds like specific people saying things they did not say. The technology was initially developed for applications intended to improve human life; for example, it can be used to produce audiobooks and to help people who have lost their voices regain them. Commercially, it has opened the door to several opportunities, such as more personalized digital assistants, natural-sounding text-to-speech, and speech translation services.

Lori Faith Lamel is a speech processing researcher known for her work with the TIMIT corpus of American English speech and for her work on voice activity detection, speaker recognition, and other non-linguistic inferences from speech signals. She works for the French National Centre for Scientific Research (CNRS) as a senior research scientist in the Spoken Language Processing Group of the Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur.

Chin-Hui Lee is an information scientist, best known for his work in speech recognition, speaker recognition, and acoustic signal processing. He joined the Georgia Institute of Technology in 2002 as a professor in the School of Electrical and Computer Engineering.

References

  1. "Welcome to ISCA Web".
  2. "ISCA - Objectives". isca-speech.org. Retrieved 2024-07-04.
  3. "ISCA board". ISCA. Retrieved 2024-07-24.
  4. Interspeech 2016. http://www.interspeech2016.org/About-the-Conference. Retrieved 23 January 2018.