Non-native speech database

A non-native speech database is a speech database of non-native pronunciations of a target language, most commonly English. Such databases are used in the development of multilingual automatic speech recognition systems, text-to-speech systems, pronunciation trainers, and second-language learning systems. [1]

List

Table 1: Abbreviations for languages used in Table 2
Arabic       A      Japanese     J
Chinese      C      Korean       K
Czech        Cze    Malaysian    M
Danish       D      Norwegian    N
Dutch        Dut    Portuguese   P
English      E      Russian      R
French       F      Spanish      S
German       G      Swedish      Swe
Greek        Gre    Thai         T
Indonesian   Ind    Vietnamese   V
Italian      I


Table 2 gives the information about the individual databases.

Table 2: Overview of non-native databases
Corpus | Author | Available at | Languages | #Speakers | Native language | #Utt. | Duration | Date | Remarks
AMI [2] |  | EU | E |  | Dut and other |  | 100h |  | meeting recordings
ATR-Gruhn [3] | Gruhn | ATR | E | 96 | C G F J Ind | 15000 |  | 2004 | proficiency rating
BAS Strange Corpus 1+10 [4] |  | ELRA | G | 139 | 50 countries | 7500 |  | 1998 |
Berkeley Restaurant [5] |  | ICSI | E | 55 | G I H C F S J | 2500 |  | 1994 |
Broadcast News [6] |  | LDC | E |  |  |  |  | 1997 |
Cambridge-Witt [7] | Witt | U. Cambridge | E | 10 | J I K S | 1200 |  | 1999 |
Cambridge-Ye [8] | Ye | U. Cambridge | E | 20 | C | 1600 |  | 2005 |
Children News [9] | Tomokiyo | CMU | E | 62 | J C | 7500 |  | 2000 | partly spontaneous
CLIPS-IMAG [10] | Tan | CLIPS-IMAG | F | 15 | C V |  | 6h | 2006 |
CLSU [11] |  | LDC | E |  | 22 countries | 5000 |  | 2007 | telephone, spontaneous
CMU [12] |  | CMU | E | 64 | G | 452 | 0.9h |  | not available
Cross Towns [13] | Schaden | U. Bochum | E F G I Cze Dut | 161 | E F G I S | 72000 | 133h | 2006 | city names
Duke-Arslan [14] | Arslan | Duke University | E | 93 | 15 countries | 2200 |  | 1995 | partly telephone speech
ERJ [15] | Minematsu | U. Tokyo | E | 200 | J | 68000 |  | 2002 | proficiency rating
Fischer [16] |  | LDC | E |  | many |  | 200h |  | telephone speech
Fitt [17] | Fitt | U. Edinburgh | F I N Gre | 10 | E | 700 |  | 1995 | city names
Fraenki [18] |  | U. Erlangen | E | 19 | G | 2148 |  |  |
Hispanic [19] | Byrne |  | E | 22 | S |  | 20h | 1998 | partly spontaneous
HLTC [20] |  | HKUST | E | 44 | C |  | 3h | 2010 | available on request
IBM-Fischer [21] |  | IBM | E | 40 | S F G I | 2000 |  | 2002 | digits
iCALL [22][23] | Chen | I2R, A*STAR | C | 305 | 24 countries | 90841 | 142h | 2015 | phonetic and tonal transcriptions (in Pinyin), proficiency ratings
ISLE [24] | Atwell | EU/ELDA | E | 46 | G I | 4000 | 18h | 2000 |
Jupiter [25] | Zue | MIT | E | unknown | unknown | 5146 |  | 1999 | telephone speech
K-SEC [26] | Rhee | SiTEC | E | unknown | K |  |  | 2004 |
LDC WSJ1 [27] |  | LDC |  | 10 |  | 800 | 1h | 1994 |
LeaP [28] | Gut | University of Münster | E G | 127 | 41 different ones | 73,941 words | 12h | 2003 |
MIST [29] |  | ELRA | E F G | 75 | Dut | 2200 |  | 1996 |
NATO HIWIRE [30] |  | NATO | E | 81 | F Gre I S | 8100 |  | 2007 | clean speech
NATO M-ATC [31] | Pigeon | NATO | E | 622 | F G I S | 9833 | 17h | 2007 | heavy background noise
NATO N4 [32] |  | NATO | E | 115 | unknown |  | 7.5h | 2006 | heavy background noise
Onomastica [33] |  |  | D Dut E F G Gre I N P S Swe |  |  | (121000) |  | 1995 | only lexicon
PF-STAR [34] |  | U. Erlangen | E | 57 | G | 4627 | 3.4h | 2005 | children's speech
Sunstar [35] |  | EU | E | 100 | G S I P D | 40000 |  | 1992 | parliament speech
TC-STAR [36] | Heuvel | ELDA | E S | unknown | EU countries |  | 13h | 2006 | multiple data sets
TED [37] | Lamel | ELDA | E | 40 (188) | many |  | 10h (47h) | 1994 | Eurospeech 93
TLTS [38] |  | DARPA | A E |  |  |  | 1h | 2004 |
Tokyo-Kikuko [39] |  | U. Tokyo | J | 140 | 10 countries | 35000 |  | 2004 | proficiency rating
Verbmobil [40] |  | U. Munich | E | 44 | G |  | 1.5h | 1994 | very spontaneous
VODIS [41] |  | EU | F G | 178 | F G | 2500 |  | 1998 | about car navigation
WP Arabic [42] | Rocca | LDC | A | 35 | E | 800 | 1h | 2002 |
WP Russian [43] | Rocca | LDC | R | 26 | E | 2500 | 2h | 2003 |
WP Spanish [44] | Morgan | LDC | S |  | E |  |  | 2006 |
WSJ Spoke [45] |  |  | E | 10 | unknown | 800 |  | 1993 |


Legend

The table of non-native databases uses some abbreviations for language names; they are listed in Table 1. Table 2 gives the following information about each corpus: the name of the corpus, the institution from which the corpus can be obtained (or at least where further information should be available), the language actually spoken by the speakers, the number of speakers, the native language of the speakers, the total number of non-native utterances the corpus contains, the duration in hours of the non-native part, the date of the first public reference to the corpus, free-text remarks highlighting special aspects of the database, and a reference to a publication. The reference in the last field is in most cases to the paper by the original collectors that is specifically devoted to describing the corpus. Where no such paper could be identified, a paper that uses the corpus is referenced instead.

Some entries are left blank, while others are marked "unknown". The difference is that blank entries refer to attributes whose value is simply not known, whereas "unknown" entries indicate that no information about the attribute is available in the database itself. For example, the Jupiter weather database [46] gives no information about the origin of its speakers; such data is therefore less useful for experiments on accent detection or similar tasks.
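To make the blank/"unknown" distinction concrete, one way to represent a Table 2 row in code is to distinguish an attribute that is merely not filled in here (None) from one that the corpus itself leaves undocumented (an explicit sentinel). This is a minimal, hypothetical Python sketch; the field names simply mirror the column headers and are not part of any released corpus format.

```python
from dataclasses import dataclass
from typing import Optional

# Sentinel: the corpus itself documents nothing about this attribute
# (distinct from None, which only means the value is not known here).
UNKNOWN = "unknown"

@dataclass
class CorpusEntry:
    """One row of Table 2; all field names are illustrative."""
    name: str
    available_at: Optional[str] = None
    languages: Optional[str] = None
    n_speakers: Optional[str] = None      # a count, or UNKNOWN
    native_language: Optional[str] = None
    n_utterances: Optional[int] = None
    duration_hours: Optional[float] = None
    date: Optional[int] = None
    remarks: Optional[str] = None

# Jupiter documents nothing about its speakers' origin, so those fields
# are UNKNOWN rather than simply left blank:
jupiter = CorpusEntry(name="Jupiter", available_at="MIT", languages="E",
                      n_speakers=UNKNOWN, native_language=UNKNOWN,
                      n_utterances=5146, date=1999,
                      remarks="telephone speech")
```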

Where possible, the name given is the standard name of the corpus. For some of the smaller corpora, however, no established name existed, and an identifier had to be created; in such cases, a combination of the institution and the collector of the database is used.

Where a database contains both native and non-native speech, only the attributes of the non-native part are listed. Most of the corpora are collections of read speech. If a corpus instead consists partly or completely of spontaneous utterances, this is mentioned in the Remarks column.

Related Research Articles

Received Pronunciation (RP) is the accent traditionally regarded as standard for British English. For over a century there has been argument over such questions as the definition of RP, whether it is geographically neutral, how many speakers there are, whether sub-varieties exist, how appropriate a choice it is as a standard and how the accent has changed over time. The name itself is controversial. RP is an accent, so the study of RP is concerned only with matters of pronunciation; other areas relevant to the study of language standards such as vocabulary, grammar and style are not considered.

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies enabling computers to recognize and translate spoken language into text, with the main benefit of searchability. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It incorporates knowledge and research from the computer science, linguistics, and computer engineering fields. The reverse process is speech synthesis.

Spoken English shows great variation across regions where it is the predominant language. For example, the United Kingdom has the largest variation of accents of any country in the world, so no single "British accent" exists. This article provides an overview of the numerous identifiable variations in pronunciation; such distinctions usually derive from the phonetic inventory of local dialects, as well as from broader differences in the Standard English of different primarily English-speaking populations.

Jamaican English, including Jamaican Standard English, is a variety of English native to Jamaica and is the official language of the country. A distinction exists between Jamaican English and Jamaican Patois, though not entirely a sharp distinction so much as a gradual continuum between two extremes. Jamaican English tends to follow British English spelling conventions.

Non-native pronunciations of English result from the common linguistic phenomenon in which non-native users of any language tend to carry the intonation, phonological processes and pronunciation rules from their first language or first languages into their English speech. They may also create innovative pronunciations for English sounds not found in the speaker's first language.

In sociolinguistics, an accent is a manner of pronunciation peculiar to a particular individual, location, or nation. An accent may be identified with the locality in which its speakers reside, the socioeconomic status of its speakers, their ethnicity, their caste or social class, or influence from their first language.

Scottish English is the set of varieties of the English language spoken in Scotland. The transregional, standardised variety is called Scottish Standard English or Standard Scottish English (SSE). Scottish Standard English may be defined as "the characteristic speech of the professional class [in Scotland] and the accepted norm in schools". The IETF language tag for Scottish Standard English is en-scotland.

The English language in Northern England has been shaped by the region's history of settlement and migration, and today encompasses a group of related dialects known as Northern England English. Historically, the strongest influence on the varieties of the English language spoken in Northern England was the Northumbrian dialect of Old English, but contact with Old Norse during the Viking Age and with Irish English following the Great Famine have produced new and distinctive styles of speech. Some "Northern" traits can be found further south than others: only conservative Northumbrian dialects retain the pre-Great Vowel Shift pronunciation of words such as town, but all northern accents lack the FOOT-STRUT split, and this trait extends a significant distance into the Midlands.

Australian English (AuE) is a non-rhotic variety of English spoken by most native-born Australians. Phonologically, it is one of the most regionally homogeneous language varieties in the world. As with most dialects of English, it is distinguished primarily by its vowel phonology.

TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time.
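TIMIT's time alignments are stored as plain text: each line of a phone-level label file (.PHN) holds a start sample, an end sample, and a label, with the audio sampled at 16 kHz. A minimal reader might look like the following sketch; the path in the usage comment is purely illustrative.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz

def read_phn(path: str) -> List[Tuple[float, float, str]]:
    """Parse a TIMIT-style label file with lines of the form
    'start_sample end_sample label'; return (start_sec, end_sec, label)."""
    segments = []
    with open(path) as f:
        for line in f:
            start, end, label = line.split()
            segments.append((int(start) / SAMPLE_RATE,
                             int(end) / SAMPLE_RATE,
                             label))
    return segments

# Hypothetical usage:
# for start, end, phone in read_phn("TIMIT/TRAIN/DR1/FCJF0/SA1.PHN"):
#     print(f"{start:.3f}-{end:.3f}s {phone}")
```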

A speech corpus is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models. In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.

The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark Pitt. It contains high-quality recordings from 40 speakers in Columbus, Ohio conversing freely with an interviewer. The interviewer's voice is heard only faintly in the background of these recordings. The sessions were conducted as sociolinguistic interviews, and are essentially monologues. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software. Software for searching the transcription files is also available at the project web site. The corpus is available to researchers in academia and industry.

Speaker adaptation is an important technology for fine-tuning either features or speech models to compensate for mismatch due to inter-speaker variation. Over the last decade, eigenvoice (EV) speaker adaptation has been developed. It makes use of prior knowledge about the training speakers to provide a fast adaptation algorithm. Inspired by the kernel eigenface idea in face recognition, the kernel eigenvoice (KEV) has been proposed as a non-linear generalization of EV. It incorporates kernel principal component analysis, a non-linear version of principal component analysis, to capture higher-order correlations and thereby further explore the speaker space and enhance recognition performance.
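As a rough illustration of the linear eigenvoice idea (not of any specific published system), the sketch below builds eigenvoices by PCA over training-speaker supervectors and adapts to a new speaker by estimating a small set of weights. Real systems derive supervectors from HMM/GMM means and estimate the weights by maximum likelihood rather than plain least squares, and KEV would replace the PCA step with kernel PCA; all names and the toy data here are hypothetical.

```python
import numpy as np

def eigenvoices(train_supervectors: np.ndarray, k: int):
    """PCA over training-speaker supervectors (one per row):
    return the mean voice and the top-k eigenvoices."""
    mean = train_supervectors.mean(axis=0)
    # Right-singular vectors of the centered matrix are the eigenvoices.
    _, _, vt = np.linalg.svd(train_supervectors - mean, full_matrices=False)
    return mean, vt[:k]

def adapt(mean: np.ndarray, voices: np.ndarray, observed: np.ndarray):
    """Constrain the new speaker to the span of the eigenvoices:
    solve 'observed ~ mean + w @ voices' for the weights w by least
    squares, then return the reconstructed (adapted) supervector."""
    w, *_ = np.linalg.lstsq(voices.T, observed - mean, rcond=None)
    return mean + w @ voices

# Toy usage: 20 training speakers, 50-dimensional supervectors.
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 50))
mean, voices = eigenvoices(train, k=5)
new_speaker_stats = rng.normal(size=50)   # stands in for adaptation data
adapted = adapt(mean, voices, new_speaker_stats)
```

Because only k weights are estimated instead of the full model, adaptation can work from very little speech from the new speaker, which is the main appeal of the eigenvoice approach.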

Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, which is where the system only translates a fixed and finite set of phrases that have been manually entered into the system. Speech translation technology enables speakers of different languages to communicate. It thus is of tremendous value for humankind in terms of science, cross-cultural exchange and global business.

Rhoticity in English is the pronunciation of the historical rhotic consonant /r/ by English speakers. The presence or absence of rhoticity is one of the most prominent distinctions by which varieties of English can be classified. In rhotic varieties, the historical English /r/ sound is preserved in all pronunciation contexts. In non-rhotic varieties, speakers no longer pronounce /r/ in postvocalic environments, that is, when it is immediately after a vowel and not followed by another vowel. For example, in isolation, a rhotic English speaker pronounces the words hard and butter as /ˈhɑːrd/ and /ˈbʌtər/, whereas a non-rhotic speaker "drops" or "deletes" the /r/ sound, pronouncing them as /ˈhɑːd/ and /ˈbʌtə/. When an r is at the end of a word but the next word begins with a vowel, as in the phrase "better apples", most non-rhotic speakers will pronounce the /r/ in that position, since it is followed by a vowel in this case.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

The BABEL speech corpus is a corpus of recorded speech materials from five Central and Eastern European languages. Intended for use in speech technology applications, it was funded by a grant from the European Union and completed in 1998. It is distributed by the European Language Resources Association.

The Persian Speech Corpus is a Modern Persian speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of about 2.5 hours of Persian speech aligned with recorded speech on the phoneme level, including annotations of word boundaries. Previous spoken corpora of Persian include FARSDAT, which consists of read aloud speech from newspaper texts from 100 Persian speakers and the Telephone FARsi Spoken language DATabase (TFARSDAT) which comprises seven hours of read and spontaneous speech produced by 60 native speakers of Persian from ten regions of Iran.

Speechmatics

Speechmatics is a technology company based in Cambridge, England, which develops automatic speech recognition (ASR) software based on recurrent neural networks and statistical language modelling. Speechmatics was originally named Cantab Research Ltd when founded in 2006 by speech recognition specialist Dr. Tony Robinson.

References

  1. M. Raab, R. Gruhn and E. Noeth, Non-Native speech databases, in Proc. ASRU, Kyoto, Japan, 2007.
  2. AMI Project, "AMI Meeting Corpus" .
  3. R. Gruhn, T. Cincarek, and S. Nakamura, "A multi-accent non-native English database", in ASJ, 2004.
  4. University of Munich, "Bavarian Archive for Speech Signals, Strange Corpus".
  5. Jurafsky et al., "The Berkeley Restaurant Project", Proc. ICSLP 1994.
  6. L. Tomokiyo, Recognizing Non-native Speech: Characterizing and Adapting to Non-native Usage in Speech Recognition, Ph.D. thesis, Carnegie Mellon University, Pennsylvania, 2001.
  7. S. Witt, Use of Speech Recognition in Computer-Assisted Language Learning, Ph.D. thesis, Cambridge University Engineering Department, UK, 1999.
  8. H. Ye and S. Young, Improving the speech recognition performance of beginners in spoken conversational interaction for language learning, in Proc. Interspeech, Lisbon, Portugal, 2005.
  9. L. Tomokiyo, Recognizing Non-native Speech: Characterizing and Adapting to Non-native Usage in Speech Recognition, Ph.D. thesis, Carnegie Mellon University, Pennsylvania, 2001.
  10. T. P. Tan and L. Besacier, A French non-native corpus for automatic speech recognition, in LREC, Genoa, Italy, 2006.
  11. T. Lander, CSLU: Foreign accented English release 1.2, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2007.
  12. Z. Wang, T. Schultz, and A. Waibel, Comparison of acoustic model adaptation techniques on non-native speech, in Proc. ICASSP, 2003.
  13. S. Schaden, Regelbasierte Modellierung fremdsprachlich akzentbehafteter Aussprachevarianten, Ph.D. thesis, University Duisburg-Essen, 2006.
  14. L. M. Arslan and J. H. Hansen, Frequency characteristics of foreign accented speech, in Proc. of ICASSP, Munich, Germany, 1997, pp. 1123-1126.
  15. N. Minematsu et al., Development of English speech database read by Japanese to support CALL research, in ICA, Kyoto, Japan, 2004, pp. 577-560.
  16. Christopher Cieri, David Miller, Kevin Walker, The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text, Proc. LREC 2004
  17. S. Fitt, The pronunciation of unfamiliar native and non-native town names, in Proc. of Eurospeech, 1995, pp. 2227-2230.
  18. G. Stemmer, E. Noeth, and H. Niemann, Acoustic modeling of foreign words in a German speech recognition system, in Proc. Eurospeech, P. Dalsgaard, B. Lindberg, and H. Benner, Eds., 2001, vol. 4, pp. 2745-2748.
  19. W. Byrne, E. Knodt, S. Khudanpur, and J. Bernstein, Is automatic speech recognition ready for non-native speech? A data-collection effort and initial experiments in modeling conversational Hispanic English, in STiLL, Marholmen, Sweden, 1998, pp. 37-40.
  20. Y. Li, P. Fung, P. Xu, and Y. Liu, Asymmetric acoustic modeling for mixed language speech recognition, in ICASSP, Prague, Czech Republic, 2011, pp. 37-40.
  21. V. Fischer, E. Janke, and S. Kunzmann, Recent progress in the decoding of non-native speech with multilingual acoustic models, in Proc. of Eurospeech, 2003, pp. 3105-3108.
  22. Nancy F. Chen, Rong Tong, Darren Wee, Peixuan Lee, Bin Ma, Haizhou Li, iCALL Corpus: Mandarin Chinese Spoken by Non-Native Speakers of European Descent, in Proc. of Interspeech, 2015.
  23. Nancy F. Chen, Vivaek Shivakumar, Mahesh Harikumar, Bin Ma, Haizhou Li, Large-Scale Characterization of Mandarin Pronunciation Errors Made by Native Speakers of European Languages, in Proc. of Interspeech, 2013.
  24. W. Menzel, E. Atwell, P. Bonaventura, D. Herron, P. Howarth, R. Morton, and C. Souter, The ISLE corpus of non-native spoken English, in LREC, Athens, Greece, 2000, pp. 957-963.
  25. K. Livescu, Analysis and modeling of non-native speech for automatic speech recognition, M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA, 1999.
  26. S-C. Rhee and S-H. Lee and S-K. Kang and Y-J. Lee, Design and Construction of Korean-Spoken English Corpus (K-SEC), Proc. ICSLP 2004
  27. L. Tomokiyo, Recognizing Non-native Speech: Characterizing and Adapting to Non-native Usage in Speech Recognition, Ph.D. thesis, Carnegie Mellon University, Pennsylvania, 2001.
  28. Gut, U., Non-native Speech. A Corpus-based Analysis of Phonological and Phonetic Properties of L2 English and German, Frankfurt am Main: Peter Lang, 2009.
  29. TNO Human Factors Research Institute, Mist multi-lingual interoperability in speech technology database, Tech. Rep., ELRA, Paris, France, 2007, ELRA Catalog Reference S0238.
  30. J.C. Segura et al., The HIWIRE database, a noisy and non-native English speech corpus for cockpit communication, 2007.
  31. S. Pigeon, W. Shen, and D. van Leeuwen, Design and characterization of the non-native military air traffic communications database, in ICSLP, Antwerp, Belgium, 2007.
  32. L. Benarousse et al., The NATO native and non-native (n4) speech corpus, in Proc. of the MIST workshop (ESCA-NATO), Leusden, Sep 1999.
  33. Onomastica Consortium, The ONOMASTICA interlanguage pronunciation lexicon, in Proc. Eurospeech, Madrid, Spain, 1995, pp. 829-832.
  34. C. Hacker, T. Cincarek, A. Maier, A. Hessler, and E. Noeth, Boosting of prosodic and pronunciation features to detect mispronunciations of non-native children, in Proc. of ICASSP, Honolulu, Hawaii, 2007, pp. 197-200.
  35. C. Teixeira, I. Trancoso, and A. Serralheiro, Recognition of non-native accents, in Proc. Eurospeech, Rhodes, Greece, 1997, pp. 2375-2378.
  36. H. Heuvel, K. Choukri, C. Gollan, A. Moreno, and D. Mostefa, TC-STAR: New language resources for ASR and SLT purposes, in LREC, Genoa, 2006, pp. 2570-2573.
  37. L.F. Lamel, F. Schiel, A. Fourcin, J. Mariani, and H. Tillmann, The translanguage English database TED, in ICSLP, Yokohama, Japan, Sep 1994.
  38. N. Mote, L. Johnson, A. Sethy, J. Silva, and S. Narayanan, Tactical language detection and modeling of learner speech errors: The case of Arabic tactical language training for American English speakers, in Proc. of InSTIL, June 2004.
  39. K. Nishina, Development of Japanese speech database read by non-native speakers for constructing CALL system, in ICA, Kyoto, Japan, 2004, pp. 561-564.
  40. University of Munich, "The Verbmobil Project".
  41. I. Trancoso, C. Viana, I. Mascarenhas, and C. Teixeira, On deriving rules for nativised pronunciation in navigation queries, in Proc. Eurospeech, 1999.
  42. A. LaRocca and R. Chouairi, West point Arabic speech corpus, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2002.
  43. A. LaRocca and C. Tomei, West point Russian speech corpus, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2003.
  44. J. Morgan, West point heroico Spanish speech, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2006.
  45. I. Amdal, F. Korkmazskiy, and A. C. Surendran, Joint pronunciation modelling of non-native speakers using data-driven methods, in ICSLP, Beijing, China, 2000, pp. 622-625.
  46. K. Livescu, Analysis and modeling of non-native speech for automatic speech recognition, M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA, 1999.