L2 Syntactic Complexity Analyzer

The L2 Syntactic Complexity Analyzer (L2SCA), developed by Xiaofei Lu at the Pennsylvania State University, is a computational tool that produces syntactic complexity indices for written English texts. [1] Along with Coh-Metrix, the L2SCA is one of the most extensively used computational tools for computing indices of second language writing development. The L2SCA is also widely used in the field of corpus linguistics. [2] The L2SCA is available in a single mode and a batch mode. Single mode analyzes one written text and reports 14 syntactic complexity indices. [3] Batch mode allows the user to analyze up to 30 written texts at once.

Usage

Second language writing development

The L2SCA has been used in numerous studies in the field of second language writing development to compute indices of syntactic complexity. [4] [5] [6]

Corpus linguistics

The L2SCA has also been used in various studies in the field of corpus linguistics. [7] [8]

Indices

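| No. | Group | Construct | Index | Abbr. |
|-----|-------|-----------|-------|-------|
| 1 | Syntactic structures | | Word count | W |
| 2 | Syntactic structures | | Sentence | S |
| 3 | Syntactic structures | | Verb phrase | VP |
| 4 | Syntactic structures | | Clause | C |
| 5 | Syntactic structures | | T-unit | T |
| 6 | Syntactic structures | | Dependent clause | DC |
| 7 | Syntactic structures | | Complex T-unit | CT |
| 8 | Syntactic structures | | Coordinate phrase | CP |
| 9 | Syntactic structures | | Complex nominal | CN |
| 10 | Syntactic complexity indices | Length of production units | Mean length of sentence | MLS |
| 11 | Syntactic complexity indices | Length of production units | Mean length of T-unit | MLT |
| 12 | Syntactic complexity indices | Length of production units | Mean length of clause | MLC |
| 13 | Syntactic complexity indices | Overall sentence complexity | Clauses per sentence | C/S |
| 14 | Syntactic complexity indices | Amounts of subordination | Clauses per T-unit | C/T |
| 15 | Syntactic complexity indices | Amounts of subordination | Complex T-unit ratio | CT/T |
| 16 | Syntactic complexity indices | Amounts of subordination | Dependent clauses per clause | DC/C |
| 17 | Syntactic complexity indices | Amounts of subordination | Dependent clauses per T-unit | DC/T |
| 18 | Syntactic complexity indices | Amounts of coordination | Coordinate phrases per clause | CP/C |
| 19 | Syntactic complexity indices | Amounts of coordination | Coordinate phrases per T-unit | CP/T |
| 20 | Syntactic complexity indices | Amounts of coordination | T-units per sentence | T/S |
| 21 | Syntactic complexity indices | Phrasal sophistication | Complex nominals per clause | CN/C |
| 22 | Syntactic complexity indices | Phrasal sophistication | Complex nominals per T-unit | CN/T |
| 23 | Syntactic complexity indices | Phrasal sophistication | Verb phrases per T-unit | VP/T |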
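
The fourteen syntactic complexity indices (rows 10 to 23) are ratios of the nine structure counts (rows 1 to 9), so the arithmetic can be sketched in a few lines of Python. This is an illustration only, not the L2SCA's own code: the names `StructureCounts`, `ratio`, and `complexity_indices` are hypothetical, the counts in the usage example are invented, and returning 0.0 for a zero denominator is an assumed convention. The sketch also takes the counts as given and does not cover how they are extracted from a text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureCounts:
    """Frequency counts of the nine syntactic structures (hypothetical container)."""
    W: int   # words
    S: int   # sentences
    VP: int  # verb phrases
    C: int   # clauses
    T: int   # T-units
    DC: int  # dependent clauses
    CT: int  # complex T-units
    CP: int  # coordinate phrases
    CN: int  # complex nominals


def ratio(numerator: int, denominator: int) -> float:
    """Divide two counts; return 0.0 if the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def complexity_indices(n: StructureCounts) -> dict:
    """Compute the fourteen ratio-based indices from the nine counts."""
    return {
        # Length of production units
        "MLS": ratio(n.W, n.S),
        "MLT": ratio(n.W, n.T),
        "MLC": ratio(n.W, n.C),
        # Overall sentence complexity
        "C/S": ratio(n.C, n.S),
        # Amounts of subordination
        "C/T": ratio(n.C, n.T),
        "CT/T": ratio(n.CT, n.T),
        "DC/C": ratio(n.DC, n.C),
        "DC/T": ratio(n.DC, n.T),
        # Amounts of coordination
        "CP/C": ratio(n.CP, n.C),
        "CP/T": ratio(n.CP, n.T),
        "T/S": ratio(n.T, n.S),
        # Phrasal sophistication
        "CN/C": ratio(n.CN, n.C),
        "CN/T": ratio(n.CN, n.T),
        "VP/T": ratio(n.VP, n.T),
    }


# Invented counts for a short text, used only to exercise the functions.
counts = StructureCounts(W=120, S=6, VP=18, C=15, T=10, DC=5, CT=4, CP=3, CN=12)
indices = complexity_indices(counts)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Keeping the counts in one record and the ratios in one function mirrors the split in the table between structures and indices, and makes it easy to see which two counts each abbreviation refers to.
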

References

  1. "L2 Syntactic Complexity Analyzer". Aihaiyang.com. 12 September 2018.
  2. Computational Methods for Corpus Annotation and Analysis. Springer Publishing. 12 September 2018. ISBN 9789401786447.
  3. Kyle, Kristopher; Crossley, Scott (2017-10-20). "Assessing syntactic sophistication in L2 writing: A usage-based approach". Language Testing. 34 (4): 513–535. doi:10.1177/0265532217712554. ISSN 0265-5322. S2CID 149239304.
  4. Mazgutova, Diane; Kormos, Judit. "Syntactic and lexical development in an intensive English for Academic Purposes programme" (PDF). Journal of Second Language Writing. 29: 3–15. 12 September 2018. doi:10.1016/j.jslw.2015.06.004.
  5. Hou, Junping; Verspoor, Marjolijn; Loerts, Hanneke (12 September 2018). "An exploratory study into the dynamics of Chinese L2 writing development". Dutch Journal of Applied Linguistics. 5: 65–96. doi:10.1075/dujal.5.1.04loe.
  6. Wind, Attila M. "Second language writing development from a Dynamic Systems Theory perspective" (PDF). Lancaster University. 12 September 2018.
  7. Lu; Ai (September 2015). "Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds". Journal of Second Language Writing. 29: 16–27. doi:10.1016/j.jslw.2015.06.003.
  8. Nasseri. "A Corpus-based Analysis of Syntactic Complexity measures in the Academic Writing of EFL, ESL, and Native English Master's Students" (PDF). Birmingham.ac.uk. 12 September 2018.