The L2 Syntactic Complexity Analyzer (L2SCA), developed by Xiaofei Lu at the Pennsylvania State University, is a computational tool that produces syntactic complexity indices for written English texts. [1] Along with Coh-Metrix, the L2SCA is one of the most extensively used computational tools for computing indices of second language writing development. The L2SCA is also widely used in the field of corpus linguistics. [2] The L2SCA is available in a single mode and a batch mode. The single mode analyzes one written text for 14 syntactic complexity indices. [3] The batch mode allows the user to analyze up to 30 written texts at once.
The L2SCA has been used in numerous studies in the field of second language writing development to compute indices of syntactic complexity. [4] [5] [6]
The L2SCA has also been used in various studies in the field of corpus linguistics. [7] [8]
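No. | Category | Construct | Index | Abbr.
---|---|---|---|---
1. | Syntactic structures | | Word count | W
2. | Syntactic structures | | Sentence | S
3. | Syntactic structures | | Verb phrase | VP
4. | Syntactic structures | | Clause | C
5. | Syntactic structures | | T-unit | T
6. | Syntactic structures | | Dependent clause | DC
7. | Syntactic structures | | Complex T-unit | CT
8. | Syntactic structures | | Coordinate phrase | CP
9. | Syntactic structures | | Complex nominal | CN
10. | Syntactic complexity indices | Length of production units | Mean length of sentence | MLS
11. | Syntactic complexity indices | Length of production units | Mean length of T-unit | MLT
12. | Syntactic complexity indices | Length of production units | Mean length of clause | MLC
13. | Syntactic complexity indices | Overall sentence complexity | Clause per sentence | C/S
14. | Syntactic complexity indices | Amounts of subordination | Clause per T-unit | C/T
15. | Syntactic complexity indices | Amounts of subordination | Complex T-unit ratio | CT/T
16. | Syntactic complexity indices | Amounts of subordination | Dependent clause per clause | DC/C
17. | Syntactic complexity indices | Amounts of subordination | Dependent clause per T-unit | DC/T
18. | Syntactic complexity indices | Amounts of coordination | Coordinate phrase per clause | CP/C
19. | Syntactic complexity indices | Amounts of coordination | Coordinate phrase per T-unit | CP/T
20. | Syntactic complexity indices | Amounts of coordination | T-unit per sentence | T/S
21. | Syntactic complexity indices | Phrasal sophistication | Complex nominal per clause | CN/C
22. | Syntactic complexity indices | Phrasal sophistication | Complex nominal per T-unit | CN/T
23. | Syntactic complexity indices | Phrasal sophistication | Verb phrase per T-unit | VP/T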
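
The 14 indices in rows 10 to 23 can be read as ratios of the nine structure counts in rows 1 to 9, as their abbreviations suggest (for example, C/T is clauses divided by T-units, and the mean-length indices divide the word count by the number of sentences, T-units or clauses). The following is a minimal Python sketch of that arithmetic only. It is not the L2SCA's own code: the names `StructureCounts`, `ratio` and `complexity_indices` are hypothetical, the counts in the example are invented for illustration, and returning 0.0 for a zero denominator is an assumption of this sketch. Obtaining the counts from a text requires syntactic parsing, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class StructureCounts:
    """Frequencies of the nine syntactic structures (rows 1-9 of the table)."""
    W: int   # words
    S: int   # sentences
    VP: int  # verb phrases
    C: int   # clauses
    T: int   # T-units
    DC: int  # dependent clauses
    CT: int  # complex T-units
    CP: int  # coordinate phrases
    CN: int  # complex nominals


def ratio(numerator, denominator):
    """Divide two counts; return 0.0 for a zero denominator (sketch assumption)."""
    return numerator / denominator if denominator else 0.0


def complexity_indices(c):
    """Compute the 14 ratio indices (rows 10-23) from the structure counts."""
    return {
        # Length of production units
        "MLS": ratio(c.W, c.S),
        "MLT": ratio(c.W, c.T),
        "MLC": ratio(c.W, c.C),
        # Overall sentence complexity
        "C/S": ratio(c.C, c.S),
        # Amounts of subordination
        "C/T": ratio(c.C, c.T),
        "CT/T": ratio(c.CT, c.T),
        "DC/C": ratio(c.DC, c.C),
        "DC/T": ratio(c.DC, c.T),
        # Amounts of coordination
        "CP/C": ratio(c.CP, c.C),
        "CP/T": ratio(c.CP, c.T),
        "T/S": ratio(c.T, c.S),
        # Phrasal sophistication
        "CN/C": ratio(c.CN, c.C),
        "CN/T": ratio(c.CN, c.T),
        "VP/T": ratio(c.VP, c.T),  # verb phrases per T-unit
    }


# Invented counts for a short hypothetical text (not real L2SCA output).
example = StructureCounts(W=120, S=6, VP=18, C=12, T=8, DC=4, CT=3, CP=5, CN=15)
indices = complexity_indices(example)
print(indices["MLS"])  # 20.0
print(indices["C/T"])  # 1.5
```

Keeping the counts in one record and the ratios in one function mirrors the split in the table between syntactic structures and syntactic complexity indices.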