James R. Curran is a computational linguist, a former senior lecturer at the University of Sydney, and the former CEO of Grok Academy. He holds a PhD in Informatics from the University of Edinburgh.
Curran's research focuses on natural language processing. [1] Within NLP, he has published on combinatory categorial grammar (CCG) parsing and on question answering systems. [3] Beyond NLP, he has also written on the development of search engines to support problem-based learning. [2]
Curran has co-authored software packages, including the C&C tools, [4] a CCG parser written with Stephen Clark. [5]
In addition to his work as a University of Sydney lecturer, Curran directed the National Computer Science School (NCSS), an annual summer school for technologically talented high school students. [6] In 2013, building on their work with the NCSS, he and several other organisers founded Grok Learning.
In 2013 he was one of the authors of the Digital Technologies section of the Australian Curriculum, the subject's first appearance in the national curriculum. [7] He also acted as an advocate for digital literacy among Australian students. [8] [9]
He was the academic director of the Australian Computing Academy, a not-for-profit within the University of Sydney [10] until its merger with Grok Learning in 2021 to form Grok Academy. [11]
In October 2024 he resigned from his position as CEO and board member of Grok Academy after multiple allegations of harassment were substantiated by an independent investigator. [12] It was reported that, over a ten-year span, nine women, including six who were in high school at the time, alleged that Curran had sent them inappropriate messages.
It was also revealed that a 2019 University of Sydney investigation had found 35 cases of harassment, after which he received a warning. A 2024 University of New South Wales investigation was referred to the NSW Police, who took no action after finding no criminal wrongdoing by Curran, in part because the students were over 16 at the time of the alleged harassment. [12]
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with giving computers the ability to process data encoded in natural language, and it is closely related to information retrieval, knowledge representation, and computational linguistics, a subfield of linguistics. Data is typically collected in text corpora and processed with rule-based, statistical, or neural approaches from machine learning and deep learning.
Corpus linguistics is an empirical method for the study of language by way of a text corpus. Corpora are balanced, often stratified, collections of authentic "real world" speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections.
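As a minimal sketch of what working with a machine-readable corpus looks like, the Python fragment below counts word frequencies in a toy corpus; the sentences and the naive whitespace tokenization are illustrative assumptions, not corpus-linguistic practice.

```python
# A toy corpus-frequency query; the sentences are invented examples.
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a cat and a dog met",
]

# Tokenize naively on whitespace and count word frequencies.
tokens = [word for sentence in corpus for word in sentence.split()]
freq = Counter(tokens)

print(freq.most_common(3))        # [('the', 4), ('cat', 3), ('dog', 2)]
print(freq["dog"] / len(tokens))  # relative frequency of 'dog' (2/17)
```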
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees.
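A toy sketch of adjunction may help: the following Python fragment splices an auxiliary tree into a matching node of an initial tree, with trees encoded as nested lists. The trees, labels, and the "VP*" foot-node convention are illustrative assumptions rather than part of Joshi's formalism as such.

```python
# Toy TAG adjunction over trees encoded as nested lists [label, child, ...].

def replace_foot(tree, foot, subtree):
    """Return a copy of `tree` with the foot-node marker replaced."""
    if tree == foot:
        return subtree
    if isinstance(tree, str):
        return tree
    return [tree[0]] + [replace_foot(child, foot, subtree) for child in tree[1:]]

def adjoin(tree, aux, target):
    """Splice auxiliary tree `aux` into each outermost node labelled `target`."""
    if isinstance(tree, str):
        return tree
    if tree[0] == target:
        # The adjoined node's subtree moves under the foot node of `aux`.
        return replace_foot(aux, target + "*", tree)
    return [tree[0]] + [adjoin(child, aux, target) for child in tree[1:]]

# Initial tree for "Pat sleeps" and an auxiliary tree for "soundly";
# "VP*" marks the auxiliary tree's foot node.
initial = ["S", ["NP", "Pat"], ["VP", ["V", "sleeps"]]]
auxiliary = ["VP", "VP*", ["Adv", "soundly"]]

print(adjoin(initial, auxiliary, "VP"))
# ['S', ['NP', 'Pat'], ['VP', ['VP', ['V', 'sleeps']], ['Adv', 'soundly']]]
```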
Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words rather than constructing constituents in a phrase structure hierarchy. Link grammar is similar to dependency grammar, but where dependency grammar requires a head-dependent relationship, link grammar makes that relationship optional. Colored Multiplanar Link Grammar (CMLG) is an extension of LG that allows crossing relations between pairs of words. The relationship between words is indicated with link types, making link grammar closely related to certain categorial grammars.
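As a rough illustration of the word-pair idea, the Python fragment below matches "+" (rightward) connectors with "-" (leftward) connectors in a three-word sentence. The mini-lexicon is invented, and real link-grammar parsing enforces planarity and other constraints omitted here.

```python
# A toy sketch in the spirit of link grammar; the lexicon is invented.
# Each word lists connectors: 'X+' links rightward, 'X-' leftward.

lexicon = {
    "the": ["D+"],        # determiner links right to a noun
    "cat": ["D-", "S+"],  # noun takes a determiner, links right to a verb
    "ran": ["S-"],        # verb takes a subject on its left
}

def link(sentence):
    """Greedily match each 'X+' with the nearest later 'X-'.
    Returns the links drawn, or None if any connector stays unsatisfied."""
    words = sentence.split()
    need = [list(lexicon[w]) for w in words]
    links = []
    for i, conns in enumerate(need):
        for c in [c for c in conns if c.endswith("+")]:
            label = c[:-1]
            for j in range(i + 1, len(words)):
                if label + "-" in need[j]:
                    need[j].remove(label + "-")
                    conns.remove(c)
                    links.append((words[i], label, words[j]))
                    break
    if any(conns for conns in need):
        return None  # some connector left unmatched
    return links

print(link("the cat ran"))  # [('the', 'D', 'cat'), ('cat', 'S', 'ran')]
```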
Richard Merritt Montague was an American mathematician and philosopher who made contributions to mathematical logic and the philosophy of language. He is known for proposing Montague grammar to formalize the semantics of natural language. As a student of Alfred Tarski, he also contributed early developments to axiomatic set theory (ZFC). For the latter half of his life, he was a professor at the University of California, Los Angeles until his early death, believed to be a homicide, at age 40.
Coco/R is a compiler generator that takes Wirth syntax notation grammars of a source language and generates a scanner and a parser for that language.
Categorial grammar is a family of formalisms in natural language syntax that share the central assumption that syntactic constituents combine as functions and arguments. Categorial grammar posits a close relationship between syntax and semantic composition, since it typically treats syntactic categories as corresponding to semantic types. Categorial grammars were developed in the 1930s by Kazimierz Ajdukiewicz and in the 1950s by Yehoshua Bar-Hillel and Joachim Lambek. The approach saw a surge of interest in the 1970s following the work of Richard Montague, whose Montague grammar assumed a similar view of syntax, and it continues to be a major paradigm, particularly within formal semantics.
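As a brief worked example (the category assignments are standard textbook assumptions, not drawn from this article), an intransitive sentence can be derived by a single function application:

\[
\textit{dogs} \vdash NP, \qquad \textit{bark} \vdash S\backslash NP, \qquad NP \;\; S\backslash NP \;\Rightarrow\; S \quad (\text{backward application}).
\]

Here S\NP is the category of a function that seeks an NP to its left and yields a sentence S, so the syntactic combination mirrors applying a semantic predicate to its argument.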
Vaughan Pratt is a Professor Emeritus at Stanford University and was an early pioneer in the field of computer science. Since 1969, Pratt has made several contributions to foundational areas such as search algorithms, sorting algorithms, and primality testing. More recently, his research has focused on formal modeling of concurrent systems and Chu spaces.
Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.
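As a minimal sketch of one classic first step in grammatical inference, the Python fragment below builds a prefix tree acceptor (PTA) from positive example strings; the example strings are invented, and state-merging algorithms such as RPNI would then generalize the acceptor into a grammar.

```python
# Build a prefix tree acceptor (PTA): a tree-shaped automaton that
# accepts exactly the given positive example strings.

def build_pta(examples):
    transitions = {}  # (state, symbol) -> state
    accepting = set()
    next_state = 1    # state 0 is the root
    for word in examples:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting

trans, accept = build_pta(["ab", "abb", "b"])
print(trans)   # {(0, 'a'): 1, (1, 'b'): 2, (2, 'b'): 3, (0, 'b'): 4}
print(accept)  # {2, 3, 4}
```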
Eugene Charniak was a professor of computer science and cognitive science at Brown University. He held an A.B. in Physics from the University of Chicago and a Ph.D. in Computer Science from M.I.T. His research was in the area of language understanding and related technologies, such as knowledge representation, reasoning under uncertainty, and learning. From the early 1990s he was interested in statistical techniques for language understanding, including work on part-of-speech tagging, probabilistic context-free grammar induction and, later, syntactic disambiguation through word statistics, efficient syntactic parsing, and lexical resource acquisition through statistical means.
Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.
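As an illustration (a standard textbook-style construction, not taken from this article), write A → α f for "push index f onto the stack" and A_f → α for "pop f". The non-context-free language {aⁿbⁿcⁿ} then has the indexed grammar

\[
S \to T\,g, \qquad T \to a\,T\,f \mid B\,C, \qquad B_f \to b\,B, \qquad B_g \to \varepsilon, \qquad C_f \to c\,C, \qquad C_g \to \varepsilon .
\]

Because the index stack accumulated at T is copied to both B and C when T → B C applies, the number of b's and the number of c's must each equal the number of a's.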
Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structure, quantification and information structure. The formalism generates constituency-based structures and is therefore a type of phrase structure grammar.
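As a minimal sketch of how such categories combine, the Python fragment below implements forward and backward function application over category strings; composition, type-raising, and ambiguity handling are omitted, and the lexicon is a textbook-style assumption rather than anything from this article.

```python
# Minimal CCG function application: "/" seeks an argument to the
# right, "\" to the left. Handles only simple category strings.

def strip_outer(cat):
    """Remove one pair of outermost parentheses, if present."""
    if cat.startswith("(") and cat.endswith(")"):
        depth = 0
        for i, ch in enumerate(cat):
            depth += ch == "("
            depth -= ch == ")"
            if depth == 0 and i < len(cat) - 1:
                return cat  # parentheses are not outermost
        return cat[1:-1]
    return cat

def apply_pair(left, right):
    """Forward application X/Y Y => X; backward application Y X\\Y => X."""
    if left.endswith("/" + right):
        return strip_outer(left[: -len(right) - 1])
    if right.endswith("\\" + left):
        return strip_outer(right[: -len(left) - 1])
    return None

def parse(cats):
    """Shift-reduce over categories; enough for simple sentences."""
    stack = []
    for cat in cats:
        stack.append(cat)
        while len(stack) > 1:
            combined = apply_pair(stack[-2], stack[-1])
            if combined is None:
                break
            stack[-2:] = [combined]
    return stack

lexicon = {"Pat": "NP", "saw": "(S\\NP)/NP", "Lee": "NP"}
print(parse([lexicon[w] for w in "Pat saw Lee".split()]))  # ['S']
```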
Explanation-based learning (EBL) is a form of machine learning that exploits a very strong, or even perfect, domain theory in order to make generalizations or form concepts from training examples. It has also been linked with memory encoding as an aid to learning.
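A toy sketch, based on the classic "cup" example often cited in the EBL literature, may make the idea concrete; the domain theory and attribute names below are illustrative assumptions.

```python
# Toy EBL: the domain theory explains why an example is a cup, and the
# explanation keeps only the attributes the proof actually used.

domain_theory = {
    "cup": ["liftable", "stable", "open_vessel"],
    "liftable": ["light", "has_handle"],
    "stable": ["flat_bottom"],
    "open_vessel": ["upward_concavity"],
}

def explain(concept, example):
    """Expand `concept` into the directly observable attributes that
    justify it for this example, or return None on failure."""
    if concept not in domain_theory:  # operational attribute
        return [concept] if concept in example else None
    leaves = []
    for part in domain_theory[concept]:
        sub = explain(part, example)
        if sub is None:
            return None
        leaves.extend(sub)
    return leaves

# One positive example; the generalization discards irrelevant
# attributes such as "red" because the proof never touches them.
example = {"light", "has_handle", "flat_bottom", "upward_concavity", "red"}
print(explain("cup", example))
# ['light', 'has_handle', 'flat_bottom', 'upward_concavity']
```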
Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodologically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.
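As a deliberately simple sketch, the Python fragment below pulls subject-predicate-object triples out of free text with a single hand-written pattern; the sentences and the "capitalOf" predicate are invented, and real knowledge-extraction systems rely on full NLP pipelines and existing ontologies rather than one regular expression.

```python
# Toy knowledge extraction: free text in, machine-readable triples out.
import re

text = (
    "Canberra is the capital of Australia. "
    "Edinburgh is the capital of Scotland."
)

pattern = re.compile(r"(\w+) is the capital of (\w+)")
triples = [(city, "capitalOf", country)
           for city, country in pattern.findall(text)]

print(triples)
# [('Canberra', 'capitalOf', 'Australia'),
#  ('Edinburgh', 'capitalOf', 'Scotland')]
```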
Deep linguistic processing is a natural language processing framework which draws on theoretical and descriptive linguistics. It models language predominantly by way of theoretical syntactic/semantic theory. Deep linguistic processing approaches differ from "shallower" methods in that they yield more expressive and structural representations which directly capture long-distance dependencies and underlying predicate-argument structures.
The knowledge-intensive approach of deep linguistic processing requires considerable computational power and was in the past sometimes judged intractable. However, research in the early 2000s made considerable advances in the efficiency of deep processing, and today efficiency is no longer a major obstacle for applications using deep linguistic processing.
In computational linguistics, the term mildly context-sensitive grammar formalisms refers to several grammar formalisms that have been developed in an effort to provide adequate descriptions of the syntactic structure of natural language.
Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing.
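As a toy sketch of the task, the Python fragment below maps utterances to logical forms using hand-written patterns; the rules and predicate names are invented, and practical semantic parsers are learned from data rather than written this way.

```python
# Toy semantic parsing: utterance in, logical form out.
import re

rules = [
    (re.compile(r"what is the capital of (\w+)"), r"capital(\1)"),
    (re.compile(r"who wrote (\w+)"), r"author(\1)"),
]

def parse(utterance):
    """Return a logical form for the utterance, or None if no rule fits."""
    for pattern, template in rules:
        match = pattern.fullmatch(utterance.lower())
        if match:
            return match.expand(template)
    return None

print(parse("What is the capital of France"))  # capital(france)
print(parse("Who wrote Hamlet"))               # author(hamlet)
```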