James Curran (linguist)

James Curran in 2015

James R. Curran is a computational linguist and senior lecturer at the University of Sydney. He holds a PhD in Informatics from the University of Edinburgh.

Research

Curran's research focuses on natural language processing (NLP), [1] and he is one of the few Australian computational linguists. Within NLP, he has published papers on combinatory categorial grammar (CCG) parsing and on question answering systems. [3] Outside NLP, he co-authored a paper on building search engines to drive problem-based learning. [2]

Works

Curran has co-authored software packages such as the C&C tools, [4] which include a CCG parser developed with Stephen Clark. [5]

Educational work

In addition to his work as a University of Sydney lecturer, Curran directs the National Computer Science School, an annual summer school for technologically talented high school students. In 2013, based on their work with NCSS, he and a number of other organisers founded Grok Learning.

Curran is also an advocate for embedding computational literacy in the general curriculum, participating in the development of Australia's National Digital Curriculum and presenting the "Python for Every Child in Australia" keynote at PyCon Australia 2014. [6] [7]

Related Research Articles

Natural language processing: field of linguistics and computer science

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field, in the natural context ("realia") of that language, with minimal experimental interference. Large collections of text allow linguists to run quantitative analyses on linguistic concepts that are otherwise harder to quantify.

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees.

Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a phrase structure hierarchy. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship, whereas link grammar makes the head-dependent relationship optional. Colored Multiplanar Link Grammar (CMLG) is an extension of LG that allows crossing relations between pairs of words. The relationship between words is indicated with link types, which makes link grammar closely related to certain categorial grammars.

Richard Montague: American mathematician

Richard Merritt Montague was an American mathematician and philosopher who made contributions to mathematical logic and the philosophy of language. He is known for proposing Montague grammar to formalize the semantics of natural language. As a student of Alfred Tarski, he also contributed early developments to axiomatic set theory (ZFC). For the latter half of his life, he was a professor at the University of California, Los Angeles until his early death, believed to be a homicide, at age 40.

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts, such as in simple mutually recursive descent parsing. It is a type of caching, distinct from other forms of caching such as buffering and page replacement. In the context of some logic programming languages, memoization is also known as tabling.
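The caching behaviour described above can be illustrated with a minimal sketch. The `memoize` decorator, the `fib` function, and the `call_count` counter are hypothetical names chosen for illustration; the standard library's `functools.lru_cache` offers a ready-made equivalent.

```python
from functools import wraps


def memoize(func):
    """Cache the results of a pure function, keyed by its positional arguments."""
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            # First time these inputs are seen: compute and store the result.
            cache[args] = func(*args)
        # Later calls with the same inputs return the stored result.
        return cache[args]

    return wrapper


call_count = 0  # counts how often the function body actually runs


@memoize
def fib(n):
    """Naive recursive Fibonacci; exponential time without the cache."""
    global call_count
    call_count += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
```

Because every recursive call goes through the wrapper, each distinct argument is computed only once, so the body runs once per value of `n` rather than an exponential number of times. The approach assumes the function is pure and its arguments are hashable.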

The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It supports classification, tokenization, stemming, tagging, parsing, and semantic reasoning functionalities. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook.

Treebank

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

Grammar induction

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

Mark Jerome Steedman is a computational linguist and cognitive scientist.

Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structure, quantification and information structure. The formalism generates constituency-based structures and is therefore a type of phrase structure grammar.
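As a rough illustration of how CCG combines words, the sketch below implements the two basic application rules, which are standard in CCG but not spelled out in the text above. The `Functor` class, the helper functions, and the "John likes Mary" categories are all assumptions made for this example, not part of any particular parser.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Functor:
    """A complex category: result/argument (looks right) or result\\argument (looks left)."""
    result: "Category"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    argument: "Category"


Category = Union[str, Functor]  # atomic categories are plain strings


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    # Forward application: X/Y followed by Y yields X
    if isinstance(left, Functor) and left.slash == "/" and left.argument == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    # Backward application: Y followed by X\Y yields X
    if isinstance(right, Functor) and right.slash == "\\" and right.argument == left:
        return right.result
    return None


NP, S = "NP", "S"
# Hypothetical lexical category for a transitive verb such as "likes": (S\NP)/NP
TRANSITIVE_VERB = Functor(Functor(S, "\\", NP), "/", NP)

verb_phrase = forward_apply(TRANSITIVE_VERB, NP)  # "likes" + "Mary"
sentence = backward_apply(NP, verb_phrase)        # "John" + "likes Mary"
```

Here the verb first consumes its object to the right, giving the category `S\NP`, and then its subject to the left, giving `S`. A real CCG parser also uses further rules such as composition and type-raising, which this sketch omits.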

Explanation-based learning (EBL) is a form of machine learning that exploits a very strong, or even perfect, domain theory in order to make generalizations or form concepts from training examples. It is also linked with encoding in memory as an aid to learning.

Deep Linguistic Processing with HPSG - INitiative (DELPH-IN) is a collaboration where computational linguists worldwide develop natural language processing tools for deep linguistic processing of human language. The goal of DELPH-IN is to combine linguistic and statistical processing methods in order to computationally understand the meaning of texts and utterances.

Deep linguistic processing is a natural language processing framework which draws on theoretical and descriptive linguistics. It models language predominantly by way of theoretical syntactic/semantic theory. Deep linguistic processing approaches differ from "shallower" methods in that they yield more expressive and structural representations which directly capture long-distance dependencies and underlying predicate-argument structures. The knowledge-intensive approach of deep linguistic processing requires considerable computational power, and has in the past sometimes been judged intractable. However, research in the early 2000s made considerable advances in the efficiency of deep processing. Today, efficiency is no longer a major problem for applications using deep linguistic processing.

In computational linguistics, the term mildly context-sensitive grammar formalisms refers to several grammar formalisms that have been developed in an effort to provide adequate descriptions of the syntactic structure of natural language.

PyMC is a probabilistic programming language written in Python. It can be used for Bayesian statistical modeling and probabilistic machine learning.

PyTorch: open source machine learning library

PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is free and open-source software released under the modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.

David M. Beazley: American software engineer

David Beazley is an American software engineer. He has made significant contributions to the Python developer community, which includes writing the definitive Python reference text Python Essential Reference, the SWIG software tool for creating language agnostic C and C++ extensions, and the PLY parsing tool. He has served on the program committees for PyCon and the O'Reilly Open Source Convention, and was elected a fellow of the Python Software Foundation in 2002.

Semantic parsing

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing.

References

  1. Scientific Commons
  2. Steven Bird; James R. Curran. "Building a Search Engine to Drive Problem-Based Learning" (PDF). Archived from the original (PDF) on 25 July 2008.
  3. "James R. Curran - Publications". James R. Curran Homepage. 10 June 2010. Archived from the original on 10 June 2010.
  4. "C&C tools". svn.ask.it.usyd.edu.au/trac/candc. 24 June 2007. Archived from the original on 24 June 2007.
  5. James R. Curran; Stephen Clark; Johan Bos. "Linguistically Motivated Large-Scale NLP with C&C and Boxer" (PDF).
  6. "PyCon Australia keynote announcement". Retrieved 2 May 2016.
  7. "Python for Every Child in Australia". YouTube. Archived from the original on 15 December 2021. Retrieved 2 May 2016.