Syntactic parsing (computational linguistics)


Syntactic parsing is the automatic analysis of the syntactic structure of natural language, especially syntactic relations (in dependency grammar) and labelling spans of constituents (in constituency grammar). [1] It is motivated by the problem of structural ambiguity in natural language: a sentence can be assigned multiple grammatical parses, so some kind of knowledge beyond computational grammar rules is needed to tell which parse is intended. Syntactic parsing is one of the important tasks in computational linguistics and natural language processing, and has been a subject of research since the mid-20th century with the advent of computers.


Different theories of grammar propose different formalisms for describing the syntactic structure of sentences. For computational purposes, these formalisms can be grouped under constituency grammars and dependency grammars. Parsers for either class call for different types of algorithms, and approaches to the two problems have taken different forms. The creation of human-annotated treebanks using various formalisms (e.g. Universal Dependencies) has proceeded alongside the development of new algorithms and methods for parsing.

Part-of-speech tagging (which resolves some lexical-category ambiguity) is a related problem, and often a prerequisite for or a subproblem of syntactic parsing. Syntactic parses can be used for information extraction (e.g. event parsing, semantic role labelling, entity labelling) and may be further used to extract formal semantic representations.

Constituency parsing

Constituency parsing involves parsing in accordance with constituency grammar formalisms, such as Minimalism or the formalism of the Penn Treebank. This, at the very least, means telling which spans are constituents (e.g. [The man] is here.) and what kind of constituents they are (e.g. [The man] is a noun phrase) on the basis of a context-free grammar (CFG) which encodes rules for constituent formation and merging. [2]

Algorithms generally require the CFG to be converted to Chomsky Normal Form (in which every constituent has at most two children), which can be done without losing any information about the tree or reducing expressivity, using an algorithm first described by Hopcroft and Ullman in 1979. [3]
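As a rough illustration, the sketch below shows only the binarisation step of such a conversion, assuming a grammar represented as (left-hand side, right-hand side) pairs; the function name `binarize` and the intermediate symbols it invents are illustrative, and a full CNF conversion would also remove unary and empty rules.

```python
def binarize(rules):
    """Binarise CFG rules so every right-hand side has at most two symbols.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Long rules are split using intermediate symbols that record the remaining
    children, so the original tree can be reconstructed afterwards.
    """
    binary = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            first, rest = rhs[0], rhs[1:]
            new_sym = f"{lhs}|{'+'.join(rest)}"   # recoverable intermediate symbol
            binary.append((lhs, (first, new_sym)))
            lhs, rhs = new_sym, rest
        binary.append((lhs, tuple(rhs)))
    return binary

# Example: the flat rule NP -> DT JJ NN becomes two binary rules.
print(binarize([("NP", ("DT", "JJ", "NN"))]))
# [('NP', ('DT', 'NP|JJ+NN')), ('NP|JJ+NN', ('JJ', 'NN'))]
```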

CKY

The most popular algorithm for constituency parsing is the Cocke–Kasami–Younger algorithm (CKY), [4] [5] a dynamic programming algorithm which constructs a parse in worst-case O(n³·|G|) time, where n is the number of words in the sentence and |G| is the size of a CFG given in Chomsky Normal Form.

Given the issue of ambiguity (e.g. preposition-attachment ambiguity in English) leading to multiple acceptable parses, it is necessary to be able to score the probability of parses to pick the most probable one. One way to do this is by using a probabilistic context-free grammar (PCFG), which assigns a probability to each constituency rule, and modifying CKY to maximise probabilities when parsing bottom-up. [6] [7] [8]
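A minimal sketch of this probabilistic ("Viterbi") variant of CKY follows, assuming a toy PCFG in Chomsky Normal Form represented as dictionaries of lexical and binary rules; the function and variable names are illustrative, and back-pointers for recovering the actual tree are omitted for brevity.

```python
import math
from collections import defaultdict

def pcky(words, lexical, binary, start="S"):
    """Log-probability of the most probable parse under a PCFG in CNF.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    """
    n = len(words)
    best = defaultdict(dict)              # (i, j) -> {nonterminal: best log-prob}
    for i, w in enumerate(words):         # length-1 spans from lexical rules
        for (A, word), p in lexical.items():
            if word == w:
                cur = best[(i, i + 1)].get(A, -math.inf)
                best[(i, i + 1)][A] = max(cur, math.log(p))
    for length in range(2, n + 1):        # longer spans, bottom-up
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):     # split point
                for (A, B, C), p in binary.items():
                    if B in best[(i, k)] and C in best[(k, j)]:
                        cand = math.log(p) + best[(i, k)][B] + best[(k, j)][C]
                        if cand > best[(i, j)].get(A, -math.inf):
                            best[(i, j)][A] = cand
    return best[(0, n)].get(start, -math.inf)

# Toy grammar: S -> NP VP, NP -> "dogs", VP -> "bark"
lexical = {("NP", "dogs"): 1.0, ("VP", "bark"): 1.0}
binary = {("S", "NP", "VP"): 1.0}
print(pcky(["dogs", "bark"], lexical, binary))   # 0.0, i.e. probability 1
```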

A further modification is the lexicalized PCFG, which assigns a head word to each constituent and encodes a rule for each lexeme that can occupy the head slot. Thus, where a PCFG may have the rule "NP → DT NN" (a noun phrase is a determiner and a noun), a lexicalized PCFG will have rules like "NP(dog) → DT NN(dog)", "NP(person) → DT NN(person)", and so on. In practice this leads to some performance improvements. [9] [10]

More recent work scores span probabilities with neural networks (which, unlike (P)CFGs, can take context into account) and feeds those scores to CKY, for example by using a recurrent neural network or transformer [11] on top of word embeddings.

In 2022, Nikita Kitaev et al. [12] introduced an incremental parser that first learns discrete labels (out of a fixed vocabulary) for each input token given only the left-hand context, which are then the only inputs to a CKY chart parser with probabilities calculated using a learned neural span scorer. This approach is not only linguistically-motivated, but also competitive with previous approaches to constituency parsing. Their work won the best paper award at ACL 2022.

Transition-based

Following the success of transition-based parsing for dependency grammars, work began on adapting the approach to constituency parsing. The first such work was by Kenji Sagae and Alon Lavie in 2005, which relied on a feature-based classifier to greedily make transition decisions. [13] This was followed by the work of Yue Zhang and Stephen Clark in 2009, which added beam search to the decoder to make more globally-optimal parses. [14] The first parser of this family to outperform a chart-based parser was the one by Muhua Zhu et al. in 2013, which addressed the problem that different transition sequences can have different lengths because of unary constituency rules (a problem that does not arise in dependency parsing) by adding a padding operation. [15]

Note that transition-based parsing can be purely greedy (i.e. picking the best option at each time-step of building the tree, leading to potentially non-optimal or ill-formed trees) or use beam search to increase performance while not sacrificing efficiency.

Sequence-to-sequence

A different approach to constituency parsing leveraging neural sequence models was developed by Oriol Vinyals et al. in 2015. [16] In this approach, constituency parsing is modelled like machine translation: the task is sequence-to-sequence conversion from the sentence to a constituency parse, in the original paper using a deep LSTM with an attention mechanism. The gold training trees have to be linearised for this kind of model, but the conversion does not lose any information. Decoding uses a beam search of width 10 (though the authors found little benefit from larger beam sizes, and even greedy decoding performs well), and the approach achieves competitive performance with traditional algorithms for context-free parsing like CKY.
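The following sketch illustrates the kind of linearisation involved, using a simple nested-tuple tree representation that is purely illustrative; the placeholder token mirrors the normalisation described in the paper, where the decoder only has to predict the tree structure rather than the words themselves.

```python
def linearize(tree):
    """Flatten a constituency tree into a bracket-token sequence.

    A tree is (label, children), where each child is a sub-tree or a word
    string.  Words are replaced by a placeholder so that only the structure
    remains to be predicted.
    """
    label, children = tree
    out = [f"({label}"]
    for child in children:
        if isinstance(child, str):
            out.append("XX")            # placeholder for the word slot
        else:
            out.extend(linearize(child))
    out.append(f"){label}")
    return out

tree = ("S", [("NP", ["John"]), ("VP", [("V", ["sleeps"])])])
print(" ".join(linearize(tree)))
# (S (NP XX )NP (VP (V XX )V )VP )S
```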

Dependency parsing

Dependency parsing is parsing according to a dependency grammar formalism, such as Universal Dependencies (which is also a project that produces multilingual dependency treebanks). This means assigning a head (or multiple heads in some formalisms like Enhanced Dependencies, e.g. in the case of coordination) to every token and a corresponding dependency relation for each edge, eventually constructing a tree or graph over the whole sentence. [17]

There are broadly three modern paradigms for modelling dependency parsing: transition-based, grammar-based, and graph-based. [18]

Transition-based

Many modern approaches to dependency tree parsing use transition-based parsing (the base form of this is sometimes called arc-standard) as formulated by Joakim Nivre in 2003, [19] which extends shift-reduce parsing by keeping a running stack of tokens and deciding among three operations for the next token encountered: Shift (push the next token onto the stack), LeftArc (mark the earlier of the two tokens under consideration as a dependent of the later one, and remove it), and RightArc (mark the later token as a dependent of the earlier one, and remove it).

The algorithm can be formulated as comparing the top two tokens of the stack (after adding the next token to the stack) or the top token on the stack and the next token in the sentence.

Training data for such an algorithm is created by using an oracle, which constructs a sequence of transitions from gold trees; these sequences are then fed to a classifier. The classifier learns which of the three operations is optimal given the current state of the stack, buffer, and current token. Modern methods use a neural classifier which is trained on word embeddings, beginning with work by Danqi Chen and Christopher Manning in 2014. [20] In the past, feature-based classifiers were also common, with features chosen from part-of-speech tags, sentence position, morphological information, etc.
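A compact sketch of the arc-standard system together with a static oracle is shown below, assuming gold heads are given as a dictionary and leaving out dependency labels; the function name and the exact oracle conditions follow common textbook presentations rather than any particular implementation.

```python
def arc_standard_oracle(heads):
    """Derive an arc-standard transition sequence from gold heads.

    `heads` maps each token 1..n to its head, with 0 standing for an
    artificial ROOT.  Returns a list of unlabelled transitions and assumes
    the gold tree is projective.
    """
    n = len(heads)
    stack, buffer, arcs = [0], list(range(1, n + 1)), set()
    transitions = []

    def has_all_children(tok):
        # True once every gold dependent of `tok` has already been attached.
        return all((h, d) in arcs for d, h in heads.items() if h == tok)

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]        # top and second-from-top
            if s2 != 0 and heads[s2] == s1:      # s2 depends on s1
                transitions.append("LEFT-ARC")
                arcs.add((s1, s2))
                stack.pop(-2)
                continue
            if heads[s1] == s2 and has_all_children(s1):   # s1 depends on s2
                transitions.append("RIGHT-ARC")
                arcs.add((s2, s1))
                stack.pop()
                continue
        if not buffer:
            raise ValueError("gold tree is not projective (or not a valid tree)")
        transitions.append("SHIFT")
        stack.append(buffer.pop(0))
    return transitions

# Toy 3-token sentence: token 3 is the root, and each token heads the one before it.
gold = {1: 2, 2: 3, 3: 0}
print(arc_standard_oracle(gold))
# ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'LEFT-ARC', 'RIGHT-ARC']
```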

This is a greedy algorithm, so it does not guarantee the best possible parse, or even a necessarily valid parse, but it is efficient. [21] It is also not necessarily the case that a particular tree has only one sequence of valid transitions that reaches it, so a dynamic oracle (which may permit multiple choices of operations) can increase performance. [22]

A modification to this is arc-eager parsing, which adds another operation: Reduce (remove the top token on the stack). Practically, this results in earlier arc-formation.
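A sketch of an arc-eager configuration and its four transitions is given below; following common presentations, the arc-forming transitions here operate between the top of the stack and the front of the buffer, and the names, preconditions, and omission of labels are illustrative simplifications.

```python
class ArcEagerState:
    """Configuration for arc-eager dependency parsing (unlabelled, illustrative)."""

    def __init__(self, n_tokens):
        self.stack = [0]                           # 0 is the artificial ROOT
        self.buffer = list(range(1, n_tokens + 1))
        self.heads = {}                            # dependent -> head

    def apply(self, action):
        s, b = self.stack, self.buffer
        if action == "SHIFT":                      # move front of buffer onto the stack
            s.append(b.pop(0))
        elif action == "LEFT-ARC":                 # buffer front becomes head of stack top
            assert s[-1] != 0 and s[-1] not in self.heads
            self.heads[s.pop()] = b[0]
        elif action == "RIGHT-ARC":                # stack top becomes head of buffer front;
            self.heads[b[0]] = s[-1]               # the arc is formed "eagerly", before the
            s.append(b.pop(0))                     # dependent has collected its own children
        elif action == "REDUCE":                   # pop a token that already has its head
            assert s[-1] in self.heads
            s.pop()
        return self

# Toy 3-token sentence where token 2 heads tokens 1 and 3 and attaches to ROOT.
state = ArcEagerState(3)
for a in ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC", "REDUCE", "REDUCE"]:
    state.apply(a)
print(state.heads)   # {1: 2, 2: 0, 3: 2}
```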

The algorithms described so far support only projective trees, wherein edges do not cross given the token ordering from the sentence. For non-projective trees, Nivre in 2009 modified arc-standard transition-based parsing to add the operation Swap (swap the top two tokens on the stack, assuming the formulation where the next token is always added to the stack first). This increases runtime to O(n²) in the worst case, but remains near-linear in practice. [23]

Grammar-based

A chart-based dynamic programming approach to projective dependency parsing was proposed by Michael Collins [24] in 1996 and further optimised by Jason Eisner [25] in the same year. [26] This is an adaptation of CKY (previously mentioned for constituency parsing) to headed dependencies, a benefit being that the only change from constituency parsing is that every constituent is headed by one of its descendant nodes. Thus, one can simply specify which child provides the head for every constituency rule in the grammar (e.g. an NP is headed by its child N) to go from constituency CKY parsing to dependency CKY parsing.

Collins' original adaptation had a runtime of O(n⁵), and Eisner's dynamic programming optimisations reduced the runtime to O(n³). Eisner suggested three different scoring methods for calculating span probabilities in his paper.
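A sketch of the first-order projective dynamic program commonly attributed to Eisner is given below; it computes only the score of the best tree from an arbitrary edge-score matrix (back-pointers for recovering the tree, and the probabilistic scoring variants Eisner discusses, are omitted).

```python
import numpy as np

def eisner_best_score(score):
    """Best projective dependency tree score via an Eisner-style O(n^3) program.

    score[h, m] is the score of attaching token m as a dependent of token h;
    token 0 is an artificial root.  Returns only the optimal total score.
    """
    n = score.shape[0]
    NEG = -np.inf
    # Index [s, t, d]: d=0 means the head of the span is on the right (t),
    # d=1 means it is on the left (s).
    complete = np.full((n, n, 2), NEG)
    incomplete = np.full((n, n, 2), NEG)
    for i in range(n):
        complete[i, i, :] = 0.0
        incomplete[i, i, :] = 0.0
    for length in range(1, n):
        for s in range(n - length):
            t = s + length
            # Incomplete spans: an arc between s and t plus two complete halves.
            best_mid = max(complete[s, q, 1] + complete[q + 1, t, 0]
                           for q in range(s, t))
            incomplete[s, t, 0] = best_mid + score[t, s]   # arc t -> s
            incomplete[s, t, 1] = best_mid + score[s, t]   # arc s -> t
            # Complete spans: an incomplete span extended by a complete one.
            complete[s, t, 0] = max(complete[s, q, 0] + incomplete[q, t, 0]
                                    for q in range(s, t))
            complete[s, t, 1] = max(incomplete[s, q, 1] + complete[q, t, 1]
                                    for q in range(s + 1, t + 1))
    return complete[0, n - 1, 1]

# Toy example: one real token attached to the root with score 2.0.
print(eisner_best_score(np.array([[-np.inf, 2.0],
                                  [-np.inf, -np.inf]])))   # 2.0
```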

Graph-based

Exhaustive search over the possible edges of the dependency tree, with backtracking in the case that an ill-formed tree is created, gives the baseline O(n²) runtime for graph-based dependency parsing. This approach was first formally described by Michael A. Covington in 2001, but he claimed that it was "an algorithm that has been known, in some form, since the 1960s". [27]

The problem of parsing can also be modelled as finding a maximum-probability spanning arborescence over the graph of all possible dependency edges, and then picking dependency labels for the edges in the resulting tree. Given this formulation, one can use an extension of the Chu–Liu/Edmonds algorithm with an edge scorer and a label scorer. This algorithm was first described by Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič in 2005. [28] It can handle non-projective trees, unlike the arc-standard transition-based parser and CKY. As before, the scorers can be neural (trained on word embeddings) or feature-based. It runs in O(n²) with Tarjan's extension of the algorithm. [29]
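As a rough sketch of this formulation, the following uses networkx's implementation of the Chu–Liu/Edmonds algorithm (maximum_spanning_arborescence) on a toy score table; in a real parser the scores would come from a trained edge scorer, and a label scorer would then assign relations to the chosen edges.

```python
import networkx as nx

def mst_parse(scores):
    """scores[(head, dependent)] -> edge score; node 0 is the artificial root.

    Builds a directed graph of all candidate edges (none pointing into the
    root, so the arborescence must be rooted there) and returns the head
    chosen for each token under the maximum spanning arborescence.
    """
    G = nx.DiGraph()
    for (h, d), s in scores.items():
        if d != 0:                        # no edges into the artificial root
            G.add_edge(h, d, weight=s)
    tree = nx.maximum_spanning_arborescence(G, attr="weight")
    return {d: h for h, d in tree.edges()}

# Toy 2-token sentence: the scorer prefers root -> 2 -> 1.
scores = {(0, 1): 5.0, (0, 2): 9.0, (1, 2): 3.0, (2, 1): 8.0}
print(mst_parse(scores))   # token 2 is attached to the root, and token 1 to token 2
```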

Evaluation

The performance of syntactic parsers is measured using standard evaluation metrics. Both constituency and dependency parsers can be evaluated by the ratio of exact matches (the percentage of sentences that were perfectly parsed) and by precision, recall, and F1-score, computed by comparing the constituency or dependency assignments in the hypothesis parse against those in the reference parse. The latter are also known as the PARSEVAL metrics. [30]
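A minimal sketch of bracket precision, recall, and F1 in the spirit of PARSEVAL follows, representing each constituent as a (label, start, end) span; real evaluation tools such as evalb apply further conventions (e.g. ignoring punctuation and certain labels) that are left out here.

```python
def bracket_prf(gold_spans, pred_spans):
    """Precision, recall, and F1 over labelled constituent spans.

    Each parse is a collection of (label, start, end) tuples.  Duplicate
    spans are ignored here for simplicity.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
pred = {("S", 0, 5), ("NP", 0, 2), ("VP", 3, 5)}
print(bracket_prf(gold, pred))   # (0.666..., 0.666..., 0.666...)
```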

Dependency parsing can also be evaluated using attachment score. Unlabelled attachment score (UAS) is the percentage of tokens with correctly assigned heads, while labelled attachment score (LAS) is the percentage of tokens with correctly assigned heads and dependency relation labels. [31]
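A small sketch of how UAS and LAS can be computed, assuming gold and predicted analyses are given as mappings from token index to a (head, relation) pair; the representation is illustrative.

```python
def attachment_scores(gold, pred):
    """Unlabelled (UAS) and labelled (LAS) attachment scores.

    `gold` and `pred` map each token index to a (head, relation) pair.
    """
    assert gold.keys() == pred.keys()
    n = len(gold)
    uas = sum(gold[t][0] == pred[t][0] for t in gold) / n   # head correct
    las = sum(gold[t] == pred[t] for t in gold) / n         # head and label correct
    return uas, las

gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obj")}
pred = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obl")}
print(attachment_scores(gold, pred))   # (1.0, 0.666...)
```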

Conversion between parses

Given that much work on English syntactic parsing depended on the Penn Treebank, which used a constituency formalism, many works on dependency parsing developed ways to deterministically convert the Penn formalism to a dependency syntax, in order to use it as training data. One of the major conversion algorithms was Penn2Malt, which reimplemented previous work on the problem. [32]

Work in the dependency-to-constituency conversion direction benefits from the faster runtime of dependency parsing algorithms. One approach is to use constrained CKY parsing, ignoring spans which obviously violate the dependency parse's structure and thereby reducing the runtime of CKY. [33] Another approach is to train a classifier to find an ordering for all the dependents of every token, which results in a structure isomorphic to the constituency parse. [34]

Related Research Articles

Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Phrase structure rules are a type of rewrite rule used to describe a given language's syntax and are closely associated with the early stages of transformational grammar, proposed by Noam Chomsky in 1957. They are used to break down a natural language sentence into its constituent parts, also known as syntactic categories, including both lexical categories and phrasal categories. A grammar that uses phrase structure rules is a type of phrase structure grammar. Phrase structure rules as they are commonly employed operate according to the constituency relation, and a grammar that employs phrase structure rules is therefore a constituency grammar; as such, it stands in contrast to dependency grammars, which are based on the dependency relation.


A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax, the term syntax tree is more common.

Lexical functional grammar (LFG) is a constraint-based grammar framework in theoretical linguistics. It posits two separate levels of syntactic structure, a phrase structure grammar representation of word order and constituency, and a representation of grammatical functions such as subject and object, similar to dependency grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to the theory of transformational grammar which was current in the late 1970s. It mainly focuses on syntax, including its relation with morphology and semantics. There has been little LFG work on phonology.

Head-driven phrase structure grammar (HPSG) is a highly lexicalized, constraint-based grammar developed by Carl Pollard and Ivan Sag. It is a type of phrase structure grammar, as opposed to a dependency grammar, and it is the immediate successor to generalized phrase structure grammar. HPSG draws from other fields such as computer science and uses Ferdinand de Saussure's notion of the sign. It uses a uniform formalism and is organized in a modular way which makes it attractive for natural language processing.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees.

Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a phrase structure hierarchy. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship, whereas Link Grammar makes the head-dependent relationship optional. Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words. The relationship between words is indicated with link types, thus making the Link grammar closely related to certain categorial grammars.

Generalized phrase structure grammar (GPSG) is a framework for describing the syntax and semantics of natural languages. It is a type of constraint-based phrase structure grammar. Constraint based grammars are based around defining certain syntactic processes as ungrammatical for a given language and assuming everything not thus dismissed is grammatical within that language. Phrase structure grammars base their framework on constituency relationships, seeing the words in a sentence as ranked, with some words dominating the others. For example, in the sentence "The dog runs", "runs" is seen as dominating "dog" since it is the main focus of the sentence. This view stands in contrast to dependency grammars, which base their assumed structure on the relationship between a single word in a sentence and its dependents.

A GLR parser is an extension of an LR parser algorithm to handle non-deterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper by Bernard Lang, which describes a systematic way to produce such algorithms and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita; it has also been referred to as a "parallel parser". Tomita presented five stages in his original work, though in practice it is the second stage that is recognized as the GLR parser.


In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.


Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

Constraint grammar (CG) is a methodological paradigm for natural language processing (NLP). Linguist-written, context-dependent rules are compiled into a grammar that assigns grammatical tags ("readings") to words or other tokens in running text. Typical tags address lemmatisation, inflexion, derivation, syntactic function, dependency, valency, case roles, semantic type etc. Each rule either adds, removes, selects or replaces a tag or a set of grammatical tags in a given sentence context. Context conditions can be linked to any tag or tag set of any word anywhere in the sentence, either locally or globally. Context conditions in the same rule may be linked, i.e. conditioned upon each other, negated, or blocked by interfering words or tags. Typical CGs consist of thousands of rules, that are applied set-wise in progressive steps, covering ever more advanced levels of analysis. Within each level, safe rules are used before heuristic rules, and no rule is allowed to remove the last reading of a given kind, thus providing a high degree of robustness.



The Quranic Arabic Corpus is an annotated linguistic resource consisting of 77,430 words of Quranic Arabic. The project aims to provide morphological and syntactic annotations for researchers wanting to study the language of the Quran.

Deep Linguistic Processing with HPSG - INitiative (DELPH-IN) is a collaboration where computational linguists worldwide develop natural language processing tools for deep linguistic processing of human language. The goal of DELPH-IN is to combine linguistic and statistical processing methods in order to computationally understand the meaning of texts and utterances.

In computational linguistics, the term mildly context-sensitive grammar formalisms refers to several grammar formalisms that have been developed in an effort to provide adequate descriptions of the syntactic structure of natural language.

Manning's Law describes the combination of principles that need to be balanced in the design and growth of universal linguistic dependencies, which are used to describe and model syntactic relations across languages in support of natural language processing. Universal dependencies are a major research topic with a dedicated event series, thousands of linguistics and AI researchers working with and on them, and wide adoption. The law was put forward by Christopher D. Manning.

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations.

Universal Dependencies, frequently abbreviated as UD, is an international cooperative project to create treebanks of the world's languages. These treebanks are openly accessible and available. Core applications are automated text processing in the field of natural language processing (NLP) and research into natural language syntax and grammar, especially within linguistic typology. The project's primary aim is to achieve cross-linguistic consistency of annotation, while still permitting language-specific extensions when necessary. The annotation scheme has its roots in three related projects: Stanford Dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets. The UD annotation scheme uses a representation in the form of dependency trees as opposed to phrase structure trees. At present, there are just over 200 treebanks of more than 100 languages available in the UD inventory.

References

  1. Jurafsky & Martin 2021.
  2. Jurafsky & Martin 2021, chapter 13.
  3. Lange, Martin; Leiß, Hans (2009). "To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm" (PDF). Informatica Didactica. 8.
  4. Younger, Daniel H. (1967). "Recognition and parsing of context-free languages in time n³". Information and Control. 10 (2): 189–208. doi:10.1016/S0019-9958(67)80007-X.
  5. Kasami, T. (1966). "An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages". Bedford, MA: Air Force Cambridge Research Laboratory.
  6. Tsvetkov, Yulia; Titov, Ivan; Berg-Kirkpatrick, Taylor; Klein, Dan (2018). "Algorithms for NLP: Parsing I" (PDF). Algorithms for NLP. Carnegie Mellon University. Retrieved 29 September 2021.
  7. Salomaa, Arto (1969). "Probabilistic and weighted grammars". Information and Control. 15 (6): 529–544. doi: 10.1016/S0019-9958(69)90554-3 .
  8. Booth, Taylor L. (1969). Probabilistic representation of formal languages. 10th Annual Symposium on Switching and Automata Theory.
  9. Chen, Danqi; Narasimhan, Karthik (2019). "Constituency Parsing" (PDF). COS 484: Natural Language Processing.
  10. Collins, Michael (2003). "Head-Driven Statistical Models for Natural Language Parsing". Computational Linguistics. 29 (4): 589–637. doi: 10.1162/089120103322753356 . S2CID   7901127.
  11. Kitaev, Nikita; Klein, Dan (2018). Constituency Parsing with a Self-Attentive Encoder. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. pp. 2676–2686.
  12. Kitaev, Nikita; Lu, Thomas; Klein, Dan (2022). Learned Incremental Representations for Parsing. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. pp. 3086–3095.
  13. Sagae, Kenji; Lavie, Alon (2005). A Classifier-Based Parser with Linear Run-Time Complexity. Proceedings of the Ninth International Workshop on Parsing Technology. pp. 125–132.
  14. Zhang, Yue; Clark, Stephen (2009). Transition-Based Parsing of the Chinese Treebank using a Global Discriminative Model. Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09). pp. 162–171.
  15. Zhu, Muhua; Zhang, Yue; Chen, Wenliang; Zhang, Min; Zhu, Jingbo (2013). Fast and Accurate Shift-Reduce Constituent Parsing. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. pp. 434–443.
  16. Vinyals, Oriol; Kaiser, Łukasz; Koo, Terry; Petrov, Slav; Sutskever, Ilya; Hinton, Geoffrey (2015). Grammar as a Foreign Language. Advances in Neural Information Processing Systems 28 (NIPS 2015).
  17. Jurafsky & Martin 2021, chapter 14.
  18. Kübler, McDonald & Nivre 2009.
  19. Nivre, Joakim (2003). An Efficient Algorithm for Projective Dependency Parsing. Proceedings of the Eighth International Conference on Parsing Technologies. pp. 149–160.
  20. Chen, Danqi; Manning, Christopher (2014). A Fast and Accurate Dependency Parser using Neural Networks. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 740–750.
  21. Stymne, Sara (17 December 2014). "Transition-based dependency parsing" (PDF). Syntactic analysis (5LN455). Uppsala Universitet. Retrieved 22 October 2021.
  22. Goldberg, Yoav; Nivre, Joakim (2012). A Dynamic Oracle for Arc-Eager Dependency Parsing. Proceedings of COLING 2012. pp. 959–976.
  23. Nivre, Joakim (2009). Non-Projective Dependency Parsing in Expected Linear Time. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. pp. 351–359.
  24. Collins, Michael John (1996). A New Statistical Parser Based on Bigram Lexical Dependencies. 34th Annual Meeting of the Association for Computational Linguistics. pp. 184–191.
  25. Eisner, Jason M. (1996). Three New Probabilistic Models for Dependency Parsing: An Exploration. COLING.
  26. Stymne, Sara (15 December 2014). "Collins' and Eisner's algorithms" (PDF). Syntactic analysis (5LN455). Uppsala Universitet. Retrieved 22 October 2021.
  27. Covington, Michael A. (2001). A Fundamental Algorithm for Dependency Parsing. Proceedings of the 39th Annual ACM Southeast Conference. CiteSeerX   10.1.1.136.7335 .
  28. McDonald, Ryan; Pereira, Fernando; Ribarov, Kiril; Hajič, Jan (2005). Non-Projective Dependency Parsing using Spanning Tree Algorithms. Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. pp. 523–530.
  29. Goldberg, Yoav (2021). "Dependency Parsing" (PDF). Introduction to Natural Language Processing. Bar Ilan University. Retrieved 22 October 2021.
  30. Black, E.; Abney, S.; Flickenger, D.; Gdaniec, C.; Grishman, R.; Harrison, P.; Hindle, D.; Ingria, R.; Jelinek, F.; Klavans, J.; Liberman, M.; Marcus, M.; Roukos, S.; Santorini, B.; Strzalkowski, T. (1991). A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars. Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991.
  31. Kübler, McDonald & Nivre 2009, chapter 6.
  32. Nivre, Joakim; Hall, Johan; Nilsson, Jens (2006). MaltParser: A Data-Driven Parser-Generator for Dependency Parsing. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06).
  33. Kong, Lingpeng; Rush, Alexander M.; Smith, Noah A. (2015). Transforming Dependencies into Phrase Structures. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 788–798.
  34. Fernández-González, Daniel; Martins, André F. T. (2015). Parsing as Reduction. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. pp. 1523–1533.
