
Syntactic parsing (computational linguistics)

From Wikipedia, the free encyclopedia

Syntactic parsing is the automatic analysis of the syntactic structure of natural language, especially syntactic relations (in dependency grammar) and labelling spans of constituents (in constituency grammar).[1] It is motivated by the problem of structural ambiguity in natural language: a sentence can be assigned multiple grammatical parses, so some kind of knowledge beyond computational grammar rules is needed to tell which parse is intended. Syntactic parsing is one of the important tasks in computational linguistics and natural language processing, and has been a subject of research since the mid-20th century with the advent of computers.

Different theories of grammar propose different formalisms for describing the syntactic structure of sentences. For computational purposes, these formalisms can be grouped under constituency grammars and dependency grammars. Parsers for either class call for different types of algorithms, and approaches to the two problems have taken different forms. The creation of human-annotated treebanks using various formalisms (e.g. Universal Dependencies) has proceeded alongside the development of new algorithms and methods for parsing.

Part-of-speech tagging (which resolves some semantic ambiguity) is a related problem, and often a prerequisite for or a subproblem of syntactic parsing.

Constituency parsing

Constituency parsing involves parsing in accordance with constituency grammar formalisms, such as Minimalism or the formalism of the Penn Treebank. This, at the very least, means telling which spans are constituents (e.g. [The man] is here.) and what kind of constituent each one is (e.g. [The man] is a noun phrase), on the basis of a context-free grammar which encodes rules for constituent formation and merging.[2]

CKY

The most popular algorithm for constituency parsing is the Cocke–Kasami–Younger algorithm (CKY),[3][4] a dynamic programming algorithm which constructs a parse in worst-case O(n³ · |G|) time on a sentence of n words, where |G| is the size of a context-free grammar (CFG) given in Chomsky normal form.
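
In outline, CKY fills a triangular chart of spans bottom-up, combining smaller constituents into larger ones according to the grammar's binary rules. The following is a minimal sketch of the recognition step in Python; the grammar encoding (dictionaries keyed by right-hand sides) and the function name are illustrative choices, not taken from the cited sources.

    def cky_recognise(words, terminal_rules, binary_rules, start="S"):
        """CKY recognition for a CFG in Chomsky normal form.
        terminal_rules maps a word to the set of nonterminals that rewrite to it;
        binary_rules maps a pair (B, C) to the set of nonterminals A with A -> B C."""
        n = len(words)
        # chart[i][j] holds the nonterminals that can derive words[i:j]
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = set(terminal_rules.get(w, ()))
        for span in range(2, n + 1):               # span length
            for i in range(n - span + 1):          # span start
                j = i + span                       # span end (exclusive)
                for k in range(i + 1, j):          # split point
                    for B in chart[i][k]:
                        for C in chart[k][j]:
                            chart[i][j] |= binary_rules.get((B, C), set())
        return start in chart[0][n]

    # Toy grammar: S -> NP VP, NP -> DT NN, VP -> VB NP
    binary = {("NP", "VP"): {"S"}, ("DT", "NN"): {"NP"}, ("VB", "NP"): {"VP"}}
    terminals = {"the": {"DT"}, "dog": {"NN"}, "bone": {"NN"}, "ate": {"VB"}}
    print(cky_recognise("the dog ate the bone".split(), terminals, binary))  # True

Storing backpointers alongside each chart entry recovers an actual parse tree rather than a yes/no answer.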

Given the issue of ambiguity (e.g. preposition-attachment ambiguity in English) leading to multiple acceptable parses, it is necessary to be able to score the probability of parses to pick the most probable one. One way to do this is by using a probabilistic context-free grammar (PCFG), which assigns a probability to each constituency rule, and modifying CKY to maximise probabilities when parsing bottom-up.[5][6][7]
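
This modification amounts to replacing the boolean chart entries above with the best (log-)probability found so far for each nonterminal over each span, as in the sketch below; the rule encoding is again an illustrative assumption rather than something prescribed by the cited sources.

    def pcfg_cky_best(words, terminal_logprobs, binary_logprobs, start="S"):
        """Viterbi CKY: terminal_logprobs[(A, w)] and binary_logprobs[(A, B, C)] hold
        rule log-probabilities; returns the log-probability of the best parse."""
        n = len(words)
        chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]   # chart[i][j]: {A: best log-prob}
        for i, w in enumerate(words):
            for (A, word), lp in terminal_logprobs.items():
                if word == w and lp > chart[i][i + 1].get(A, float("-inf")):
                    chart[i][i + 1][A] = lp
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for (A, B, C), lp in binary_logprobs.items():
                        if B in chart[i][k] and C in chart[k][j]:
                            cand = lp + chart[i][k][B] + chart[k][j][C]
                            if cand > chart[i][j].get(A, float("-inf")):
                                chart[i][j][A] = cand
        return chart[0][n].get(start, float("-inf"))

As with the recogniser, keeping a backpointer for the winning rule and split point at each entry yields the most probable tree itself.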

A further modification is the lexicalized PCFG, which assigns a head to each constituent and encodes rules for each lexeme in that head slot. Thus, where a PCFG may have a rule "NP → DT NN" (a noun phrase is a determiner and a noun), a lexicalized PCFG will specifically have rules like "NP(dog) → DT NN(dog)" or "NP(person) → DT NN(person)", etc. In practice this leads to large performance improvements.[8][9]

More recent work uses neural networks to score span probabilities (which, unlike (P)CFG rules, can take context into account) and feeds these scores to CKY, for example by running a recurrent neural network or transformer[10] over word embeddings.
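
Decoding over such span scores can still be done with a CKY-style chart. The sketch below is a simplified, unlabelled version of that idea: span_score is a hypothetical stand-in for the neural model's output, and the recursion simply looks for the bracketing whose spans have the highest total score.

    from functools import lru_cache

    def best_bracketing_score(n, span_score):
        """Best total score of any binary bracketing of tokens 0..n-1, where
        span_score(i, j) is a stand-in for a neural scorer over the span [i, j)."""
        @lru_cache(maxsize=None)
        def best(i, j):
            if j - i == 1:        # single word: nothing left to split
                return span_score(i, j)
            return span_score(i, j) + max(best(i, k) + best(k, j) for k in range(i + 1, j))
        return best(0, n)

Recording the argmax split (and the best label) for each span recovers the tree itself.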

Dependency parsing

Dependency parsing is parsing according to a dependency grammar formalism, such as Universal Dependencies (which is also a project that produces multilingual dependency treebanks). This means assigning a head (or multiple heads sometimes, in the case of coordination) to every token and a corresponding dependency relation, eventually constructing a tree or graph over the whole sentence.[11]
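
Concretely, a dependency parse is often stored as nothing more than a head index and a relation label per token, as in this illustrative encoding (relation names follow Universal Dependencies):

    # Token indices are 1-based; index 0 is a dummy ROOT node.
    sentence = ["She", "reads", "books"]
    parse = {
        1: (2, "nsubj"),   # "She"   depends on "reads"
        2: (0, "root"),    # "reads" depends on ROOT
        3: (2, "obj"),     # "books" depends on "reads"
    }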

Transition-based parsing

Most modern approaches to dependency tree parsing use transition-based parsing (the base form of this is sometimes called arc-standard) as formulated by Joakim Nivre in 2003,[12] which extends shift-reduce parsing by keeping a running stack of tokens and choosing one of three operations for the next token encountered:

  • LeftArc (current token is a child of the top of the stack, is not added to stack)
  • RightArc (current token is the parent of the top of the stack, replaces top)
  • Shift (add current token to the stack)

The algorithm can be formulated as comparing the top two tokens of the stack (after adding the next token to the stack) or the top token on the stack and the next token in the sentence. To decide which operation to perform, modern methods use a neural oracle trained on contextualised word embeddings, with training decisions obtained by replaying the transition-based parsing process on gold trees from a treebank. In the past, feature-based oracles were also common, with features chosen from part-of-speech tags, sentence position, morphological information, etc. This is a greedy algorithm, so it does not guarantee the best possible parse or even a necessarily valid parse, but it is efficient.[13]
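
The following is a minimal sketch of the greedy loop in the stack-only formulation (where the next token is shifted onto the stack and arcs are built between the top two stack items); choose_action stands in for the oracle, which in practice is a trained classifier rather than the hypothetical function assumed here.

    def arc_standard_parse(words, choose_action):
        """Greedy arc-standard dependency parsing.
        choose_action(stack, buffer, heads) returns "LEFT", "RIGHT" or "SHIFT";
        returns a dict mapping each token index to its head (0 = dummy ROOT)."""
        buffer = list(range(1, len(words) + 1))   # token indices, 1-based
        stack = [0]                               # 0 is the dummy ROOT
        heads = {}
        while buffer or len(stack) > 1:
            action = choose_action(stack, buffer, heads)
            if action == "SHIFT" and buffer:
                stack.append(buffer.pop(0))
            elif action == "LEFT" and len(stack) > 2:
                dep = stack.pop(-2)               # second-from-top becomes a dependent
                heads[dep] = stack[-1]            # ... of the top of the stack
            elif action == "RIGHT" and len(stack) > 1:
                dep = stack.pop()                 # top of the stack becomes a dependent
                heads[dep] = stack[-1]            # ... of the item below it
            else:
                break                             # illegal move proposed; stop in this sketch
        return heads

With a trained classifier as choose_action, the loop runs in time linear in the sentence length, since each token is shifted once and each arc is built once.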

A modification to this is arc-eager parsing, which adds another operation: Reduce (remove the top token on the stack). Practically, this results in earlier arc-formation.

The transition systems described so far support only projective trees, wherein edges do not cross given the token ordering from the sentence. For non-projective trees, one can modify arc-standard transition-based parsing to add the operation Swap (swap the top two tokens on the stack, assuming the formulation where the next token is always added to the stack first). This increases the worst-case runtime to O(n²).[13]

CKY adaptations

A chart-based dynamic programming approach to dependency parsing was proposed by Michael Collins[14] in 1996 and further optimised by Jason Eisner[15] in the same year.[16] This is an adaptation of CKY (previously mentioned for constituency parsing) to headed dependencies, with dynamic programming optimisations that reduce the runtime to O(n³). Eisner suggested three different scoring methods for calculating span probabilities in his paper.
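
A compact sketch of the O(n³) recurrence (for arc-factored scores) is given below; arc_score is a hypothetical stand-in for whichever scoring method is used, and only the best score is computed here, with the backpointers needed to recover the tree omitted.

    def eisner_best_score(n, arc_score):
        """Eisner's O(n^3) dynamic program for projective dependency parsing.
        Tokens are 1..n with 0 as the ROOT; arc_score(h, d) scores the arc h -> d.
        Direction index 1 means the head is at the left end of the span, 0 the right end."""
        NEG = float("-inf")
        complete = [[[0.0, 0.0] for _ in range(n + 1)] for _ in range(n + 1)]
        incomplete = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
        for span in range(1, n + 1):
            for i in range(n + 1 - span):
                j = i + span
                # incomplete spans: add the arc between the two endpoints
                best_split = max(complete[i][k][1] + complete[k + 1][j][0] for k in range(i, j))
                incomplete[i][j][0] = best_split + arc_score(j, i)   # arc j -> i
                incomplete[i][j][1] = best_split + arc_score(i, j)   # arc i -> j
                # complete spans: attach a finished sub-span to an incomplete one
                complete[i][j][0] = max(complete[i][k][0] + incomplete[k][j][0] for k in range(i, j))
                complete[i][j][1] = max(incomplete[i][k][1] + complete[k][j][1] for k in range(i + 1, j + 1))
        return complete[0][n][1]   # best projective tree rooted at token 0 (ROOT)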

Spanning trees

The problem of parsing can also be modelled as finding a maximum-probability spanning arborescence over the graph of all possible dependency edges, and then picking dependency labels for the edges in the resulting tree. For this, one can use an extension of the Chu–Liu/Edmonds algorithm with an edge scorer and a label scorer. This algorithm was first described by Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič in 2005.[17]

As before, the scorers can be neural (trained on word embeddings) or feature-based. This runs in O(n²) with Tarjan's extension of the algorithm.[18]
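
As a sketch of the decoding step, the snippet below builds the complete graph of candidate arcs and extracts the maximum spanning arborescence with NetworkX's general Edmonds implementation (not the O(n²) variant mentioned above); arc_score is a hypothetical stand-in for the learned edge scorer, and relation labels would be assigned to the resulting arcs in a separate step.

    import networkx as nx

    def mst_parse(n, arc_score):
        """Unlabelled dependency parsing as a maximum spanning arborescence.
        Tokens are 1..n with 0 as the ROOT; arc_score(h, d) scores the arc h -> d.
        Returns a dict mapping each token index to its chosen head."""
        G = nx.DiGraph()
        for h in range(n + 1):                 # candidate heads, including ROOT
            for d in range(1, n + 1):          # dependents (ROOT takes no head)
                if h != d:
                    G.add_edge(h, d, weight=arc_score(h, d))
        tree = nx.maximum_spanning_arborescence(G, attr="weight")
        return {d: h for h, d in tree.edges()}

Because the ROOT node has no incoming candidate arcs, any spanning arborescence of this graph is necessarily rooted at it.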

References

  1. ^ Jurafsky & Martin 2021.
  2. ^ Jurafsky & Martin 2021, chapter 13.
  3. ^ Younger, Daniel H. (1967). "Recognition and parsing of context-free languages in time n3". Information and Control. 10 (2): 189–208.
  4. ^ Kasami, T. (1966). "An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages". Bedford, MA: Air Force Cambridge Research Laboratory.
  5. ^ Tsvetkov, Yulia; Titov, Ivan; Berg-Kirkpatrick, Taylor; Klein, Dan (2018). "Algorithms for NLP: Parsing I" (PDF). Algorithms for NLP. Carnegie Mellon University. Retrieved 29 September 2021.
  6. ^ Salomaa, Arto (1969). "Probabilistic and weighted grammars". Information and Control. 15 (6): 529–544.
  7. ^ Booth, Taylor L. (1969). "Probabilistic representation of formal languages". 10th Annual Symposium on Switching and Automata Theory.
  8. ^ Chen, Danqi; Narasimhan, Karthik (2019). "Constituency Parsing" (PDF). COS 484: Natural Language Processing.
  9. ^ Collins, Michael (2003). "Head-Driven Statistical Models for Natural Language Parsing". Computational Linguistics. 29 (4): 589–637.
  10. ^ Kitaev, Nikita; Klein, Dan (2018). "Constituency Parsing with a Self-Attentive Encoder". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: 2676–2686.
  11. ^ Jurafsky & Martin 2021, chapter 14.
  12. ^ Nivre, Joakim (2003). "An Efficient Algorithm for Projective Dependency Parsing". Proceedings of the Eighth International Conference on Parsing Technologies: 149–160.
  13. ^ a b Stymne, Sara (17 December 2014). "Transition-based dependency parsing" (PDF). Syntactic analysis (5LN455). Uppsala Universitet. Retrieved 22 October 2021.
  14. ^ Collins, Michael John (1996). "A New Statistical Parser Based on Bigram Lexical Dependencies". 34th Annual Meeting of the Association for Computational Linguistics: 184–191.
  15. ^ Eisner, Jason M. (1996). "Three New Probabilistic Models for Dependency Parsing: An Exploration". COLING.
  16. ^ Stymne, Sara (15 December 2014). "Collins' and Eisner's algorithms" (PDF). Syntactic analysis (5LN455). Uppsala Universitet. Retrieved 22 October 2021.
  17. ^ McDonald, Ryan; Pereira, Fernando; Ribarov, Kiril; Hajič, Jan (2005). "Non-Projective Dependency Parsing using Spanning Tree Algorithms". Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing: 523–530.
  18. ^ Goldberg, Yoav (2021). "Dependency Parsing" (PDF). Introduction to Natural Language Processing. Bar Ilan University. Retrieved 22 October 2021.

Further reading