Yacc

Last updated
Yacc
Original author(s) Stephen C. Johnson
Repository
Written in C
Operating system Unix, Unix-like, Plan 9, Inferno
Platform Cross-platform
Type Command
License Plan 9: MIT License

Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a compiler that tries to make syntactic sense of the source code) based on a formal grammar, written in a notation similar to Backus–Naur form (BNF). [1] Yacc is supplied as a standard utility on BSD and AT&T Unix. [2] GNU-based Linux distributions include Bison, a forward-compatible Yacc replacement. [3]

Contents

History

In the early 1970s, Stephen C. Johnson, a computer scientist at Bell Labs / AT&T, developed Yacc because he wanted to insert an exclusive or operator into a B language compiler [4] (developed using McIlroy's TMG compiler-compiler [5] ), but it turned out to be a hard task. As a result, he was directed by his colleague at Bell Labs Al Aho to Donald Knuth's work on LR parsing, which served as the basis for Yacc. [4] Yacc was influenced by [6] and received its name in reference to TMG compiler-compiler. [7]

Yacc was originally written in the B programming language, but was soon rewritten in C by Alan Snyder. [5] It appeared as part of Version 3 Unix, [8] and a full description of Yacc was published in 1975. [6]

Johnson used Yacc to create the Portable C Compiler. [8] Bjarne Stroustrup also attempted to use Yacc to create a formal specification of C++, but "was defeated by C's syntax". [9] While finding it unsuitable for a formal specification of the language, Stroustrup did proceed to use Yacc to implement Cfront, the first implementation of C++. [10]

In a 2008 interview, Johnson reflected that "the contribution Yacc made to the spread of Unix and C is what I'm proudest of". [11]

Description

The input to Yacc is a grammar with snippets of C code (called "actions") attached to its rules. Its output is a shift-reduce parser in C that executes the C snippets associated with each rule as soon as the rule is recognized. Typical actions involve the construction of parse trees. Using an example from Johnson, if the call node(label, left, right) constructs a binary parse tree node with the specified label and children, then the rule

expr:expr'+'expr{$$=node('+',$1,$3);}

recognizes summation expressions and constructs nodes for them. The special identifiers $$, $1 and $3 refer to items on the parser's stack. [6]

Yacc produces only a parser (phrase analyzer) which can be used alone in the case of scannerless parsing however, full syntactic analysis typically requires an external lexical analyzer to perform a tokenization stage first (word analysis), which is then followed by the parsing stage proper. [6] Lexical analyzer generators, such as Lex or Flex, are widely available for this purpose. The IEEE POSIX P1003.2 standard defines the functionality and requirements for both Lex and Yacc. [12]

Some versions of AT&T Yacc have become open source. For example, source code is available with the standard distributions of Plan 9. [13]

Impact

Yacc and similar programs (largely reimplementations) have been very popular. Yacc itself used to be available as the default parser generator on most Unix systems, though it has since been supplanted by more recent, largely compatible, programs such as Berkeley Yacc, GNU Bison, MKS Yacc, and Abraxas PCYACC. An updated version of the original AT&T Yacc is included as part of Sun's OpenSolaris project. Each offers slight improvements and additional features over the original Yacc, but the concept and basic syntax have remained the same. [14]

Yacc was also one of several UNIX tools available for Charles River Data Systems' UNOS operating system under Bell Laboratories license. [15]

Among the languages that were first implemented with Yacc are AWK, C++, [10] eqn and Pic. [16] Yacc was also used on Unix to implement the Portable C Compiler, as well as parsers for such programming languages as FORTRAN 77, Ratfor, APL, bc, m4, etc. [8] [17]

Yacc has also been rewritten for other languages, including OCaml, [18] Ratfor, ML, Ada, Pascal, Java, Python, Ruby, Go, [19] Common Lisp [20] and Erlang. [21]

See also

Related Research Articles

B is a programming language developed at Bell Labs circa 1969 by Ken Thompson and Dennis Ritchie.

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in Bison syntax, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

In computer science, Backus–Naur form is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

Cfront was the original compiler for C++ from around 1983, which converted C++ to C; developed by Bjarne Stroustrup at AT&T Bell Labs. The preprocessor did not understand all of the language and much of the code was written via translations. Cfront had a complete parser, built symbol tables, and built a tree for each class, function, etc. Cfront was based on CPre, a C compiler started in 1979.

Lexical tokenization is conversion of a text into meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is not the same process as the probabilistic tokenization, used for a large language model's data preprocessing, that encodes text into numerical tokens, using byte pair encoding.

Lex is a computer program that generates lexical analyzers. It is commonly used with the yacc parser generator and is the standard lexical analyzer generator on many Unix and Unix-like systems. An equivalent tool is specified as part of the POSIX standard.

<span class="mw-page-title-main">Alfred Aho</span> Canadian computer scientist

Alfred Vaino Aho is a Canadian computer scientist best known for his work on programming languages, compilers, and related algorithms, and his textbooks on the art and science of computer programming.

<span class="mw-page-title-main">Douglas McIlroy</span> American mathematician and computer scientist

Malcolm Douglas McIlroy is an American mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and developed several Unix tools, such as spell, diff, sort, join, graph, speak, and tr. He was also one of the pioneering researchers of macro processors and programming language extensibility. He participated in the design of multiple influential programming languages, particularly PL/I, SNOBOL, ALTRAN, TMG and C++.

Flex is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers . It is frequently used as the lex implementation together with Berkeley Yacc parser generator on BSD-derived operating systems, or together with GNU bison in *BSD ports and in Linux distributions. Unlike Bison, flex is not part of the GNU Project and is not released under the GNU General Public License, although a manual for Flex was produced and published by the Free Software Foundation.

JavaCC is an open-source parser generator and lexical analyzer generator written in the Java programming language.

An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are result of attribute evaluation rules associated with productions of the grammar. Attributes allow to transfer information from anywhere in the abstract syntax tree to anywhere else, in a controlled and formal way.

Berkeley Yacc (byacc) is a Unix parser generator designed to be compatible with Yacc. It was originally written by Robert Corbett and released in 1989. Due to its liberal license and because it was faster than the AT&T Yacc, it quickly became the most popular version of Yacc. It has the advantages of being written in ANSI C89 and being public domain software.

This is a list of notable lexer generators and parser generators for various language classes.

Lemon is a parser generator, maintained as part of the SQLite project, that generates a look-ahead LR parser in the programming language C from an input context-free grammar. The generator is quite simple, implemented in one C source file with another file used as a template for output. Lexical analysis is performed externally.

The Portable C Compiler is an early compiler for the C programming language written by Stephen C. Johnson of Bell Labs in the mid-1970s, based in part on ideas proposed by Alan Snyder in 1973, and "distributed as the C compiler by Bell Labs... with the blessing of Dennis Ritchie."

<span class="mw-page-title-main">History of compiler construction</span>

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

A lookahead LR parser (LALR) generator is a software tool that reads a context-free_grammar (CFG) and creates an LALR parser which is capable of parsing files written in the context-free language defined by the CFG. LALR parsers are desirable because they are very fast and small in comparison to other types of parsers.

<span class="mw-page-title-main">TMG (language)</span>

In computing TMG (TransMoGrifier) is a recursive descent compiler-compiler developed by Robert M. McClure and presented in 1965. TMG ran on systems including OS/360 and early Unix. It was used to build EPL, an early version of PL/I.

RE/flex is a free and open source computer program written in C++ that generates fast lexical analyzers in C++. RE/flex offers full Unicode support, indentation anchors, word boundaries, lazy quantifiers, and performance tuning options. RE/flex accepts Flex lexer specifications and offers options to generate scanners for Bison parsers. RE/flex includes a fast C++ regular expression library.

References

  1. "The A-Z of Programming Languages: YACC". Computerworld. Archived from the original on 31 January 2013. Retrieved 30 November 2012.
  2. Levine, John (1992). Lex & yacc . Sebastopol, CA: O'Reilly & Associates. p. xx. ISBN   1-56592-000-7.
  3. Levine, John (2009). Flex & bison. Sebastopol, Calif: O'Reilly Media. p. xv. ISBN   978-0-596-15597-1.
  4. 1 2 Morris, Richard (1 October 2009). "Stephen Curtis Johnson: Geek of the Week". Red Gate Software. Retrieved 19 January 2018.
  5. 1 2 Ritchie, Dennis M. (April 1993). "The Development of the C Language". History of programming languages---II. Association for Computing Machinery, Inc. (published 1996-01-01). doi: 10.1145/234286.1057834 . ISBN   0-201-89502-1. pp. 675, 684: After the TMG version of B was working, Thompson rewrote B in itself(a bootstrapping step).…When Johnson returned to Bell Labs in 1973, he was disconcerted to find that the language whose seeds he had brought to Canada had evolved back home; even his own yacc program had been rewritten in C, by Alan Snyder.
  6. 1 2 3 4 Johnson, Stephen C. (1975). Yacc: Yet Another Compiler-Compiler (Technical report). Murray Hill, New Jersey: AT&T Bell Laboratories. 32. Retrieved 31 January 2020.
  7. "Early Translator Writing Systems". Atlas Computer Laboratory.
  8. 1 2 3 McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
  9. Stroustrup, Bjarne. "A History of C++: 1979−1991" (PDF).
  10. 1 2 Stroustrup, Bjarne. "Cfront source code".
  11. Hamilton, Naomi (2008-07-09). "Yacc, Unix, and advice from Bell Labs alumni Stephen Johnson". www.computerworld.com. Archived from the original on 2020-08-22. Retrieved 2020-11-10.
  12. lex   Shell and Utilities Reference, The Single UNIX Specification , Version 4 from The Open Group, yacc   Shell and Utilities Reference, The Single UNIX Specification , Version 4 from The Open Group.
  13. "plan9: UC Berkeley release of Plan 9 under the GPLv2". GitHub . 26 December 2017. Retrieved 2 January 2018.
  14. Bison Manual: History
  15. The Insider's Guide To The Universe (PDF). Charles River Data Systems, Inc. 1983. p. 13.
  16. "UNIX Special: Profs Kernighan & Brailsford". Computerphile. September 30, 2015. Archived from the original on 2021-12-11.
  17. Kernighan, Brian W.; Pike, Rob (1984). The Unix Programming Environment . Prentice Hall. ISBN   0-13-937681-X.
  18. "OCaml User's Manual: Chapter 12 Lexer and parser generators (ocamllex, ocamlyacc)" . Retrieved 25 Nov 2013.
  19. "Yacc.go: A version of Yacc for the Go Programming Language" . Retrieved 15 July 2017.
  20. "CL-Yacc: A Common Lisp version of Yacc".
  21. "yecc: An Erlang implementation of Yacc".
  22. John Levine (August 2009), flex & bison, O'Reilly Media