Parsec (parser)

Last updated
Parsec
Original author(s) Daan Leijen, Paolo Martini, Antoine Latter
Developer(s) Herbert Valerio Riedel, Derek Elkins, Antoine Latter, Roman Cheplyaka, Ryan Scott
Initial releaseNovember 2, 2006;17 years ago (2006-11-02) [1]
Stable release
3.1.14.0 / August 10, 2019;4 years ago (2019-08-10) [2]
Repository github.com/haskell/parsec
Written in Haskell
Operating system Linux, macOS, Windows
Platform Haskell Platform
Type Parser combinator, Library
License BSD-3
Website hackage.haskell.org/package/parsec

Parsec is a library for writing parsers in Haskell. [3] It is based on higher-order parser combinators, so a complicated parser can be made out of many smaller ones. [4] It has been reimplemented in many other languages, including Erlang, [5] Elixir, [6] OCaml, [7] Racket, [8] and F#, [9] [10] as well as imperative languages such as C#, [11] and Java. [12]

Contents

Because a parser combinator-based program is generally slower than a parser generator-based program,[ citation needed ] Parsec is normally used for small domain-specific languages, while Happy is used for compilers such as GHC. [13]

Other Haskell parser combinator libraries that have been derived from Parsec include Megaparsec [14] and Attoparsec. [15]

Parsec is free software released under the BSD-3-Clause license. [16]

Example

Parsers written in Parsec start with simpler parsers, such as ones that recognize certain strings, and combine them to build a parser with more complicated behavior. For example, digit parses a digit, and string parses a specific string (like "hello").

Parser combinator libraries like Parsec provide utility functions to run the parsers on real values. A parser to recognize a single digit from a string can be split into two functions: one to create the parser, and a main function that calls one of these utility functions (parse in this case) to run the parser:

importText.Parsec-- has general parsing utility functionsimportText.Parsec.Char-- contains specific basic combinatorstypeParser=StreamsmChar=>ParsecTsumStringparser::Parserparser=string"hello"main::IO()main=print(parseparser"<test>""hello world")-- prints 'Right "hello"'

We define a Parser type to make the type signature of parser easier to read. If we wanted to alter this program, say to read either the string "hello" or the string "goodbye", we could use the operator <|>, provided by the Alternative typeclass, to combine two parsers into a single parser that tries either:

parser=string"hello"<|>string"goodbye"

Related Research Articles

Rebol is a cross-platform data exchange language and a multi-paradigm dynamic programming language designed by Carl Sassenrath for network communications and distributed computing. It introduces the concept of dialecting: small, optimized, domain-specific languages for code and data, which is also the most notable property of the language according to its designer Carl Sassenrath:

Although it can be used for programming, writing functions, and performing processes, its greatest strength is the ability to easily create domain-specific languages or dialects

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

In functional programming, a monad is a structure that combines program fragments (functions) and wraps their return values in a type with additional computation. In addition to defining a wrapping monadic type, monads define two operators: one to wrap a value in the monad type, and another to compose together functions that output values of the monad type. General-purpose languages use monads to reduce boilerplate code needed for common operations. Functional languages use monads to turn complicated sequences of functions into succinct pipelines that abstract away control flow, and side-effects.

In computer programming, an entry point is the place in a program where the execution of a program begins, and where the program has access to command line arguments.

String functions are used in computer programming languages to manipulate a string or query information about a string.

A combinator library is a software library which implements combinators for a functional programming language; "the key idea is this: a combinator library offers functions that combine functions together to make bigger functions". These kinds of libraries are particularly useful for allowing domain-specific programming languages to be easily embedded into a general purpose language by defining a few primitive functions for the given domain and turning over the task of expanding higher-level constructs to the general language. An example would be the monadic Parsec parser for Haskell. The library approach allows the parsers to be first-class citizens of the language.

QuickCheck is a software library, specifically a combinator library, originally written in the programming language Haskell, designed to assist in software testing by generating test cases for test suites – an approach known as property testing.

Thrift is an interface definition language and binary communication protocol used for defining and creating services for programming languages. It was developed by Facebook. Since 2020, it is an open source project in the Apache Software Foundation.

This article compares a large number of programming languages by tabulating their data types, their expression, statement, and declaration syntax, and some common operating-system interfaces.

In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. It was created by Austin Appleby in 2008 and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants, all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.

This article describes the features in the programming language Haskell.

Haskell is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell has pioneered a number of programming language features such as type classes, which enable type-safe operator overloading, and monadic input/output (IO). It is named after logician Haskell Curry. Haskell's main implementation is the Glasgow Haskell Compiler (GHC).

In functional programming, a result type is a monadic type holding a returned value or an error code. They provide an elegant way of handling errors, without resorting to exception handling; when a function that may fail returns a result type, the programmer is forced to consider success or failure paths, before getting access to the expected result; this eliminates the possibility of an erroneous programmer assumption.

<span class="mw-page-title-main">Yesod (web framework)</span>

Yesod is a web framework based on the programming language Haskell for productive development of type-safe, representational state transfer (REST) model based, high performance web applications, developed by Michael Snoyman, et al. It is free and open-source software released under an MIT License.

<span class="mw-page-title-main">Elm (programming language)</span> Functional programming language

Elm is a domain-specific programming language for declaratively creating web browser-based graphical user interfaces. Elm is purely functional, and is developed with emphasis on usability, performance, and robustness. It advertises "no runtime exceptions in practice", made possible by the Elm compiler's static type checking.

jq (programming language) Programming language for JSON

jq is a very high-level lexically scoped functional programming language in which every JSON value is a constant. jq supports backtracking and managing indefinitely long streams of JSON data. It is related to the Icon and Haskell programming languages. The language supports a namespace-based module system and has some support for closures. In particular, functions and functional expressions can be used as parameters of other functions.

References

  1. "parsec 2.0". Hackage. Retrieved 3 September 2019.
  2. "Releases". Github. Retrieved 3 September 2019.
  3. "Parsec on Haskell wiki". Haskell Wiki. Retrieved 29 May 2017.
  4. Leijen, Daan; Meijer, Erik (July 2001). "Parsec: Direct Style Monadic Parser Combinators For The Real World" (PDF). Microsoft Research. Retrieved 22 November 2014.
  5. "Parsec Erlang". BitBucket. Retrieved 23 November 2014.
  6. "Nimble Parsec". Github. Retrieved 18 December 2018.
  7. "Parsec OCaml" (PDF). The OCaml Summer Project. Retrieved 23 November 2014.
  8. "Megaparsack: Practical Parser Combinators".
  9. "XParsec by corsis". XParsec. Retrieved 29 May 2017.
  10. "FParsec". Quanttec. Retrieved 29 May 2017.
  11. "CSharp monad". Github. Retrieved 10 December 2014.
  12. "JParsec". Github. Retrieved 14 October 2016.
  13. "The Glasgow Haskell Compiler (AOSA Vol. 2)". The Architecture of Open Source Applications. Retrieved 23 November 2014.
  14. "megaparsec: Monadic parser combinators". Hackage. Retrieved 2018-09-10.
  15. "attoparsec: Fast combinator parsing for bytestrings and text". Hackage. Retrieved 2018-09-10.
  16. "Parsec". 25 October 2021.