Parsec (parser)

Last updated • 1 min readFrom Wikipedia, The Free Encyclopedia
Parsec
Original author(s) Daan Leijen, Paolo Martini, Antoine Latter
Developer(s) Herbert Valerio Riedel, Derek Elkins, Antoine Latter, Roman Cheplyaka, Ryan Scott
Initial releaseNovember 2, 2006;18 years ago (2006-11-02) [1]
Stable release
3.1.17.0 / April 5, 2024;7 months ago (2024-04-05) [2]
Repository github.com/haskell/parsec
Written in Haskell
Operating system Linux, macOS, Windows
Platform Haskell Platform
Available inEnglish
Type Parser combinator, library
License BSD-2-clause
Website hackage.haskell.org/package/parsec

Parsec is a library for writing parsers written in the programming language Haskell. [3] It is based on higher-order parser combinators, so a complicated parser can be made out of many smaller ones. [4] It has been reimplemented in many other languages, including Erlang, [5] Elixir, [6] OCaml, [7] Racket, [8] F#, [9] [10] and the imperative programming languages C#, [11] and Java. [12]

Contents

Because a parser combinator-based program is generally slower than a parser generator-based program,[ citation needed ] Parsec is normally used for small domain-specific languages, while Happy is used for compilers such as the Glasgow Haskell Compiler (GHC). [13]

Other Haskell parser combinator libraries that have been derived from Parsec include Megaparsec [14] and Attoparsec. [15]

Parsec is free software released under the BSD-3-Clause license. [16]

Example

Parsers written in Parsec start with simpler parsers, such as ones that recognize certain strings, and combine them to build a parser with more complicated behavior. For example, digit parses a digit, and string parses a specific string (like "hello").

Parser combinator libraries like Parsec provide utility functions to run the parsers on real values. A parser to recognize a single digit from a string can be split into two functions: one to create the parser, and a main function that calls one of these utility functions (parse in this case) to run the parser:

importText.Parsec-- has general parsing utility functionsimportText.Parsec.Char-- contains specific basic combinatorstypeParser=StreamsmChar=>ParsecTsumStringparser::Parserparser=string"hello"main::IO()main=print(parseparser"<test>""hello world")-- prints 'Right "hello"'

We define a Parser type to make the type signature of parser easier to read. If we wanted to alter this program, say to read either the string "hello" or the string "goodbye", we could use the operator <|>, provided by the Alternative typeclass, to combine two parsers into a single parser that tries either:

parser=string"hello"<|>string"goodbye"

Related Research Articles

Rebol is a cross-platform data exchange language and a multi-paradigm dynamic programming language designed by Carl Sassenrath for network communications and distributed computing. It introduces the concept of dialecting: small, optimized, domain-specific languages for code and data, which is also the most notable property of the language according to its designer Carl Sassenrath:

Although it can be used for programming, writing functions, and performing processes, its greatest strength is the ability to easily create domain-specific languages or dialects

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

Programming languages can be grouped by the number and types of paradigms supported.

HaXml is a collection of utilities for parsing, filtering, transforming, and generating Extensible Markup Language (XML) documents using the programming language Haskell.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

In functional programming, a monad is a structure that combines program fragments (functions) and wraps their return values in a type with additional computation. In addition to defining a wrapping monadic type, monads define two operators: one to wrap a value in the monad type, and another to compose together functions that output values of the monad type. General-purpose languages use monads to reduce boilerplate code needed for common operations. Functional languages use monads to turn complicated sequences of functions into succinct pipelines that abstract away control flow, and side-effects.

String functions are used in computer programming languages to manipulate a string or query information about a string.

In computer programming, an anonymous function is a function definition that is not bound to an identifier. Anonymous functions are often arguments being passed to higher-order functions or used for constructing the result of a higher-order function that needs to return a function. If the function is only used once, or a limited number of times, an anonymous function may be syntactically lighter than using a named function. Anonymous functions are ubiquitous in functional programming languages and other languages with first-class functions, where they fulfil the same role for the function type as literals do for other data types.

A combinator library is a software library which implements combinatory logic as combinators, for a functional programming language: "the key idea is this: a combinator library offers functions that combine functions together to make bigger functions". These kinds of libraries are particularly useful for allowing domain-specific languages to be easily embedded into a general purpose language by defining a few primitive functions for the given domain and turning over the task of expanding higher-level constructs to the general language. An example would be the monadic Parsec parser for Haskell. The library approach allows the parsers to be first-class citizens of the language.

QuickCheck is a software library, a combinator library, originally written in the programming language Haskell, designed to assist in software testing by generating test cases for test suites – an approach known as property testing.

In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. It was created by Austin Appleby in 2008 and, as of 8 January 2016, is hosted on GitHub along with its test suite named SMHasher. It also exists in a number of variants, all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.

<span class="mw-page-title-main">Haskell Platform</span>

The Haskell Platform is a set of software packages, tools, and libraries that create a common platform for using and developing applications in the programming language Haskell. With the Haskell Platform, Haskell follows the same principle as Python: "Batteries included". Since 2022, the Haskell Platform has been deprecated.

This article describes the features in the programming language Haskell.

Haskell is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell has pioneered several programming language features such as type classes, which enable type-safe operator overloading, and monadic input/output (IO). It is named after logician Haskell Curry. Haskell's main implementation is the Glasgow Haskell Compiler (GHC).

<span class="mw-page-title-main">Yesod (web framework)</span>

Yesod is a web framework based on the programming language Haskell for productive development of type-safe, representational state transfer (REST) model based, high performance web applications, developed by Michael Snoyman, et al. It is free and open-source software released under an MIT License.

<span class="mw-page-title-main">Elm (programming language)</span> Functional programming language

Elm is a domain-specific programming language for declaratively creating web browser-based graphical user interfaces. Elm is purely functional, and is developed with emphasis on usability, performance, and robustness. It advertises "no runtime exceptions in practice", made possible by the Elm compiler's static type checking.

<span class="mw-page-title-main">F* (programming language)</span> Functional programming language inspired by ML and aimed at program verification

F* is a high-level, multi-paradigm, functional and object-oriented programming language inspired by the languages ML, Caml, and OCaml, and intended for program verification. It is a joint project of Microsoft Research, and the French Institute for Research in Computer Science and Automation (Inria). Its type system includes dependent types, monadic effects, and refinement types. This allows expressing precise specifications for programs, including functional correctness and security properties. The F* type-checker aims to prove that programs meet their specifications using a combination of satisfiability modulo theories (SMT) solving and manual proofs. For execution, programs written in F* can be translated to OCaml, F#, C, WebAssembly, or assembly language. Prior F* versions could also be translated to JavaScript.

<span class="mw-page-title-main">PureScript</span> Strongly-typed language that compiles to JavaScript

PureScript is a strongly-typed, purely-functional programming language that transpiles to JavaScript, C++11, Erlang, and Go. It can be used to develop web applications, server side apps, and also desktop applications with use of Electron or via C++11 and Go compilers with suitable libraries. Its syntax is mostly comparable to that of Haskell. In addition, it introduces row polymorphism and extensible records. Also, contrary to Haskell, the PureScript language is defined as having a strict evaluation strategy, although there are non-conforming back-ends which implement a lazy evaluation strategy.

jq (programming language) Programming language for JSON

jq is a very high-level lexically scoped functional programming language in which every JSON value is a constant. jq supports backtracking and managing indefinitely long streams of JSON data. It is related to the Icon and Haskell programming languages. The language supports a namespace-based module system and has some support for closures. In particular, functions and functional expressions can be used as parameters of other functions.

References

  1. "parsec 2.0". Hackage. Retrieved 3 September 2019.
  2. "Releases". Github. Retrieved 22 September 2024.
  3. "Parsec on Haskell wiki". Haskell Wiki. Retrieved 29 May 2017.
  4. Leijen, Daan; Meijer, Erik (July 2001). "Parsec: Direct Style Monadic Parser Combinators For The Real World" (PDF). Microsoft Research. Retrieved 22 November 2014.
  5. "Parsec Erlang". BitBucket. Retrieved 23 November 2014.
  6. "Nimble Parsec". Github. Retrieved 18 December 2018.
  7. "Parsec OCaml" (PDF). The OCaml Summer Project. Retrieved 23 November 2014.
  8. "Megaparsack: Practical Parser Combinators".
  9. "XParsec by corsis". XParsec. Retrieved 29 May 2017.
  10. "FParsec". Quanttec. Retrieved 29 May 2017.
  11. "CSharp monad". Github. Retrieved 10 December 2014.
  12. "JParsec". Github. Retrieved 14 October 2016.
  13. "The Glasgow Haskell Compiler (AOSA Vol. 2)". The Architecture of Open Source Applications. Retrieved 23 November 2014.
  14. "megaparsec: Monadic parser combinators". Hackage. Retrieved 2018-09-10.
  15. "attoparsec: Fast combinator parsing for bytestrings and text". Hackage. Retrieved 2018-09-10.
  16. "Parsec". GitHub . 25 October 2021.