Paradigm | object-oriented |
---|---|
Designed by | Alessandro Warth Ian Piumarta |
Developer | Viewpoints Research Institute |
First appeared | 2007 |
Implementation language | Combined Object Lambda Architecture (COLA), Squeak Smalltalk |
Platform | IA-32, x86-64; Common Language Runtime |
OS | Linux, Windows |
Major implementations | |
MetaCOLA, OMeta/Squeak, OMeta/JS, OMeta#, IronMeta, Ohm |
OMeta is a specialized object-oriented programming language for pattern matching, developed by Alessandro Warth and Ian Piumarta in 2007 at the Viewpoints Research Institute. The language is based on parsing expression grammars (PEGs), rather than context-free grammars, with the intent to provide "a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree-transformers". [1]
OMeta's main goal is to allow a broader audience to use techniques generally available only to language programmers, such as parsing. [1] It is also known for its use in quickly creating prototypes, though programs written in OMeta are noted to be generally less efficient than those written in vanilla (base language) implementations, such as JavaScript. [2] [3]
OMeta is noted for its use in creating domain-specific languages, and especially for the maintainability of its implementations (Newcome). OMeta, like other meta languages, requires a host language; it was originally created as a COLA implementation. [1]
OMeta is a meta-language used to prototype and create domain-specific languages. It was introduced as "an object-oriented language for pattern matching". [1] It uses parsing expression grammars (descriptions of languages "based on recognizing strings instead of generating them" [4] ) designed "to handle arbitrary kinds of data", such as characters, numbers, strings, atoms, and lists. This increases its versatility, enabling it to work on both structured and unstructured data. [1]
The language's main advantage over similar languages is its ability to use the same code for all steps of compiling, e.g., lexing and parsing. OMeta also supports the defining of production rules based on arguments; this can be used to add such rules to OMeta, and the host language that OMeta is running in. Also, these rules can use each other as arguments, creating "higher-order rules", and inheriting each other to gain production rules from existing code. OMeta is capable of using host-language booleans (True/False) while pattern matching; these are referred to as "semantic predicates". OMeta uses generalized pattern-matching to allow programmers to more easily implement and extend phases of compilation with a single tool. [1]
OMeta uses grammars to determine the rules in which it operates. The grammars are able to hold an indefinite number of variables due to the use of an __init__ function called when a grammar is created. Grammars can inherit and call each other (using the "foreign production invocation mechanism", enabling grammars to "borrow" each other's input streams), much like classes in full programming languages. [1] OMeta also prioritizes options within a given grammar to remove ambiguity, unlike most meta-languages. After pattern-matching an input to a given grammar, OMeta then assigns each component of the pattern to a variable, which it then feeds into the host language. [5]
OMeta uses pattern matching to perform all of the steps of traditional compiling by itself. It first finds patterns in characters to create tokens, then it matches those tokens to its grammar to make syntax trees. Typecheckers then match patterns on the syntax trees to make annotated trees, and visitors do the same to produce other trees. A code generator then pattern-matches the trees to produce the code. [3] In OMeta, it is easy to "traverse through the parse tree since such functionality is natively supported". [3]
The meta-language is noted for its usability in most programming languages, though it is most commonly used in its language of implementation—OMeta/JS, for example, is used in JavaScript. [5] Because it requires a host language, the creators of OMeta refer to it as a "parasitic language". [6]
Alessandro Warth and Ian Piumarta developed OMeta at the Viewpoints Research Institute, an organization intended to improve research systems and personal computing, in 2007. They first used a Combined Object Lambda Architecture (COLA), a self-describing language investigated at Viewpoints Research Institute, as OMeta's host language, and later, assisted by Yoshiki Ohshima, ported it to Squeak Smalltalk to verify its usability with multiple host languages. OMeta was also used "to implement a nearly complete subset of…JavaScript" as a case study in its introductory paper. [1]
OMeta, like other meta languages, is primarily used to create domain-specific languages (DSLs); specifically, it is used to quickly prototype DSLs — OMeta's slow running speed and unclear error reports remove much of its function as a full programming language (Heirbaut 73–74). OMeta is useful thanks to its ability to use one syntax for every phase of compiling, allowing it to be used rather than several separate tools to create a compiler. [5] Also, OMeta is valued both for the speed at which it can be used to create DSLs and the significantly lower amount of code it needs to perform such a task in contrast to vanilla implementations, with reports showing around 26% as many lines of functional code as vanilla. [2]
The following is an example of a basic calculator language in C# using OMeta:
ometaBasicCalc<:Parser{Digit=super:d->d.ToDigit(),Number=Number:nDigit:d->(n*10+d)|Digit,AddExpr=AddExpr:x‘+’MulExpr:y->(x+y)|AddExpr:x‘-’MulExpr:y->(x-y)|MulExpr,MulExpr=MulExpr:x‘*’primExpr:y->(x*y)|MulExpr:x‘/’primExpr:y->(x/y)|PrimExpr,PrimExpr=‘(‘Expr:x‘)’->x|Number,Expr=AddExpr}
It is also possible to create subclasses of languages you have written:
ometaExponentCalc<:BasicCalc{MulExpr=MulExpr:x‘^’PrimExpr:e->Math.pow(x,e)|super}
Previously written languages can also be called rather than inherited:
ometaScientificCalc<:Parser{MathFunc:n=Token(n)Spaces,AdvExp=MathFunc(‘sqrt’)AdvExp:x->Math.Sqrt(x)|FacExpFacExp=PrimExp:x‘!’->{varr=1;for(;x>1;x--){r*=x;}returnr;}|PrimExpPrimExp=foreign(ExponentCalc.Expr):x->xExpr=AdvExp}
OMeta can theoretically be implemented into any host language, but it is used most often as OMeta/JS, a JavaScript implementation. [5] Warth has stated that patterns in "OMeta/X---where X is some host language" are better left to be influenced by "X" than standardized within OMeta, due to the fact that different host languages recognize different types of objects. [6]
MetaCOLA was the first implementation of OMeta, used in the language's introductory paper. MetaCOLA implemented OMeta's first test codes, and was one of the three forms (the others being OMeta/Squeak and a nearly-finished OMeta/JS) of the language made prior to its release. [1]
OMeta/Squeak was a port of OMeta used during the initial demonstration of the system. OMeta/Squeak is used "to experiment with alternative syntaxes for the Squeak EToys system" OMeta/Squeak requires square brackets and "pointy brackets" (braces) in rule operations, unlike OMeta/JS, which requires only square brackets. [6] OMeta/Squeak 2, however, features syntax more similar to that of OMeta/JS. [7] Unlike the COLA implementation of OMeta, the Squeak version does not memorize intermediate results (store numbers already used in calculation). [1]
OMeta/JS is OMeta in the form of a JavaScript implementation. Language implementations using OMeta/JS are noted to be easier to use and more space-efficient than those written using only vanilla JavaScript, but the former have been shown to perform much more slowly. Because of this, OMeta/JS is seen as a highly useful tool for prototyping, but is not preferred for production language implementations. [3]
The use of DSL development tools, such as OMeta, are considered much more maintainable than "vanilla implementations" (i. e. JavaScript) due to their low NCLOC (Non-Comment Lines of Code) count. This is due in part to the "semantic action code which creates the AST objects or performs limited string operations". OMeta's lack of "context-free syntax" allows it to be used in both parser and lexer creation at the cost of extra lines of code. Additional factors indicating OMeta's maintainability include a high maintainability index "while Halstead Effort indicate[s] that the vanilla parser requires three times more development effort compared to the OMeta parser". Like JavaScript, OMeta/JS supports "the complete syntax notation of Waebric". [3]
One of the major advantages of OMeta responsible for the difference in NCLOC is OMeta's reuse of its "tree walking mechanism" by allowing the typechecker to inherit the mechanism from the parser, which causes the typechecker to adapt to changes in the OMeta parser, while JavaScript's tree walking mechanism contains more code and must be manually adapted to the changes in the parser. Another is the fact that OMeta's grammars have a "higher abstraction level...than the program code". It can also be considered "the result of the semantic action code which creates the AST objects or performs limited string operations", though the grammar's non-semantics create a need for relatively many lines of code per function because of explicit whitespace definition—a mechanism implemented to allow OMeta to act as a single tool for DSL creation. [3]
In terms of performance, OMeta is found to run at slow speeds in comparison to vanilla implementations. The use of backtracking techniques by OMeta is a potential major cause for this (OMeta's parser "includes seven look-ahead operators...These operators are necessary to distinguish certain rules from each other and cannot be left out of the grammar"); however, it is more likely that this performance drop is due to OMeta's method of memoization:
"The storage of intermediate parsing steps causes the size of the parsing table to be proportional with the number of terminals and non-terminals (operands) used in the grammar. Since the grammar of the OMeta parser contains 446 operands, it is believed that performance is affected negatively." [3]
Where OMeta gains time on the vanilla implementation, however, is in lexing. JavaScript's vanilla lexer slows down significantly due to a method by which the implementation converts the entire program into a string through Java before the lexer starts. Despite this, the OMeta implementation runs significantly slower overall. [3]
OMeta also falls behind in terms of error reporting. While vanilla implementations return the correct error message in about "92% of the test cases" in terms of error location, OMeta simply returns "Match failed!" to any given error. Finding the source through OMeta requires "manually...counting the newline characters in the semantic action code in order to output at least the line number at which parsing fails". [3]
OMeta# is a project by Jeff Moser meant to translate OMeta/JS into a C# function; as such, the design of OMeta# is based on Alessandro Warth's OMeta/JS design.. The goal of the project is to give users the ability to make working languages with high simplicity. Specifically, OMeta# is intended to work as a single tool for .NET language development, reduce the steep learning curve of language development, become a useful teaching resource, and be practical for use in real applications. [5] OMeta# currently uses C# 3.0 as OMeta's host language rather than 4.0; because C# 3.0 is a static language rather than a dynamic one, recognition of the host language within OMeta# is "two to three times uglier and larger than it might have been" in a dynamically typed language. [8]
OMeta# uses .NET classes, or Types, as grammars and methods for the grammars’ internal "rules". OMeta# uses braces ( { and } ) to recognize its host language in grammars. The language has a focus on strong, clean, static typing much like that of its host language, though this adds complexity to the creation of the language. New implementations in C# must also be compatible with the .NET meta-language, making the creation even more complex. Also, to prevent users from accidentally misusing the metarules in OMeta#, Moser has opted to implement them as "an explicit interface exposed via a property (e.g. instead of "_apply", I have "MetaRules.Apply")." Later parts of OMeta# are written in OMeta#, though the functions of the language remains fairly tied to C#. [9] The OMeta# source code is posted on Codeplex, and is intended to remain as an open-source project. However, updates have been on indefinite hiatus since shortly after the project's beginnings, with recommits by the server on October 1, 2012. [5]
Gordon Tisher created IronMeta for .NET in 2009, and while similar to OMeta#, it's a much more supported and robust implementation, distributed under BSD license on GitHub.
Ohm is a successor to Ometa that aims to improve on it by (amongst other things) separating the grammar from the semantic actions. [10]
A regular expression, sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.
Smalltalk is a purely object oriented programming language (OOP) that was originally created in the 1970s for educational use, specifically for constructionist learning, but later found use in business. It was created at Xerox PARC by Learning Research Group (LRG) scientists, including Alan Kay, Dan Ingalls, Adele Goldberg, Ted Kaehler, Diana Merry, and Scott Wallace.
Yacc is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser based on a formal grammar, written in a notation similar to Backus–Naur form (BNF). Yacc is supplied as a standard utility on BSD and AT&T Unix. GNU-based Linux distributions include Bison, a forward-compatible Yacc replacement.
GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in Bison syntax, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.
In computer science, Backus–Naur form is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.
An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text written in a formal language. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree.
In computer science, a preprocessor is a program that processes its input data to produce output that is used as input in another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.
In computer science, conditionals are programming language constructs that perform different computations or actions or return different values depending on the value of a Boolean expression, called a condition.
Apache Groovy is a Java-syntax-compatible object-oriented programming language for the Java platform. It is both a static and dynamic language with features similar to those of Python, Ruby, and Smalltalk. It can be used as both a programming language and a scripting language for the Java Platform, is compiled to Java virtual machine (JVM) bytecode, and interoperates seamlessly with other Java code and libraries. Groovy uses a curly-bracket syntax similar to Java's. Groovy supports closures, multiline strings, and expressions embedded in strings. Much of Groovy's power lies in its AST transformations, triggered through annotations.
A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging from widely used languages for common domains, such as HTML for web pages, down to languages used by only one or a few pieces of software, such as MUSH soft code. DSLs can be further subdivided by the kind of language, and include domain-specific markup languages, domain-specific modeling languages, and domain-specific programming languages. Special-purpose computer languages have always existed in the computer age, but the term "domain-specific language" has become more popular due to the rise of domain-specific modeling. Simpler DSLs, particularly ones used by a single application, are sometimes informally called mini-languages.
In computer-based language recognition, ANTLR, or ANother Tool for Language Recognition, is a parser generator that uses a LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.
In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.
In computer science, scannerless parsing performs tokenization and parsing in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical and phrase level structure of the language.
META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as:
Each syntax equation is translated into a recursive subroutine which tests the input string for a particular phrase structure, and deletes it if found.
The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source languages for large scale software systems. DMS was originally motivated by a theory for maintaining designs of software called Design Maintenance Systems. DMS and "Design Maintenance System" are registered trademarks of Semantic Designs.
Nemerle is a general-purpose, high-level, statically typed programming language designed for platforms using the Common Language Infrastructure (.NET/Mono). It offers functional, object-oriented, aspect-oriented, reflective and imperative features. It has a simple C#-like syntax and a powerful metaprogramming system.
Amber Smalltalk, formerly named Jtalk, is an implementation of the programming language Smalltalk-80, that runs on the JavaScript runtime of a web browser. It is designed to enable client-side development using Smalltalk. The programming environment in Amber is named Helios.
Tcl is a high-level, general-purpose, interpreted, dynamic programming language. It was designed with the goal of being very simple but powerful. Tcl casts everything into the mold of a command, even programming constructs like variable assignment and procedure definition. Tcl supports multiple programming paradigms, including object-oriented, imperative, functional, and procedural styles.
Nim is a general-purpose, multi-paradigm, statically typed, compiled high-level system programming language, designed and developed by a team around Andreas Rumpf. Nim is designed to be "efficient, expressive, and elegant", supporting metaprogramming, functional, message passing, procedural, and object-oriented programming styles by providing several features such as compile time code generation, algebraic data types, a foreign function interface (FFI) with C, C++, Objective-C, and JavaScript, and supporting compiling to those same languages as intermediate representations.