Domain-specific language


A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There is a wide variety of DSLs, ranging from widely used languages for common domains, such as HTML for web pages, down to languages used by only one or a few pieces of software, such as MUSH soft code. DSLs can be further subdivided by the kind of language, and include domain-specific markup languages, domain-specific modeling languages (more generally, specification languages), and domain-specific programming languages. Special-purpose computer languages have existed throughout the computer age, but the term "domain-specific language" has become more popular due to the rise of domain-specific modeling. Simpler DSLs, particularly ones used by a single application, are sometimes informally called mini-languages.

A computer language is a system of communication with a computer. Types of computer languages include these:

A domain is a field of study that defines a set of common requirements, terminology, and functionality for any software program constructed to solve a problem in that field; the term is used in this sense in domain engineering. The word domain is also taken as a synonym of application domain. It can also be seen as a sphere of knowledge.

A general-purpose language is a computer language that is broadly applicable across application domains, and lacks specialized features for a particular domain. This is in contrast to a domain-specific language (DSL), which is specialized to a particular application domain.


The line between general-purpose languages and domain-specific languages is not always sharp, as a language may have specialized features for a particular domain but be applicable more broadly, or conversely may in principle be capable of broad application but in practice used primarily for a specific domain. For example, Perl was originally developed as a text-processing and glue language, for the same domain as AWK and shell scripts, but was later used mostly as a general-purpose programming language. By contrast, PostScript is a Turing-complete language, and in principle can be used for any task, but in practice is narrowly used as a page description language.


Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages, Perl 5 and Perl 6 (the latter was renamed Raku in 2019).

AWK is a programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.


A shell script is a computer program designed to be run by the Unix shell, a command-line interpreter. The various dialects of shell scripts are considered to be scripting languages. Typical operations performed by shell scripts include file manipulation, program execution, and printing text. A script which sets up the environment, runs the program, and does any necessary cleanup, logging, etc. is called a wrapper.

Use

The design and use of appropriate DSLs is a key part of domain engineering: using a language suitable to the domain at hand, which may mean using an existing DSL or GPL, or developing a new DSL. Language-oriented programming considers the creation of special-purpose languages for expressing problems to be a standard part of the problem-solving process. Creating a domain-specific language (with software to support it), rather than reusing an existing language, can be worthwhile if the language allows a particular type of problem or solution to be expressed more clearly than an existing language would allow, and the type of problem in question recurs sufficiently often. Pragmatically, a DSL may be specialized to a particular problem domain, a particular problem representation technique, a particular solution technique, or other aspects of a domain.

Domain engineering, also called product line engineering, is the entire process of reusing domain knowledge in the production of new software systems. It is a key concept in systematic software reuse. A key idea in systematic software reuse is the domain. Most organizations work in only a few domains. They repeatedly build similar systems within a given domain with variations to meet different customer needs. Rather than building each new system variant from scratch, significant savings may be achieved by reusing portions of previous systems in the domain to build new ones.

Language-oriented programming (LOP) is a style of computer programming in which, rather than solving problems in general-purpose programming languages, the programmer creates one or more domain-specific languages for the problem first, and solves the problem in those languages. This concept is described in detail in the paper by Martin Ward entitled "Language Oriented Programming", published in Software - Concepts and Tools, Vol. 15, No. 4, pp. 147–161, 1994.

Overview

A domain-specific language is created specifically to solve problems in a particular domain and is not intended to be able to solve problems outside it (although that may be technically possible). In contrast, general-purpose languages are created to solve problems in many domains. The domain can also be a business area. Some examples of business areas include:

A domain-specific language is somewhere between a tiny programming language and a scripting language, and is often used in a way analogous to a programming library. The boundaries between these concepts are quite blurry, much like the boundary between scripting languages and general-purpose languages.
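To illustrate the library-like end of this spectrum, the following Python sketch builds a tiny embedded (internal) DSL for filtering records. Operator overloading lets a rule read like `(age > 30) & (city == "Oslo")` while remaining ordinary Python. The names `Field` and `Predicate`, and the sample data, are hypothetical and not taken from any particular library.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]

class Predicate:
    """A composable test on a row; & and | combine predicates."""
    def __init__(self, test: Callable[[Row], bool]):
        self.test = test

    def __call__(self, row: Row) -> bool:
        return self.test(row)

    def __and__(self, other: "Predicate") -> "Predicate":
        return Predicate(lambda row: self(row) and other(row))

    def __or__(self, other: "Predicate") -> "Predicate":
        return Predicate(lambda row: self(row) or other(row))

class Field:
    """Refers to a named column; comparison operators build Predicates instead of booleans."""
    def __init__(self, name: str):
        self.name = name

    def __gt__(self, value: Any) -> Predicate:
        return Predicate(lambda row: row[self.name] > value)

    def __eq__(self, value: Any) -> Predicate:  # type: ignore[override]
        return Predicate(lambda row: row[self.name] == value)

# The "program" in the embedded DSL: it reads like a query but is plain Python objects.
age, city = Field("age"), Field("city")
rule = (age > 30) & (city == "Oslo")

people = [{"age": 41, "city": "Oslo"}, {"age": 25, "city": "Oslo"}, {"age": 35, "city": "Rome"}]
print([p for p in people if rule(p)])  # [{'age': 41, 'city': 'Oslo'}]
```

The design choice here is that the host language supplies parsing, scoping, and tooling for free; the cost is that the DSL can only use syntax the host already allows.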

A scripting or script language is a programming language that supports scripts — programs written for a special run-time environment that automates the execution of tasks that could alternatively be executed one-by-one by a human operator. Scripting languages are often interpreted. Primitives are usually the elementary tasks or API calls, and the language allows them to be combined into more complex programs. Environments that can be automated through scripting include software applications, web pages within a web browser, usage of the shells of operating systems (OS), embedded systems, as well as numerous games. A scripting language can be viewed as a domain-specific language for a particular environment; in the case of scripting an application, it is also known as an extension language. Scripting languages are also sometimes referred to as very high-level programming languages, as they operate at a high level of abstraction, or as control languages, particularly for job control languages on mainframes.

In design and implementation

Domain-specific languages are languages (or often, declared syntaxes or grammars) with very specific goals in design and implementation. A domain-specific language can be a visual diagramming language, such as those created by the Generic Eclipse Modeling System; a programmatic abstraction, such as the Eclipse Modeling Framework; or a textual language. For instance, the command-line utility grep has a regular expression syntax which matches patterns in lines of text. The sed utility defines a syntax for matching and replacing regular expressions. Often, these tiny languages can be used together inside a shell to perform more complex programming tasks.
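The two tiny languages mentioned above can be approximated with Python's `re` module, as a rough sketch. Python's regular-expression dialect differs in details from the POSIX basic and extended syntaxes that grep and sed use, and the helper names `grep` and `sed_substitute` are hypothetical. Chaining the two calls mirrors a shell pipeline such as `grep '^error' log | sed 's/error/ERROR/'`.

```python
import re

def grep(pattern: str, lines: list[str]) -> list[str]:
    """Keep only the lines containing a match for pattern (roughly `grep -E pattern`)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def sed_substitute(pattern: str, replacement: str, lines: list[str],
                   global_flag: bool = False) -> list[str]:
    """Apply s/pattern/replacement/ to each line; with global_flag, behave like s/.../.../g."""
    regex = re.compile(pattern)
    count = 0 if global_flag else 1  # in re.sub, count=0 means "replace every match"
    return [regex.sub(replacement, line, count=count) for line in lines]

log = ["error: disk full", "ok: backup done", "error: timeout, error again"]

# Equivalent in spirit to: grep '^error' log | sed 's/error/ERROR/'
result = sed_substitute(r"error", "ERROR", grep(r"^error", log))
print(result)  # ['ERROR: disk full', 'ERROR: timeout, error again']
```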

Generic Eclipse Modeling System (GEMS) is a configurable toolkit for creating domain-specific modeling and program synthesis environments for Eclipse. The project aims to bridge the gap between the communities experienced with visual metamodeling tools like those built around the Eclipse modeling technologies, such as the Eclipse Modeling Framework (EMF) and Graphical Modeling Framework (GMF). GEMS helps developers rapidly create a graphical modeling tool from a visual language description or metamodel without any coding in third-generation languages. Graphical modeling tools created with GEMS automatically support complex capabilities, such as remote updating and querying, template creation, styling with Cascading Style Sheets (CSS), and model linking.


Eclipse Modeling Framework (EMF) is an Eclipse-based modeling framework and code generation facility for building tools and other applications based on a structured data model.

grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command g/re/p, which has the same effect: doing a global search with the regular expression and printing all matching lines. grep was originally developed for the Unix operating system, but later became available for all Unix-like systems.

The line between domain-specific languages and scripting languages is somewhat blurred, but domain-specific languages often lack low-level functions for filesystem access, interprocess control, and other functions that characterize full-featured programming languages, scripting or otherwise. Many domain-specific languages do not compile to bytecode or executable code, but to various kinds of media objects: GraphViz exports to PostScript, GIF, JPEG, etc., while Csound compiles to audio files, and a ray-tracing domain-specific language like that of POV-Ray compiles to graphics files. A computer language like SQL presents an interesting case: it can be deemed a domain-specific language because it is specific to one domain (in SQL's case, accessing and managing relational databases) and is often called from another application, but SQL has more keywords and functions than many scripting languages, and is often thought of as a language in its own right, perhaps because of the prevalence of database manipulation in programming and the amount of mastery required to be an expert in the language.
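As a small illustration of a DSL whose output is media rather than executable code, the Python sketch below emits a graph description in GraphViz's DOT language. Only the text generation is shown; rendering would be done separately by GraphViz's `dot` tool (for example `dot -Tps graph.dot -o graph.ps`), which this sketch does not invoke. The function names `quote` and `to_dot` and the sample edges are hypothetical.

```python
def quote(identifier: str) -> str:
    # DOT quoted strings escape embedded double quotes with a backslash.
    return '"' + identifier.replace('"', '\\"') + '"'

def to_dot(edges: list[tuple[str, str]], name: str = "G") -> str:
    """Render (source, target) pairs as a directed graph in DOT syntax."""
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f"    {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)

print(to_dot([("parse", "check"), ("check", "emit")]))
```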

PostScript (PS) is a page description language in the electronic publishing and desktop publishing business. It is a dynamically typed, concatenative programming language and was created at Adobe Systems by John Warnock, Charles Geschke, Doug Brotz, Ed Taft and Bill Paxton from 1982 to 1984.


The Graphics Interchange Format (GIF) is a bitmap image format that was developed by a team at the online services provider CompuServe led by American computer scientist Steve Wilhite and released on June 15, 1987. It has since come into widespread usage on the World Wide Web due to its wide support and portability.


JPEG is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality.

Further blurring this line, many domain-specific languages have exposed APIs, and can be accessed from other programming languages without breaking the flow of execution or calling a separate process, and can thus operate as programming libraries.
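SQLite is one concrete case of this: it is an embedded database engine that runs as a library inside the host process, so SQL statements can be issued from Python through the standard `sqlite3` module without starting a separate server. The table and rows below are invented for illustration.

```python
import sqlite3

# In-memory database; the table and its rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",  # ? placeholders keep data out of the SQL text
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# The query is pure SQL: it states what result is wanted, not how to loop over rows.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 50.0), ('bob', 12.5)]
conn.close()
```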

Programming tools

Some domain-specific languages expand over time to include full-featured programming tools, which further complicates the question of whether a language is domain-specific or not. A good example is the functional language XSLT, specifically designed for transforming one XML graph into another, which has been extended since its inception to allow (particularly in its 2.0 version) for various forms of filesystem interaction, string and date manipulation, and data typing.

In model-driven engineering, many examples of domain-specific languages may be found, such as OCL, a language for decorating models with assertions, or QVT, a domain-specific transformation language. However, languages like UML are typically general-purpose modeling languages.

To summarize, an analogy might be useful: a Very Little Language is like a knife, which can be used in thousands of different ways, from cutting food to cutting down trees. A domain-specific language is like an electric drill: it is a powerful tool with a wide variety of uses, but within a specific context, namely, putting holes in things. A general-purpose language is a complete workbench, with a variety of tools intended for performing a variety of tasks. Domain-specific languages should be used by programmers who, looking at their current workbench, realize they need a better drill and find that a particular domain-specific language provides exactly that.

Domain-specific language topics

Usage patterns

There are several usage patterns for domain-specific languages: [1] [2]

Many domain-specific languages can be used in more than one way.[citation needed] DSL code embedded in a host language may have special syntax support, such as regexes in sed, AWK, Perl or JavaScript, or may be passed as strings.
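Python is an example of the second style: it has no literal syntax for regular expressions, so the regex program is passed to the `re` module as a string. The sketch below (the date pattern is just an illustration) uses a raw string so that backslashes reach the regex engine unchanged.

```python
import re

# The regex DSL program lives inside an ordinary Python string.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = ISO_DATE.fullmatch("2005-12-01")
print(match.groups())  # ('2005', '12', '01')

# Without the r prefix, every backslash must be doubled to survive Python's own escaping.
same_pattern = re.compile("(\\d{4})-(\\d{2})-(\\d{2})")
print(same_pattern.pattern == ISO_DATE.pattern)  # True
```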

Design goals

Adopting a domain-specific language approach to software engineering involves both risks and opportunities. A well-designed domain-specific language manages to find the proper balance between these.

Domain-specific languages have important design goals that contrast with those of general-purpose languages:

Idioms

In programming, idioms are methods imposed by programmers to handle common development tasks, e.g.:

General purpose programming languages rarely support such idioms, but domain-specific languages can describe them, e.g.:

Examples

Examples of domain-specific languages include HTML, Logo for pencil-like drawing, Verilog and VHDL hardware description languages, MATLAB and GNU Octave for matrix programming, Mathematica, Maple and Maxima for symbolic mathematics, Specification and Description Language for reactive and distributed systems, spreadsheet formulas and macros, SQL for relational database queries, YACC grammars for creating parsers, regular expressions for specifying lexers, the Generic Eclipse Modeling System for creating diagramming languages, Csound for sound and music synthesis, and the input languages of GraphViz and GrGen, software packages used for graph layout and graph rewriting.

GameMaker Language

The GML scripting language used by GameMaker Studio is a domain-specific language targeted at novice programmers, intended to make programming easy to learn. While the language is a blend of multiple languages, including Delphi, C++, and BASIC, it lacks structures, data types, and other features of a full-fledged programming language. Many of the built-in functions are sandboxed for the purpose of easy portability. The language primarily serves to make it easy for anyone to pick up the language and develop a game.

Unix shell scripts

Unix shell scripts give a good example of a domain-specific language for data [3] organization. They can manipulate data in files or user input in many different ways. Domain abstractions and notations include streams (such as stdin and stdout) and operations on streams (such as redirection and pipe). These abstractions combine to make a robust language to describe the flow and organization of data.

The language consists of a simple interface (a script) for running and controlling processes that perform small tasks. These tasks represent the idioms of organizing data into a desired format such as tables, graphs, charts, etc.

These tasks consist of simple control-flow and string manipulation mechanisms that cover many common usages, such as searching and replacing strings in files, or counting occurrences of strings (frequency counting).
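The frequency-counting idiom is classically written as a pipeline such as `tr -s ' ' '\n' < file | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn`. As a rough Python analogue, each stage below is a generator that consumes one stream and produces the next, mimicking how processes are connected by pipes. The stage names are hypothetical, and ties may be ordered differently than `sort -rn` would order them.

```python
from collections import Counter
from typing import Iterable, Iterator

# Stage 1, roughly `tr -s ' ' '\n'`: turn lines into a stream of words.
def split_words(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        yield from line.split()

# Stage 2, roughly `tr 'A-Z' 'a-z'`: normalise case.
def lowercase(words: Iterable[str]) -> Iterator[str]:
    for word in words:
        yield word.lower()

# Stage 3, roughly `sort | uniq -c | sort -rn`: count and order by frequency.
def count(words: Iterable[str]) -> list[tuple[str, int]]:
    return Counter(words).most_common()

text = ["the cat sat", "The dog sat"]
freq = count(lowercase(split_words(text)))  # stages nest the way a | b | c chains processes
print(freq)  # [('the', 2), ('sat', 2), ('cat', 1), ('dog', 1)]
```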

Even though Unix scripting languages are Turing complete, they differ from general-purpose languages.[clarification needed]

In practice, scripting languages are used to weave together small Unix tools such as grep, ls, sort or wc.

ColdFusion Markup Language

ColdFusion's associated scripting language is another example of a domain-specific language, in this case for data-driven websites. This scripting language is used to weave together languages and services such as Java, .NET, C++, SMS, email, email servers, HTTP, FTP, Exchange, directory services, and file systems for use in websites.

The ColdFusion Markup Language (CFML) includes a set of tags that can be used in ColdFusion pages to interact with data sources, manipulate data, and display output. CFML tag syntax is similar to HTML element syntax.

Erlang OTP

The Erlang Open Telecom Platform was originally designed for use inside Ericsson as a domain-specific language. The language itself offers a platform of libraries to create finite state machines, generic servers and event managers that quickly allow an engineer to deploy applications, or support libraries, that have been shown in industry benchmarks to outperform other languages intended for a mixed set of domains, such as C and C++. Ericsson released Erlang/OTP as open source in 1998.

FilterMeister

FilterMeister is a programming environment, with a programming language that is based on C, for the specific purpose of creating Photoshop-compatible image processing filter plug-ins; FilterMeister runs as a Photoshop plug-in itself and it can load and execute scripts or compile and export them as independent plug-ins. Although the FilterMeister language reproduces a significant portion of the C language and function library, it contains only those features which can be used within the context of Photoshop plug-ins and adds a number of specific features only useful in this specific domain.

MediaWiki templates

The Template feature of MediaWiki is an embedded domain-specific language whose fundamental purpose is to support the creation of page templates and the transclusion (inclusion by reference) of MediaWiki pages into other MediaWiki pages.
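A drastically simplified sketch of transclusion with positional parameters follows. Real MediaWiki templates also support named parameters, defaults written as `{{{1|default}}}`, nesting, and parser functions, none of which are modelled here; the `TEMPLATES` store and the `Greeting` template are hypothetical.

```python
import re

# Hypothetical template store: name -> wikitext body with positional placeholders.
TEMPLATES = {"Greeting": "Hello, {{{1}}}! Welcome to {{{2}}}."}

CALL = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")  # {{Name|arg1|arg2}}
PARAM = re.compile(r"\{\{\{(\d+)\}\}\}")                # {{{1}}}, {{{2}}}, ...

def expand(wikitext: str) -> str:
    """Replace each {{Name|...}} call with the named template's body, filling parameters."""
    def transclude(call: re.Match) -> str:
        body = TEMPLATES[call.group(1).strip()]  # unknown names raise KeyError in this sketch
        args = call.group(2).split("|")[1:]      # "|Ada|the wiki" -> ["Ada", "the wiki"]

        def fill(param: re.Match) -> str:
            index = int(param.group(1))
            # An unsupplied parameter is left as its literal placeholder.
            return args[index - 1] if 1 <= index <= len(args) else param.group(0)

        return PARAM.sub(fill, body)

    return CALL.sub(transclude, wikitext)

print(expand("{{Greeting|Ada|the wiki}}"))  # Hello, Ada! Welcome to the wiki.
```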

Software engineering uses

There has been much interest in domain-specific languages to improve the productivity and quality of software engineering. Domain-specific languages could provide a robust set of tools for efficient software engineering. Such tools are beginning to make their way into the development of critical software systems.

The Software Cost Reduction Toolkit [4] is an example of this. The toolkit is a suite of utilities including a specification editor to create a requirements specification, a dependency graph browser to display variable dependencies, a consistency checker to catch missing cases in well-formed formulas in the specification, a model checker and a theorem prover to check program properties against the specification, and an invariant generator that automatically constructs invariants based on the requirements.

A newer development is language-oriented programming, an integrated software engineering methodology based mainly on creating, optimizing, and using domain-specific languages.

Metacompilers

Complementing language-oriented programming, as well as all other forms of domain-specific languages, is the class of compiler-writing tools called metacompilers. A metacompiler is not only useful for generating parsers and code generators for domain-specific languages; it also itself compiles a domain-specific metalanguage specifically designed for the domain of metaprogramming.

Besides parsing domain-specific languages, metacompilers are useful for generating a wide range of software engineering and analysis tools. The meta-compiler methodology is often found in program transformation systems.

Metacompilers that played a significant role in both computer science and the computer industry include META II [5] and its descendant TREE-META. [6]

Unreal Engine before version 4 and other games

Unreal and Unreal Tournament unveiled a language called UnrealScript. This allowed for more rapid development of modifications than with the competitor Quake (using the id Tech 2 engine). The id Tech engine used standard C code, meaning C had to be learned and properly applied, while UnrealScript was optimized for ease of use and efficiency. Similarly, the development of more recent games has introduced their own specific languages; one common example is the use of Lua for scripting.

Rules Engines for Policy Automation

Various Business Rules Engines have been developed for automating policy and business rules used in both government and private industry. ILOG, Oracle Policy Automation, DTRules, Drools and others provide support for DSLs aimed at various problem domains. DTRules goes so far as to define an interface for the use of multiple DSLs within a Rule Set.

The purpose of Business Rules Engines is to define a representation of business logic in as human-readable a fashion as possible. This allows both subject matter experts and developers to work with and understand the same representation of the business logic. Most Rules Engines combine an approach to simplifying the control structures for business logic (for example, using Declarative Rules or Decision Tables) with alternatives to programming syntax in the form of DSLs.
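The decision-table idea can be sketched in a few lines of Python: the business rules live in a plain table that a subject matter expert could review, and a small generic interpreter applies them. The table, its columns, and the first-match policy are assumptions for illustration; the rules engines named above offer far richer semantics.

```python
# Hypothetical decision table. Each row reads: "if years >= min_years and
# orders >= min_orders, then the discount tier is outcome". The first matching row wins.
DISCOUNT_TABLE = [
    # min_years, min_orders, outcome
    (5,          10,         "gold"),
    (2,          0,          "silver"),
    (0,          0,          "none"),
]

def decide(table, years: int, orders: int) -> str:
    """Evaluate the table top to bottom, like a very simple rules engine."""
    for min_years, min_orders, outcome in table:
        if years >= min_years and orders >= min_orders:
            return outcome
    raise ValueError("no rule matched")

print(decide(DISCOUNT_TABLE, years=6, orders=12))  # gold
print(decide(DISCOUNT_TABLE, years=1, orders=50))  # none
```

Separating the table (data) from `decide` (logic) is the point: changing a policy means editing a row, not the interpreter.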

Statistical modelling languages

Statistical modelers have developed domain-specific languages such as BUGS, JAGS, and Stan. These languages provide a syntax for describing a Bayesian model and generate a method for solving it using Markov chain Monte Carlo simulation.

Generating models and services for multiple programming languages

Object handling and services can be generated from a description written in an interface description language (IDL), itself a domain-specific language, for several target languages, such as JavaScript for web applications, HTML for documentation, or C++ for high-performance code. This is done by cross-language frameworks such as Apache Thrift or Google Protocol Buffers.
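The core idea, one description compiled to several target languages, can be shown with a toy generator. The schema format, type mappings, and function names below are hypothetical and far simpler than the actual IDLs and compilers of Thrift or Protocol Buffers.

```python
# Hypothetical interface description: one record type with typed fields.
SCHEMA = {"name": "User", "fields": [("id", "int"), ("email", "string")]}

# Per-target mappings from the description's abstract types to concrete types.
TS_TYPES = {"int": "number", "string": "string"}
CPP_TYPES = {"int": "int64_t", "string": "std::string"}

def to_typescript(schema: dict) -> str:
    """Emit a TypeScript interface for the record."""
    fields = "\n".join(f"  {name}: {TS_TYPES[kind]};" for name, kind in schema["fields"])
    return f"interface {schema['name']} {{\n{fields}\n}}"

def to_cpp(schema: dict) -> str:
    """Emit a C++ struct for the same record (assumes <cstdint> and <string> are included)."""
    fields = "\n".join(f"  {CPP_TYPES[kind]} {name};" for name, kind in schema["fields"])
    return f"struct {schema['name']} {{\n{fields}\n}};"

print(to_typescript(SCHEMA))
print(to_cpp(SCHEMA))
```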

Gherkin

Gherkin is a language designed to define test cases that check the behavior of software without specifying how that behavior is implemented. It is meant to be read and used by non-technical users, using a natural-language syntax and a line-oriented design. The tests defined with Gherkin must then be implemented in a general-purpose programming language. The steps in a Gherkin program then act as a syntax for method invocation accessible to non-developers.
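The binding between Gherkin steps and code can be sketched as a registry that maps step patterns to functions, loosely in the style of Cucumber's step definitions. This is not Cucumber's API: the `step` decorator, the shopping-cart steps, and the simple line-based parsing (which ignores Gherkin features such as `Background`, data tables, and doc strings) are assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical step registry: compiled pattern -> function implementing the step.
STEPS: list[tuple[re.Pattern, Callable[..., None]]] = []

def step(pattern: str):
    """Decorator that binds a step phrase (a regex) to the decorated function."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register

@step(r"a cart with (\d+) items?")
def given_cart(ctx: dict, n: str) -> None:
    ctx["items"] = int(n)

@step(r"the user adds (\d+) items?")
def when_add(ctx: dict, n: str) -> None:
    ctx["items"] += int(n)

@step(r"the cart has (\d+) items?")
def then_check(ctx: dict, n: str) -> None:
    assert ctx["items"] == int(n), f"expected {n} items, found {ctx['items']}"

SCENARIO = """
Scenario: Adding items to a cart
  Given a cart with 2 items
  When the user adds 3 items
  Then the cart has 5 items
"""

KEYWORDS = {"Given", "When", "Then", "And", "But"}

def run(text: str) -> dict:
    """Run each step line by calling the first registered function whose pattern matches."""
    ctx: dict = {}
    for line in text.strip().splitlines():
        keyword, _, phrase = line.strip().partition(" ")
        if keyword not in KEYWORDS:
            continue  # e.g. the "Scenario:" header line
        for pattern, func in STEPS:
            match = pattern.fullmatch(phrase)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"undefined step: {phrase}")
    return ctx

print(run(SCENARIO))  # {'items': 5}
```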

Other examples

Other prominent examples of domain-specific languages include:

Advantages and disadvantages

Some of the advantages: [1] [2]

Some of the disadvantages:

Tools for designing domain-specific languages

See also

Related Research Articles

A compiler is a computer program that transforms computer code written in one programming language into another programming language. Compilers are a type of translator that support digital devices, primarily computers. The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower level language to create an executable program.

In computer science, Backus–Naur form or Backus normal form (BNF) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed: for instance, in official language specifications, in manuals, and in textbooks on programming language theory.
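As an illustration of how a BNF grammar maps onto code, the small arithmetic grammar in the comments below (a hypothetical example) is implemented by a hand-written recursive-descent parser in Python, with one method per nonterminal. Left-recursive rules such as `<expr> ::= <expr> "+" <term>` cannot be coded directly in recursive descent, so they are rewritten as loops, a standard transformation.

```python
import re

# Grammar (hypothetical, for illustration):
#   <expr>   ::= <term>   | <expr> "+" <term>
#   <term>   ::= <factor> | <term> "*" <factor>
#   <factor> ::= <number> | "(" <expr> ")"
TOKEN = re.compile(r"\s*(\d+|[+*()])")

def tokenize(src: str) -> list[str]:
    """Split src into number, operator, and parenthesis tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    """Recursive-descent parser with one method per nonterminal; evaluates as it parses."""

    def __init__(self, tokens: list[str]):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None) -> str:
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def expr(self) -> int:
        # <expr> as a loop: <term> { "+" <term> }  (left recursion removed)
        value = self.term()
        while self.peek() == "+":
            self.take("+")
            value += self.term()
        return value

    def term(self) -> int:
        # <term> as a loop: <factor> { "*" <factor> }
        value = self.factor()
        while self.peek() == "*":
            self.take("*")
            value *= self.factor()
        return value

    def factor(self) -> int:
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)

def evaluate(src: str) -> int:
    parser = Parser(tokenize(src))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value

print(evaluate("2 + 3 * (4 + 1)"))  # 17
```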

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The input may be a text file containing the grammar written in BNF or EBNF that defines the syntax of a programming language, and whose generated output is some source code of the parser for the programming language, although other definitions exist. Usually, the resulting source code will have to be extended before a complete compiler emerges.


In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is "abstract" in the sense that it does not represent every detail appearing in the real syntax, but rather just the structural, content-related details. For instance, grouping parentheses are implicit in the tree structure, and a syntactic construct like an if-condition-then expression may be denoted by means of a single node with three branches.
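Python's standard `ast` module exposes exactly this kind of tree. In the sketch below, the helper `shape` (a hypothetical name) reduces a parsed arithmetic expression to nested tuples; the parentheses in `(1 + 2) * 3` leave no node of their own and survive only as the nesting of the tree.

```python
import ast

# Map AST operator node types to printable symbols (only the two used here).
OPS = {ast.Add: "+", ast.Mult: "*"}

def shape(node: ast.AST):
    """Reduce a BinOp/Constant tree to nested (operator, left, right) tuples."""
    if isinstance(node, ast.BinOp):
        return (OPS[type(node.op)], shape(node.left), shape(node.right))
    if isinstance(node, ast.Constant):
        return node.value
    raise TypeError(f"unsupported node: {type(node).__name__}")

grouped = ast.parse("(1 + 2) * 3", mode="eval").body
plain = ast.parse("1 + 2 * 3", mode="eval").body
print(shape(grouped))  # ('*', ('+', 1, 2), 3)
print(shape(plain))    # ('+', 1, ('*', 2, 3))
```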

In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.
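At the simple end of that range, a preprocessor only performs textual macro substitution. The Python sketch below handles object-like `#define NAME value` lines in a single pass; unlike the real C preprocessor, it does not rescan expansions, skip string literals, or support function-like macros. The function name `preprocess` is hypothetical.

```python
import re

DEFINE = re.compile(r"#define\s+(\w+)\s+(.*)")
WORD = re.compile(r"\w+")

def preprocess(source: str) -> str:
    """Expand object-like macros defined earlier in the source; drop the #define lines."""
    macros: dict[str, str] = {}
    output = []
    for line in source.splitlines():
        definition = DEFINE.fullmatch(line.strip())
        if definition:
            macros[definition.group(1)] = definition.group(2).strip()
            continue
        # Replace whole identifiers only, so SIZE does not touch SIZE_LIMIT.
        output.append(WORD.sub(lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(output)

src = "#define SIZE 16\nint buf[SIZE];\nint SIZE_LIMIT = SIZE * 2;"
print(preprocess(src))
```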

A modeling language is any artificial language that can be used to express information or knowledge or systems in a structure that is defined by a consistent set of rules. The rules are used for interpretation of the meaning of components in the structure.

A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs, that can be combined together to accomplish a task, much as one might use multiple hand tools to fix a physical object. The most basic tools are a source code editor and a compiler or interpreter, which are used ubiquitously and continuously. Other tools are used more or less depending on the language, development methodology, and individual engineer, and are often used for a discrete task, like a debugger or profiler. Tools may be discrete programs, executed separately – often from the command line – or may be parts of a single large program, called an integrated development environment (IDE). In many cases, particularly for simpler use, simple ad hoc techniques are used instead of a tool, such as print debugging instead of using a debugger, manual timing instead of a profiler, or tracking bugs in a text file or spreadsheet instead of a bug tracking system.

Metaprogramming is a programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyze or transform other programs, and even modify itself while running. In some cases, this allows programmers to minimize the number of lines of code to express a solution, in turn reducing development time. It also allows programs greater flexibility to efficiently handle new situations without recompilation.
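As a small illustration of a program that writes another program, the Python sketch below generates the source text of a specialised power function with its multiplications unrolled, then compiles and executes it with the built-in `compile` and `exec`. The function name `make_power_function` is hypothetical, and `exec` on untrusted text is unsafe; here the generated text depends only on an integer.

```python
def make_power_function(exponent: int):
    """Generate, compile, and return a function computing x ** exponent by repeated multiplication."""
    if exponent < 1:
        raise ValueError("exponent must be a positive integer")
    # Step 1: build the new program as data (a string of Python source).
    expression = " * ".join(["x"] * exponent)  # e.g. "x * x * x" for exponent 3
    source = f"def power(x):\n    return {expression}\n"
    # Step 2: compile that text and run it to define the function in a fresh namespace.
    namespace: dict = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["power"], source

cube, cube_source = make_power_function(3)
print(cube_source)
print(cube(4))  # 64
```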

Visual programming language

In computing, a visual programming language (VPL) is any programming language that lets users create programs by manipulating program elements graphically rather than by specifying them textually. A VPL allows programming with visual expressions, spatial arrangements of text and graphic symbols, used either as elements of syntax or secondary notation. For example, many VPLs are based on the idea of "boxes and arrows", where boxes or other screen objects are treated as entities, connected by arrows, lines or arcs which represent relations.

Metamodeling

A metamodel or surrogate model is a model of a model, and metamodeling is the process of generating such metamodels. Thus metamodeling or meta-modeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems. As its name implies, this concept applies the notions of meta- and modeling in software engineering and systems engineering. Metamodels are of many types and have diverse applications.

A UML tool or UML modeling tool is a software application that supports some or all of the notation and semantics associated with the Unified Modeling Language (UML), which is the industry standard general-purpose modeling language for software engineering.

Domain-specific modeling (DSM) is a software engineering methodology for designing and developing systems, such as computer software. It involves systematic use of a domain-specific language to represent the various facets of a system.

The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source languages for large scale software systems.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

JetBrains MPS

JetBrains MPS is a metaprogramming system developed by JetBrains. MPS is a tool to design domain-specific languages (DSLs). It uses projectional editing, which allows users to overcome the limits of language parsers and build DSL editors, such as ones with tables and diagrams.
It implements language-oriented programming. MPS is an environment for language definition, a language workbench, and an integrated development environment (IDE) for such languages.

Moose (analysis)

Moose is a free and open source platform for software and data analysis built in Pharo.

Xtext is an open-source software framework for developing programming languages and domain-specific languages (DSLs). Unlike standard parser generators, Xtext generates not only a parser, but also a class model for the abstract syntax tree, as well as providing a fully featured, customizable Eclipse-based IDE.

OMeta is a specialized object-oriented programming language for pattern matching, developed by Alessandro Warth and Ian Piumarta in 2007 at the Viewpoints Research Institute. The language is based on Parsing Expression Grammars (PEGs) rather than Context-Free Grammars with the intent of providing “a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree-transformers”.

A language workbench is a software development tool designed to define, reuse and compose domain-specific languages together with their integrated development environment. Language workbenches support language-oriented programming. Language workbenches were introduced and popularized by Martin Fowler in 2005.

References

  1. Mernik, Marjan; Heering, Jan; Sloane, Anthony M. (2005). "When and how to develop domain-specific languages". ACM Computing Surveys. 37 (4): 316–344. doi:10.1145/1118890.1118892.
  2. Spinellis, Diomidis (February 2001). "Notable design patterns for domain specific languages". Journal of Systems and Software. 56 (1): 91–99. doi:10.1016/S0164-1212(00)00089-3.
  3. "Data definition". The Linux Information Project (LINFO), www.linfo.org. Retrieved 2016-01-14.
  4. "Archived copy" (PDF). Archived from the original on 2004-07-19. Retrieved 2004-05-20.
  5. Schorre, D. V. (1964). "META II a syntax-oriented compiler writing language". Proceedings of the 1964 19th ACM National Conference. pp. 41.301–41.3011.
  6. Carr, C. Stephen; Luther, David A.; Erdmann, Sherian. "The TREE-META Compiler-Compiler System: A Meta Compiler System for the Univac 1108 and General Electric 645". University of Utah Technical Report RADC-TR-69-83.
  7. Freudenthal, Margus (1 January 2009). "Domain Specific Languages in a Customs Information System". IEEE Software: 1. doi:10.1109/MS.2009.152.
  8. Aram, Michael; Neumann, Gustaf (2015-07-01). "Multilayered analysis of co-development of business information systems" (PDF). Journal of Internet Services and Applications. 6 (1). doi:10.1186/s13174-015-0030-8.
  9. Miotto, Eric. "On the integration of domain-specific and scientific bodies of knowledge in Model Driven Engineering" (PDF).
  10. "JetBrains MPS: Domain-Specific Language Creator".
  11. "Xtext".
  12. Tobin-Hochstadt, S.; St-Amour, V.; Culpepper, R.; Flatt, M.; Felleisen, M. (2011). "Languages as Libraries" (PDF). Programming Language Design and Implementation.
  13. Flatt, Matthew (2012). "Creating Languages in Racket". Communications of the ACM. Retrieved 2012-04-08.

Further reading

Articles