Data-driven programming

Last updated July 30, 2024

In computer programming, data-driven programming is a programming paradigm in which the program statements describe the data to be matched and the processing required, rather than defining a sequence of steps to be taken.^[1] Standard examples of data-driven languages are the text-processing languages sed and AWK,^[1] and the document transformation language XSLT. In sed and AWK the data is a sequence of lines in an input stream, so these are also known as line-oriented languages, and pattern matching is primarily done via regular expressions or line numbers. XSLT instead matches nodes of an XML document tree using XPath patterns.
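The pattern/action structure described above can be sketched in Python. This is only an illustration of the idea, not how sed or AWK are implemented; the names `Rule`, `run` and `rules`, and the sample input, are assumptions made for the example.

```python
import re
from typing import Callable, Iterable, List, Tuple

# A rule pairs a condition (taking the line number and the line text)
# with an action, loosely mirroring a "pattern { action }" statement.
Rule = Tuple[Callable[[int, str], bool], Callable[[str], str]]


def run(rules: List[Rule], lines: Iterable[str]) -> List[str]:
    """Apply every matching rule to each line and collect the outputs."""
    output: List[str] = []
    for number, line in enumerate(lines, start=1):  # the implicit main loop
        for matches, action in rules:
            if matches(number, line):
                output.append(action(line))
    return output


rules: List[Rule] = [
    # Regular-expression pattern: lines containing "error" in any case.
    (lambda n, line: re.search(r"error", line, re.IGNORECASE) is not None,
     lambda line: "E: " + line),
    # Line-number pattern: the first line only.
    (lambda n, line: n == 1,
     lambda line: "HEADER: " + line),
]

result = run(rules, ["name,status", "disk,ok", "net,Error"])
print(result)
```

The program itself is the `rules` list; the loop in `run` never changes, which is the sense in which the data, rather than a sequence of steps, drives the processing.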

Related paradigms

Data-driven programming is similar to event-driven programming: both are structured as pattern matching followed by the resulting processing, and both are usually implemented by a main loop, though they are typically applied to different domains. The condition/action model is also similar to aspect-oriented programming, where advice (the action) is executed when execution reaches a join point selected by a pointcut (the condition). A similar paradigm is used in some tracing frameworks such as DTrace, where one lists probes (instrumentation points) and associated actions, which execute when the condition is satisfied.

Adapting abstract data type design methods to object-oriented programming results in a data-driven design.^[2] This type of design is sometimes used in object-oriented programming to define classes during the conception of a piece of software.

Applications

Data-driven programming is typically applied to streams of structured data, for filtering, transforming, aggregating (such as computing statistics), or calling other programs. Typical streams include log files, delimiter-separated values, and email messages, the last notably for email filtering. For example, an AWK program may take a stream of log statements as input, send every line to the console, write lines starting with "WARNING" to a "WARNING" file, and send an email to a sysadmin if any line starts with "ERROR". It could also record how many warnings are logged per day. Alternatively, one can process streams of delimiter-separated values, handling each line individually or aggregating over lines, for example to compute a sum or maximum. In email, a tool such as procmail lets one specify conditions to match on some emails, and what actions to take (deliver, bounce, discard, forward, etc.).
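The log-processing example can be sketched as follows. This is a Python illustration rather than an AWK program, and it rests on assumptions not stated in the text: each line is taken to look like `LEVEL YYYY-MM-DD message`, the console and the "WARNING" file are modelled as lists, and `notify` is a hypothetical callback standing in for sending an email.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class LogReport:
    console: List[str] = field(default_factory=list)       # stands in for stdout
    warning_file: List[str] = field(default_factory=list)  # stands in for the "WARNING" file
    warnings_per_day: Counter = field(default_factory=Counter)


def process_log(lines: Iterable[str], notify: Callable[[str], None]) -> LogReport:
    """Run three condition/action rules over each log line."""
    report = LogReport()
    for line in lines:
        # Rule 1 (no condition): every line goes to the console.
        report.console.append(line)
        # Rule 2: lines starting with WARNING are saved and counted per day.
        if line.startswith("WARNING"):
            report.warning_file.append(line)
            fields = line.split()
            if len(fields) > 1:  # assumed layout: the date is the second field
                report.warnings_per_day[fields[1]] += 1
        # Rule 3: lines starting with ERROR trigger a notification.
        if line.startswith("ERROR"):
            notify(line)
    return report
```

A caller might pass `sent.append` for some list `sent` as the `notify` argument while testing, and a function that really sends mail in production.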

Some data-driven languages are Turing-complete, such as AWK and even sed, while others are intentionally very limited, notably for filtering. An extreme example of the latter is pcap, which consists only of filtering, with the only action being "capture". Less extremely, Sieve has filters and actions, but in the base standard has no variables or loops, allowing only stateless filtering statements: each input element is processed independently. Variables allow state, which in turn allows operations that depend on more than one input element, such as aggregation (summing inputs) or throttling (allowing at most 5 mails per hour from each sender, or limiting repeated log messages).
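The throttling case shows why state is needed: the decision for one message depends on earlier messages from the same sender. A minimal Python sketch follows, assuming a sliding one-hour window and timestamps in seconds that never decrease; the class name `Throttle` and its parameters are illustrative, not taken from any mail filter.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class Throttle:
    """Allow at most `limit` items per sender within `window_seconds`."""

    def __init__(self, limit: int = 5, window_seconds: float = 3600.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # State carried between input elements: accepted timestamps per sender.
        self._accepted: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, sender: str, timestamp: float) -> bool:
        recent = self._accepted[sender]
        # Forget deliveries that have fallen out of the window.
        while recent and timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: reject this message
        recent.append(timestamp)
        return True
```

A stateless filter could not express this rule, because it would see each message in isolation.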

Data-driven languages frequently have a default action: if no condition matches, line-oriented languages may print the line (as in sed), or deliver a message (as in Sieve). In some applications, such as filtering, matching may be done exclusively (so only the first matching statement is applied), while in other cases all matching statements are applied. In either case, failure to match any pattern may trigger "default behavior" or can be seen as an error, to be caught by a catch-all statement at the end.
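The difference between exclusive matching, all-rules matching and the default action can be sketched in Python. The function `apply_rules`, the `first_match_only` flag and the sample mail rules are assumptions made for this illustration and do not reproduce the semantics of any particular language.

```python
from typing import Callable, List, Tuple

Rule = Tuple[Callable[[str], bool], Callable[[str], str]]


def apply_rules(rules: List[Rule], item: str,
                default: Callable[[str], str],
                first_match_only: bool) -> List[str]:
    """Return the results of the matching actions, or of the default action."""
    results: List[str] = []
    for matches, action in rules:
        if matches(item):
            results.append(action(item))
            if first_match_only:
                break  # exclusive matching: stop at the first hit
    if not results:
        results.append(default(item))  # nothing matched: default action
    return results


def keep(subject: str) -> str:
    return "keep in inbox"


rules: List[Rule] = [
    (lambda s: "invoice" in s.lower(), lambda s: "file into Finance"),
    (lambda s: "urgent" in s.lower(), lambda s: "flag as important"),
]

print(apply_rules(rules, "Urgent invoice", keep, first_match_only=True))
print(apply_rules(rules, "Urgent invoice", keep, first_match_only=False))
print(apply_rules(rules, "Lunch on Friday?", keep, first_match_only=True))
```

To treat a non-match as an error instead, the `default` callable could raise an exception, which plays the role of the catch-all statement mentioned above.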

Benefits and issues

While the benefits and issues may vary between implementations, this paradigm has a few significant potential benefits and problems. A function only needs to know the abstract data type of the variables it works with. Functions and interfaces can be used on all objects with the same data fields, for instance the object's "position". Data can be grouped into objects or "entities" according to preference with little to no consequence.
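As a rough Python illustration of a function that depends only on a shared data field, the sketch below uses `typing.Protocol` to describe "anything with a position". The entity types `Car` and `Tree` and the function `move` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class HasPosition(Protocol):
    """Any object that exposes a `position` field satisfies this protocol."""
    position: Tuple[float, float]


@dataclass
class Car:
    position: Tuple[float, float]
    fuel: float


@dataclass
class Tree:
    position: Tuple[float, float]


def move(entity: HasPosition, dx: float, dy: float) -> None:
    # Relies only on the `position` field, not on the concrete class.
    x, y = entity.position
    entity.position = (x + dx, y + dy)


car = Car(position=(0.0, 0.0), fuel=10.0)
tree = Tree(position=(5.0, 5.0))
move(car, 1.0, 2.0)
move(tree, -1.0, 0.0)
```

Neither `Car` nor `Tree` inherits from `HasPosition`; having the field is enough, so how the data is grouped into entities does not affect `move`.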

While data-driven design does prevent coupling of data and functionality, in some cases, data-driven programming has been argued to lead to bad object-oriented design, especially when dealing with more abstract data. This is because a purely data-driven object or entity is defined by the way it is represented. Any attempt to change the structure of the object would immediately break the functions that rely on it.

As an example, one might represent driving directions as a series of intersections (two intersecting streets) where the driver must turn right or left. If an intersection (in the United States) is represented in data by the zip code (5-digit number) and two street names (strings of text), bugs may appear in a city where the same two streets intersect more than once, because the representation can no longer tell those intersections apart. While this example may be oversimplified, restructuring of data is a fairly common problem in software engineering, whether to eliminate bugs, increase efficiency, or support new features.
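The ambiguity can be made concrete with a short Python sketch. The zip code and street names are placeholders, and the zip code is stored as a string (an assumption made here so that leading zeros survive) rather than as a number.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Intersection:
    zip_code: str   # 5-digit zip code, kept as text
    street_a: str
    street_b: str


# Two physically different crossings of the same pair of streets
# in the same zip code produce identical records.
first_crossing = Intersection("00000", "River Road", "Main Street")
second_crossing = Intersection("00000", "River Road", "Main Street")

turns: Dict[Intersection, str] = {}
turns[first_crossing] = "left"
turns[second_crossing] = "right"  # silently overwrites the first instruction

print(first_crossing == second_crossing)  # the data cannot tell them apart
print(len(turns))
```

Fixing this means adding a distinguishing field to `Intersection`, and every function that builds or compares intersections would then need to change as well, which is the restructuring cost described above.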

Languages

AWK ^[1]
BASIC
Clojure ^[3]
fdm
Lua ^[4]
maildrop
Oz
Perl – data-driven programming as in AWK and sed is one paradigm supported by Perl
procmail
Raku – Raku has grammars (and regexes) built in, and so supports data-driven programming
REBOL, a Redbol language
Red, a Redbol language
Ren-C, a Redbol language
sed
Sieve
Tab (language)
XSLT

References

[1] Stutz, Michael (September 19, 2006). "Get started with GAWK: AWK language fundamentals". developerWorks. IBM. Archived from the original on 20 May 2011. Retrieved 2010-10-23. "[AWK is] often called a data-driven language -- the program statements describe the input data to match and process rather than a sequence of program steps"
[2] Wirfs-Brock, Rebecca; Wilkerson, Brian (1989). "Object-oriented design: A responsibility-driven approach". Conference proceedings on Object-oriented programming systems, languages and applications - OOPSLA '89. New York: ACM. pp. 71–75. doi:10.1145/74877.74885. ISBN 0897913337. S2CID 7372657.
[3] "Clojure". www.clojure.org. Retrieved 2018-06-05.
[4] Ierusalimschy, Roberto; de Figueiredo, Luiz Henrique; Celes, Waldemar (2017-02-03). "Lua 5.3 Reference Manual". www.lua.org. Retrieved 2018-06-05.

External links

"The important part is moving program logic away from hardwired control structures and into data."

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
