DMS Software Reengineering Toolkit

Developer(s) Semantic Designs
License proprietary
Website www.semanticdesigns.com/Products/DMS/DMSToolkit.html

The DMS Software Reengineering Toolkit [1] is a proprietary set of program transformation tools for automating custom analysis, modification, translation, or generation of large-scale software systems written in arbitrary mixtures of source languages. DMS was originally motivated by a theory for maintaining designs of software called Design Maintenance Systems. [2] DMS and "Design Maintenance System" are registered trademarks of Semantic Designs.

Usage

DMS has been used to implement domain-specific languages (such as code generation for factory control), test coverage [3] and profiling tools, clone detection, [4] language migration tools, and C++ component reengineering, [5] and for research into difficult topics such as refactoring C++ reliably. [6]

Features

The toolkit provides means for defining language grammars and will produce parsers which automatically construct abstract syntax trees (ASTs), and prettyprinters to convert original or modified ASTs back into compilable source text. The parse trees capture, and the prettyprinters regenerate, complete detail about the original source program, including source position, comments, radix and format of numbers, etc., to ensure that regenerated source text is as recognizable to a programmer as the original text modulo any applied transformations.

DMS uses GLR parsing technology with semantic predicates. This enables it to handle all context-free grammars as well as most non-context-free language syntaxes, such as Fortran, which requires matching of multiple DO loops with shared CONTINUE statements by label to produce ASTs for correctly nested loops as it parses. DMS has a variety of predefined language front ends, covering most real dialects of C and C++ including C++0x, C#, Java, Python, PHP, EGL, Fortran, COBOL, Visual Basic, Verilog, VHDL and some 20 or more other languages. DMS can handle ASCII, ISO-8859, UTF-8, UTF-16, EBCDIC, Shift-JIS and a variety of Microsoft character encodings.
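The Fortran case can be illustrated with a small sketch. It is not DMS's parser and does not use GLR or semantic predicates; it is a plain post-pass over an already tokenized statement list, written only to show why matching loops to a shared terminal label needs more than a context-free rule. The statement encoding and the name nest_do_loops are assumptions made for this illustration.

```python
def nest_do_loops(statements):
    """Build correctly nested loops from flat Fortran-like statements.

    Each statement is a (kind, label, text) triple (hypothetical encoding):
      ("do", "10", "I=1,N")      a DO loop whose terminal statement has label 10
      ("stmt", "10", "CONTINUE") an ordinary statement carrying label 10
      ("stmt", None, "X=0")      an unlabeled statement
    """
    root = []
    # Stack of open loops: (terminal label, body list). The root has no label.
    stack = [(None, root)]
    for kind, label, text in statements:
        if kind == "do":
            body = []
            stack[-1][1].append({"do": text, "label": label, "body": body})
            stack.append((label, body))
        else:
            stack[-1][1].append({"stmt": text, "label": label})
            # A labeled statement closes every open loop that names its label,
            # innermost first, which is how one CONTINUE ends several loops.
            while label is not None and stack[-1][0] == label:
                stack.pop()
    if len(stack) != 1:
        raise ValueError("unterminated DO loop")
    return root


program = [
    ("do", "10", "I=1,N"),
    ("do", "10", "J=1,M"),
    ("stmt", None, "A(I,J)=0"),
    ("stmt", "10", "CONTINUE"),
    ("stmt", None, "PRINT *, A"),
]
tree = nest_do_loops(program)
```

In this sketch the shared CONTINUE is stored in the innermost loop body and both loops are closed by it, so the final PRINT statement lands at the top level.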

DMS provides attribute grammar evaluators for computing custom analyses over ASTs, such as metrics, and includes support for symbol table construction. Other program facts can be extracted by built-in control- and data- flow analysis engines, local and global pointer analysis, whole-program call graph extraction, and symbolic range analysis by abstract interpretation.
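As a rough illustration of an attribute-style analysis, the following sketch computes a simple metric as a synthesized attribute, meaning that each node's value is derived from the values of its children. It does not use DMS's attribute grammar notation; the tuple-based AST encoding, the node kind names, and both function names are assumptions made for this example.

```python
# Node kinds counted as decision points (an assumption for this sketch).
DECISION_KINDS = {"if", "while", "for"}


def decision_count(node):
    """Synthesized attribute: decisions in a subtree = own + sum of children's."""
    kind, *children = node
    own = 1 if kind in DECISION_KINDS else 0
    return own + sum(decision_count(c) for c in children if isinstance(c, tuple))


def cyclomatic_complexity(function_body):
    """Approximate cyclomatic complexity as decision points plus one."""
    return decision_count(function_body) + 1


# AST for: if (c) { while (d) { } } s;
body = (
    "block",
    ("if", ("cond", "c"), ("block", ("while", ("cond", "d"), ("block",)))),
    ("stmt", "s"),
)
complexity = cyclomatic_complexity(body)
```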

DMS is implemented in a parallel programming language, PARLANSE, which allows using symmetric multiprocessing to speed up large analyses and conversions. [7]

Rewriting

Changes to ASTs can be accomplished both by procedural methods coded in PARLANSE and by source-to-source tree transformations coded as rewrite rules in DMS's Rule Specification Language (RSL). The rules are written in surface syntax and can be conditioned by any extracted program facts. The rewrite rule engine supporting RSL handles associative and commutative rules. A rewrite rule for C to replace a complex condition by the ?: operator can be written as:

   rule simplify_conditional_assignment(v:left_hand_side,e1:expression,e2:expression,e3:expression)
        :statement->statement
     =  " if (\e1) \v=\e2; else \v=\e3; "
     -> " \v=\e1?\e2:\e3; "
     if no_side_effects(v);

Rewrite rules have names, e.g. simplify_conditional_assignment. Each rule has a "match this" and "replace by that" pattern pair separated by ->, written in this example on separate lines for readability. The patterns must correspond to language syntax categories; in this case, both patterns must be of syntax category statement, which the signature declares as statement->statement, with the two categories separated by -> in sympathy with the patterns. Target language (e.g., C) surface syntax is coded inside meta-quotes ", to separate rewrite-rule syntax from that of the target language. Backslashes inside meta-quotes represent domain escapes, which indicate pattern metavariables (e.g., \v, \e1, \e2) that match any language construct corresponding to the metavariable declaration in the signature line; e.g., e1 must be of syntactic category (any) expression. If a metavariable is mentioned multiple times in the match pattern, it must match identical subtrees; in this example, the same identically shaped v must occur in both assignments in the match pattern. Metavariables in the replace pattern are replaced by the corresponding matches from the left side. A conditional clause if provides an additional condition that must be met for the rule to apply, e.g., that the matched metavariable v, being an arbitrary left-hand side, must not have a side effect (e.g., it cannot be of the form a[i++]; the no_side_effects predicate is defined by an analyzer built with other DMS mechanisms).
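The matching behaviour described above can be mimicked in a few lines of Python. This is a minimal sketch and not RSL or the DMS engine: trees are nested tuples, the Meta class stands in for backslash-escaped metavariables, syntactic categories are not checked, and no_side_effects is a toy predicate that only looks for a post-increment node. All names in the block are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    """A pattern metavariable such as \\e1 (hypothetical representation)."""
    name: str


def match(pattern, tree, bindings):
    """Return bindings extended by matching tree against pattern, or None."""
    if isinstance(pattern, Meta):
        if pattern.name in bindings:
            # A repeated metavariable must match an identical subtree.
            return bindings if bindings[pattern.name] == tree else None
        return {**bindings, pattern.name: tree}
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None


def substitute(pattern, bindings):
    """Instantiate the replace pattern with the matched subtrees."""
    if isinstance(pattern, Meta):
        return bindings[pattern.name]
    if isinstance(pattern, tuple):
        return tuple(substitute(p, bindings) for p in pattern)
    return pattern


def no_side_effects(tree):
    """Toy stand-in for an analyzer: reject trees containing a post-increment."""
    if isinstance(tree, tuple):
        return tree[0] != "postinc" and all(no_side_effects(c) for c in tree[1:])
    return True


V, E1, E2, E3 = Meta("v"), Meta("e1"), Meta("e2"), Meta("e3")
MATCH_PATTERN = ("if", E1, ("assign", V, E2), ("assign", V, E3))
REPLACE_PATTERN = ("assign", V, ("cond", E1, E2, E3))


def simplify_conditional_assignment(stmt):
    """Apply the rule at the root of stmt; return stmt unchanged if it fails."""
    bindings = match(MATCH_PATTERN, stmt, {})
    if bindings is None or not no_side_effects(bindings["v"]):
        return stmt
    return substitute(REPLACE_PATTERN, bindings)


x, y, c = ("name", "x"), ("name", "y"), ("name", "c")
one, two = ("num", "1"), ("num", "2")

# if (c) x=1; else x=2;   becomes   x=c?1:2;
rewritten = simplify_conditional_assignment(
    ("if", c, ("assign", x, one), ("assign", x, two)))
```

A statement that assigns to x in one branch and y in the other is left unchanged because v cannot bind to two different subtrees, and a target such as a[i++] is rejected by the condition.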

Achieving a complex transformation on code is accomplished by providing a number of rules that cooperate to achieve the desired effect. The ruleset is focused on portions of the program by metaprograms coded in PARLANSE.
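How several small rules can cooperate is sketched below with algebraic simplifications. The driver plays the role that a metaprogram plays in DMS, but it is written in Python rather than PARLANSE, and the bottom-up, repeat-until-stable strategy is an assumption chosen for this illustration rather than a description of DMS's engine. The rule and function names are likewise hypothetical.

```python
def add_zero(t):
    """x + 0 -> x"""
    if isinstance(t, tuple) and t[0] == "+" and t[2] == 0:
        return t[1]
    return t


def mul_one(t):
    """x * 1 -> x"""
    if isinstance(t, tuple) and t[0] == "*" and t[2] == 1:
        return t[1]
    return t


def mul_zero(t):
    """x * 0 -> 0"""
    if isinstance(t, tuple) and t[0] == "*" and t[2] == 0:
        return 0
    return t


RULES = [add_zero, mul_one, mul_zero]


def rewrite_once(t):
    """Rewrite the children first, then try every rule at this node."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(rewrite_once(child) for child in t[1:])
    for rule in RULES:
        t = rule(t)
    return t


def normalize(t):
    """Apply the rule set repeatedly until the tree stops changing."""
    while True:
        rewritten = rewrite_once(t)
        if rewritten == t:
            return t
        t = rewritten


# (x * 1) + (y * 0) simplifies step by step to x.
simplified = normalize(("+", ("*", "x", 1), ("*", "y", 0)))
```

No single rule handles the whole expression here: mul_one and mul_zero first reduce the operands, which exposes the x + 0 shape that add_zero then removes.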

A complete example of a language definition and source-to-source transformation rules defined and applied is shown using high school algebra and a bit of calculus as a domain-specific language.

References

  1. DMS: Program Transformations for Practical Scalable Software Evolution. Proceedings International Conference on Software Engineering 2004 Reprint
  2. Design Maintenance Systems. Communications of the ACM 1992 Reprint
  3. Branch Coverage for Arbitrary Languages Made Easy
  4. "Clone Detection Using Abstract Syntax Trees". Proceedings International Conference on Software Maintenance 1998. doi:10.1109/ICSM.1998.738528. S2CID 12834606. Archived from the original on 2012-10-10. Retrieved 2010-11-06.
  5. Akers, Robert L.; Baxter, Ira D.; Mehlich, Michael; Ellis, Brian J.; Luecke, Kenn R. (2007). "Case study: Re-engineering C++ component models via automatic program transformation". Information and Software Technology. 49 (3): 275–291. doi:10.1016/j.infsof.2006.10.012. S2CID 13219993.
  6. Small Business Innovation Research (DoE): Refactor++
  7. "Semantic Designs: PARLANSE Parallel Programming Language for Windows Pentium/80x86". www.semanticdesigns.com.