Expression problem

Last updated

The expression problem is a challenging problem in programming languages that concerns the extensibility and modularity of statically typed data abstractions. The goal is to define a data abstraction that is extensible both in its representations and its behaviors, where one can add new representations and new behaviors to the data abstraction, without recompiling existing code, and while retaining static type safety (e.g., no casts). The statement of the problem exposes deficiencies in programming paradigms and programming languages, and as of 2023 is still considered unsolved,[ citation needed ] although there are many proposed solutions.[ citation needed ]

Contents

History

Philip Wadler formulated the challenge and named it "The Expression Problem" [1] in response to a discussion with Rice University's Programming Languages Team (PLT). He also cited three sources that defined the context for his challenge:

The problem was first observed by John Reynolds in 1975. [2] Reynolds discussed two forms of Data Abstraction: User-defined Types, which are now known as Abstract Data Types (ADTs) (not to be confused with Algebraic Data Types), and Procedural Data Structures, which are now understood as a primitive form of Objects with only one method. He argued that they are complementary, in that User-defined Types could be extended with new behaviors, and Procedural Data Structures could be extended with new representations. He also discussed related work going back to 1967. Fifteen years later in 1990, William Cook [3] applied Reynold's idea in the context of Objects and Abstract Data Types, which had both grown extensively. Cook identified the matrix of representations and behaviors that are implicit in a Data Abstraction, and discussed how ADTs are based on the behavioral axis, while Objects are based on the representation axis. He provides extensive discussion of work on ADTs and Objects that are relevant to the problem. He also reviewed implementations in both styles, discussed extensibility in both directions, and also identified the importance of static typing. Most importantly, he discussed situations in which there was more flexibility than Reynolds considered, including internalization and optimization of methods.

At ECOOP '98, Shriram Krishnamurthi et al. [4] presented a design pattern solution to the problem of simultaneously extending an expression-oriented programming language and its tool-set. They dubbed it the "expressivity problem" because they thought programming language designers could use the problem to demonstrate the expressive power of their creations. For PLT, the problem had shown up in the construction of DrScheme, now DrRacket, and they solved it [5] via a rediscovery of mixins. [6] [7] To avoid using a programming language problem in a paper about programming languages, Krishnamurthi et al. used an old geometry programming problem to explain their pattern-oriented solution. In conversations with Felleisen and Krishnamurthi after the ECOOP presentation, Wadler understood the PL-centric nature of the problem and he pointed out that Krishnamurthi's solution used a cast to circumvent Java's type system. The discussion continued on the types mailing list, where Corky Cartwright (Rice) and Kim Bruce (Williams) showed how type systems for OO languages might eliminate this cast. In response Wadler formulated his essay and stated the challenge, "whether a language can solve the expression problem is a salient indicator of its capacity for expression." The label "expression problem" puns on expression = "how much can your language express" and expression = "the terms you are trying to represent are language expressions".

Others co-discovered variants of the expression problem around the same time as Rice University's PLT, in particular Thomas Kühne [8] in his dissertation, and Smaragdakis and Batory [9] in a parallel ECOOP 98 article.

Some follow-up work used the expression problem to showcase the power of programming language designs. [10] [11]

The expression problem is also a fundamental problem in multi-dimensional Software Product Line design and in particular as an application or special case of FOSD Program Cubes.[ citation needed ]

Solutions

There are various solutions to the expression problem. Each solution varies in the amount of code a user must write to implement them, and the language features they require.

Example

Problem description

We can imagine we do not have the source code for the following library, written in C#, which we wish to extend:

publicinterfaceIEvalExp{intEval();}publicclassLit:IEvalExp{publicLit(intn){N=n;}publicintN{get;}publicintEval(){returnN;}}publicclassAdd:IEvalExp{publicAdd(IEvalExpleft,IEvalExpright){Left=left;Right=right;}publicIEvalExpLeft{get;}publicIEvalExpRight{get;}publicintEval(){returnLeft.Eval()+Right.Eval();}}publicstaticclassExampleOne{publicstaticIEvalExpAddOneAndTwo()=>newAdd(newLit(1),newLit(2));publicstaticintEvaluateTheSumOfOneAndTwo()=>AddOneAndTwo().Eval();}

Using this library we can express the arithmetic expression 1 + 2 as we did in ExampleOne.AddOneAndTwo() and can evaluate the expression by calling .Eval(). Now imagine that we wish to extend this library, adding a new type is easy because we are working with an Object-oriented programming language. For example, we might create the following class:

publicclassMult:IEvalExp{publicMult(IEvalExpleft,IEvalExpright){Left=left;Right=right;}publicIEvalExpLeft{get;}publicIEvalExpRight{get;}publicintEval(){returnLeft.Eval()*Right.Eval();}}

However, if we wish to add a new function over the type (a new method in C# terminology) we have to change the IEvalExp interface and then modify all the classes that implement the interface. Another possibility is to create a new interface that extends the IEvalExp interface and then create sub-types for Lit, Add and Mult classes, but the expression returned in ExampleOne.AddOneAndTwo() has already been compiled so we will not be able to use the new function over the old type. The problem is reversed in functional programming languages like F# where it is easy to add a function over a given type, but extending or adding types is difficult.

Problem Solution using Object Algebra

Let us redesign the original library with extensibility in mind using the ideas from the paper Extensibility for the Masses. [17]

publicinterfaceExpAlgebra<T>{TLit(intn);TAdd(Tleft,Tright);}publicclassExpFactory:ExpAlgebra<IEvalExp>{publicIEvalExpLit(intn){returnnewLit(n);}publicIEvalExpAdd(IEvalExpleft,IEvalExpright){returnnewAdd(left,right);}}publicstaticclassExampleTwo<T>{publicstaticTAddOneToTwo(ExpAlgebra<T>ae)=>ae.Add(ae.Lit(1),ae.Lit(2));}

We use the same implementation as in the first code example but now add a new interface containing the functions over the type as well as a factory for the algebra. Notice that we now generate the expression in ExampleTwo.AddOneToTwo() using the ExpAlgebra<T> interface instead of directly from the types. We can now add a function by extending the ExpAlgebra<T> interface, we will add functionality to print the expression:

publicinterfaceIPrintExp:IEvalExp{stringPrint();}publicclassPrintableLit:Lit,IPrintExp{publicPrintableLit(intn):base(n){N=n;}publicintN{get;}publicstringPrint(){returnN.ToString();}}publicclassPrintableAdd:Add,IPrintExp{publicPrintableAdd(IPrintExpleft,IPrintExpright):base(left,right){Left=left;Right=right;}publicnewIPrintExpLeft{get;}publicnewIPrintExpRight{get;}publicstringPrint(){returnLeft.Print()+" + "+Right.Print();}}publicclassPrintFactory:ExpFactory,ExpAlgebra<IPrintExp>{publicIPrintExpAdd(IPrintExpleft,IPrintExpright){returnnewPrintableAdd(left,right);}publicnewIPrintExpLit(intn){returnnewPrintableLit(n);}}publicstaticclassExampleThree{publicstaticintEvaluate()=>ExampleTwo<IPrintExp>.AddOneToTwo(newPrintFactory()).Eval();publicstaticstringPrint()=>ExampleTwo<IPrintExp>.AddOneToTwo(newPrintFactory()).Print();}

Notice that in ExampleThree.Print() we are printing an expression that was already compiled in ExampleTwo, we did not need to modify any existing code. Notice also that this is still strongly typed, we do not need reflection or casting. If we would replace the PrintFactory() with the ExpFactory() in the ExampleThree.Print() we would get a compilation error since the .Print() method does not exist in that context.

See also

Related Research Articles

A visitor pattern is a software design pattern that separates the algorithm from the object structure. Because of this separation, new operations can be added to existing object structures without modifying the structures. It is one way to follow the open/closed principle in object-oriented programming and software engineering.

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

In programming languages, a closure, also lexical closure or function closure, is a technique for implementing lexically scoped name binding in a language with first-class functions. Operationally, a closure is a record storing a function together with an environment. The environment is a mapping associating each free variable of the function with the value or reference to which the name was bound when the closure was created. Unlike a plain function, a closure allows the function to access those captured variables through the closure's copies of their values or references, even when the function is invoked outside their scope.

Standard ML (SML) is a general-purpose, high-level, modular, functional programming language with compile-time type checking and type inference. It is popular for writing compilers, for programming language research, and for developing theorem provers.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term.

In software systems, encapsulation refers to the bundling of data with the mechanisms or methods that operate on the data. It may also refer to the limiting of direct access to some of that data, such as an object's components. Essentially, encapsulation prevents external code from being concerned with the internal workings of an object.

A method in object-oriented programming (OOP) is a procedure associated with an object, and generally also a message. An object consists of state data and behavior; these compose an interface, which specifies how the object may be used. A method is a behavior of an object parametrized by a user.

<span class="mw-page-title-main">F Sharp (programming language)</span> Microsoft programming language

F# is a general-purpose, high-level, strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming methods. It is most often used as a cross-platform Common Language Infrastructure (CLI) language on .NET, but can also generate JavaScript and graphics processing unit (GPU) code.

In programming language theory and type theory, polymorphism is the use of a single symbol to represent multiple different types.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which type is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as that value, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

In computer programming, a function object is a construct allowing an object to be invoked or called as if it were an ordinary function, usually with the same syntax. In some languages, particularly C++, function objects are often called functors.

<span class="mw-page-title-main">Scala (programming language)</span> General-purpose programming language

Scala is a strong statically typed high-level general-purpose programming language that supports both object-oriented programming and functional programming. Designed to be concise, many of Scala's design decisions are intended to address criticisms of Java.

<span class="mw-page-title-main">Racket (programming language)</span> Lisp dialect

Racket is a general-purpose, multi-paradigm programming language. The Racket language is a modern dialect of Lisp and a descendant of Scheme. It is designed as a platform for programming language design and implementation. In addition to the core Racket language, Racket is also used to refer to the family of programming languages and set of tools supporting development on and with Racket. Racket is also used for scripting, computer science education, and research.

Haxe is a high-level cross-platform programming language and compiler that can produce applications and source code for many different computing platforms from one code-base. It is free and open-source software, released under an MIT License. The compiler, written in OCaml, is released under the GNU General Public License (GPL) version 2.

<span class="mw-page-title-main">Vala (programming language)</span> Programming language

Vala is an object-oriented programming language with a self-hosting compiler that generates C code and uses the GObject system.

The syntax and semantics of PHP, a programming language, form a set of rules that define how a PHP program can be written and interpreted.

OptimJ is an extension for Java with language support for writing optimization models and abstractions for bulk data processing. The extensions and the proprietary product implementing the extensions were developed by Ateji which went out of business in September 2011. OptimJ aims at providing a clear and concise algebraic notation for optimization modeling, removing compatibility barriers between optimization modeling and application programming tools, and bringing software engineering techniques such as object-orientation and modern IDE support to optimization experts.

Nemerle is a general-purpose, high-level, statically typed programming language designed for platforms using the Common Language Infrastructure (.NET/Mono). It offers functional, object-oriented, aspect-oriented, reflective and imperative features. It has a simple C#-like syntax and a powerful metaprogramming system.

Swift is a high-level general-purpose, multi-paradigm, compiled programming language created by Chris Lattner in 2010 for Apple Inc. and maintained by the open-source community. Swift compiles to machine code and uses an LLVM-based compiler. Swift was first released in June 2014 and the Swift toolchain has shipped in Xcode since version 6, released in 2014.

In programming language theory, flow-sensitive typing is a type system where the type of an expression depends on its position in the control flow.

References

  1. "The Expression Problem".
  2. Reynolds, John C. (1975). "User-defined Types and Procedural Data Structures as complementary approaches to Data Abstraction.". New Directions in Algorithmic Languages (PDF). IFIP Working Group 2.1 on Algol. pp. 157–168.
  3. Cook, William (1990). "Object-Oriented Programming versus Abstract Data Types". In Bakker, J.W. de; Roever, W.P. de; Rozenberg, G. (eds.). Foundations of Object-Oriented Languages (FOOL), REX School/Workshop. Lecture Notes in Computer Science. Vol. 489. Noordwijkerhout, The Netherlands: Springer Berlin Heidelberg. pp. 151–178. doi:10.1007/BFb0019443. ISBN   978-3-540-46450-1.
  4. "Synthesizing Object-Oriented and Functional Design to Promote Re-Use".
  5. Findler, Robert Bruce; Flatt, Matthew (1999). "Modular object-oriented programming with units and mixins". ACM SIGPLAN Notices. 34: 94–104. doi: 10.1145/291251.289432 .
  6. Cook, William (1989). A Denotational Semantics of Inheritance (PDF) (PhD). Brown University.
  7. Flatt, Matthew; Krishnamurthi, Shriram; Felleisen, Matthias (1998). "Classes and Mixins". Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '98. pp. 171–183. doi:10.1145/268946.268961. ISBN   978-0897919791. S2CID   5815257.
  8. Kühne, Thomas (1999). A Functional Pattern System for Object-Oriented Design. Darmstadt: Verlag Dr. Kovac. ISBN   978-3-86064-770-7.
  9. Smaragdakis, Yannis; Don Batory (1998). Implementing Reusable Object-Oriented Components. Lecture Notes in Computer Science. Vol. 1445.
  10. Zenger, Matthias; Odersky, Martin (2001). "Extensible Algebraic Datatypes with Defaults": 241–252. CiteSeerX   10.1.1.28.6778 .{{cite journal}}: Cite journal requires |journal= (help)
  11. Zenger, Matthias; Odersky, Martin (2005). "Independently extensible solutions to the expression problem". Foundations of Object-Oriented Languages (FOOL). ACM SIGPLAN. CiteSeerX   10.1.1.107.4449 .{{cite web}}: Missing or empty |url= (help)
  12. Chambers, Craig; Leavens, Gary T. (November 1995). "Type Checking and Modules for Multi-Methods". ACM Trans. Program. Lang. Syst. 17 (6): 805–843. doi: 10.1145/218570.218571 .
  13. Clifton, Curtis; Leavens, Gary T.; Chambers, Craig; Millstein, Todd (2000). "MultiJava: Modular Open Classes and Symmetric Multiple Dispatch for Java" (PDF). Oopsla '00. doi:10.1145/353171.353181. S2CID   7879645.
  14. Wouter Swierstra (2008). "Data Types à La Carte". Journal of Functional Programming . 18 (4). Cambridge University Press: 423–436. doi: 10.1017/S0956796808006758 . ISSN   0956-7968. S2CID   21038598.
  15. Wehr, Stefan; Thiemann, Peter (July 2011). "JavaGI: The Interaction of Type Classes with Interfaces and Inheritance". ACM Trans. Program. Lang. Syst. (33). doi: 10.1145/1985342.1985343 . S2CID   13174506.
  16. Carette, Jacques; Kiselyov, Oleg; Chung-chieh, Shan (2009). "Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages" (PDF). J. Funct. Program. 19 (5): 509–543. doi:10.1017/S0956796809007205. S2CID   6054319.
  17. 1 2 Oliveira, Bruno C. d. S.; Cook, William R. (2012). "Extensibility for the Masses: Practical Extensibility with Object Algebras" (PDF). Ecoop '12.
  18. Garrigue, Jacques (2000). "Code Reuse Through Polymorphic Variants". CiteSeerX   10.1.1.128.7169 .{{cite journal}}: Cite journal requires |journal= (help)