Boxing (computer programming)

Last updated

In computer science, boxing (a.k.a. wrapping) is the transformation of placing a primitive type within an object so that the value can be used as a reference. Unboxing is the reverse transformation of extracting the primitive value from its wrapper object. Autoboxing is the term for automatically applying boxing and/or unboxing transformations as needed.

Contents

Boxing

Boxing's most prominent use is in Java where there is a distinction between reference and value types for reasons such as runtime efficiency and syntax and semantic issues. In Java, a LinkedList can only store values of type Object. One might desire to have a LinkedList of int, but this is not directly possible. Instead Java defines primitive wrapper classes corresponding to each primitive type: Integer and int, Character and char, Float and float, etc. One can then define a LinkedList using the boxed type Integer and insert int values into the list by boxing them as Integer objects. (Using generic parameterized types introduced in J2SE 5.0, this type is represented as LinkedList<Integer>.)

On the other hand, C# has no primitive wrapper classes, but allows boxing of any value type, returning a generic Object reference. In Objective-C, any primitive value can be prefixed by a @ to make an NSNumber out of it (e.g. @123 or @(123)). This allows for adding them in any of the standard collections, such as an NSArray.

Haskell has little or no notion of reference type, but still uses the term "boxed" for the runtime system's uniform pointer-to-tagged union representation. [1]

The boxed object is always a copy of the value object, and is usually immutable. Unboxing the object also returns a copy of the stored value. Repeated boxing and unboxing of objects can have a severe performance impact, because boxing dynamically allocates new objects and unboxing (if the boxed value is no longer used) then makes them eligible for garbage collection. However, modern garbage collectors such as the default Java HotSpot garbage collector can more efficiently collect short-lived objects, so if the boxed objects are short-lived, the performance impact may not be severe.

In some languages, there is a direct equivalence between an unboxed primitive type and a reference to an immutable, boxed object type. In fact, it is possible to substitute all the primitive types in a program with boxed object types. Whereas assignment from one primitive to another will copy its value, assignment from one reference to a boxed object to another will copy the reference value to refer to the same object as the first reference. However, this will not cause any problems, because the objects are immutable, so there is semantically no real difference between two references to the same object or to different objects (unless you look at physical equality). For all operations other than assignment, such as arithmetic, comparison, and logical operators, one can unbox the boxed type, perform the operation, and re-box the result as needed. Thus, it is possible to not store primitive types at all.

Autoboxing

Autoboxing is the term for getting a reference type out of a value type just through type conversion (either implicit or explicit). The compiler automatically supplies the extra source code that creates the object.

For example, in versions of Java prior to J2SE 5.0, the following code did not compile:

Integeri=newInteger(9);Integeri=9;// error in versions prior to 5.0!

Compilers prior to 5.0 would not accept the last line. Integer are reference objects, on the surface no different from List, Object, and so forth. To convert from an int to an Integer, one had to "manually" instantiate the Integer object. As of J2SE 5.0, the compiler will accept the last line, and automatically transform it so that an Integer object is created to store the value 9. [2] This means that, from J2SE 5.0 on, something like Integerc=a+b, where a and b are Integer themselves, will compile now - a and b are unboxed, the integer values summed up, and the result is autoboxed into a new Integer, which is finally stored inside variable c. The equality operators cannot be used this way, because the equality operators are already defined for reference types, for equality of the references; to test for equality of the value in a boxed type, one must still manually unbox them and compare the primitives, or use the Objects.equals method.

Another example: J2SE 5.0 allows the programmer to treat a collection (such as a LinkedList) as if it contained int values instead of Integer objects. This does not contradict what was said above: the collection still only contains references to dynamic objects, and it cannot list primitive types. It cannot be a LinkedList<int>, but it must be a LinkedList<Integer> instead. However, the compiler automatically transforms the code so that the list will "silently" receive objects, while the source code only mentions primitive values. For example, the programmer can now write list.add(3) and think as if the int3 were added to the list; but, the compiler will have actually transformed the line into list.add(newInteger(3)).

Automatic unboxing

With automatic unboxing the compiler automatically supplies the extra source code that retrieves the value out of that object, either by invoking some method on that object, or by other means.

For example, in versions of Java prior to J2SE 5.0, the following code did not compile:

Integerk=newInteger(4);intl=k.intValue();// always okayintm=k;// would have been an error, but okay now

C# doesn't support automatic unboxing in the same meaning as Java, because it doesn't have a separate set of primitive types and object types. All types that have both primitive and object version in Java, are automatically implemented by the C# compiler as either primitive (value) types or object (reference) types.

In both languages, automatic boxing does not downcast automatically, i.e. the following code won't compile:

C#:

inti=42;objecto=i;// boxintj=o;// unbox (error)Console.WriteLine(j);// unreachable line, author might have expected output "42"

Java:

inti=42;Objecto=i;// boxintj=o;// unbox (error)System.out.println(j);// unreachable line, author might have expected output "42"

Boxing in Rust

Rust have the Box struct: [3]

letnumber=Box::new(42);

If the value needs to passed to another thread it needs to use Arc. [4] [5]

Type helpers

Modern Object Pascal has yet another way to perform operations on simple types, close to boxing, called type helpers in FreePascal or record helpers in Delphi and FreePascal in Delphi mode.
The dialects mentioned are Object Pascal compile-to-native languages, and so miss some of the features that C# and Java can implement. Notably run-time type inference on strongly typed variables.
But the feature is related to boxing.
It allows the programmer to use constructs like

{$ifdef fpc}{$mode delphi}{$endif}usessysutils;// this unit contains wraps for the simple typesvarx:integer=100;s:string;begins:=x.ToString;writeln(s);end.

Related Research Articles

Generic programming is a style of computer programming in which algorithms are written in terms of data types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered in the programming language ML in 1973, permits writing common functions or data types that differ only in the set of types on which they operate when used, thus reducing duplicate code.

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term.

In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. More generally, primitive data types may refer to the standard data types built into a programming language. Data types which are not primitive are referred to as derived or composite.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value or its textual representation as a string, and vice versa. Type conversions can take advantage of certain features of type hierarchies or data representations. Two important aspects of a type conversion are whether it happens implicitly (automatically) or explicitly, and whether the underlying data representation is converted from one representation into another, or a given representation is merely reinterpreted as the representation of another data type. In general, both primitive and compound data types can be converted.

This article compares two programming languages: C# with Java. While the focus of this article is mainly the languages and their features, such a comparison will necessarily also consider some features of platforms and libraries.

<span class="mw-page-title-main">Java syntax</span> Set of rules defining correctly structured program

The syntax of Java is the set of rules defining how a Java program is written and interpreted.

In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time-efficient or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool.

In some programming languages, const is a type qualifier that indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in that it is part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time for each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

<span class="mw-page-title-main">C Sharp (programming language)</span> Programming language

C# is a general-purpose high-level programming language supporting multiple paradigms. C# encompasses static typing, strong typing, lexically scoped, imperative, declarative, functional, generic, object-oriented (class-based), and component-oriented programming disciplines.

<span class="mw-page-title-main">Scala (programming language)</span> General-purpose programming language

Scala is a strong statically typed high-level general-purpose programming language that supports both object-oriented programming and functional programming. Designed to be concise, many of Scala's design decisions are intended to address criticisms of Java.

In object-oriented programming, a wrapper class is a class that encapsulates types, so that those types can be used to create object instances and methods in another class that needs those types. So a primitive wrapper class is a wrapper class that encapsulates, hides or wraps data types from the eight primitive data types, so that these can be used to create instantiated objects with methods in another class or in other classes. The primitive wrapper classes are found in the Java API.

In Microsoft's .NET Framework, the Common Type System (CTS) is a standard that specifies how type definitions and specific values of types are represented in computer memory. It is intended to allow programs written in different programming languages to easily share information. As used in programming languages, a type can be described as a definition of a set of values, and the allowable operations on those values.

In computer programming, an enumerated type is a data type consisting of a set of named values called elements, members, enumeral, or enumerators of the type. The enumerator names are usually identifiers that behave as constants in the language. An enumerated type can be seen as a degenerate tagged union of unit type. A variable that has been declared as having an enumerated type can be assigned any of the enumerators as a value. In other words, an enumerated type has values that are different from each other, and that can be compared and assigned, but are not specified by the programmer as having any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily.

Generics are a facility of generic programming that were added to the Java programming language in 2004 within version J2SE 5.0. They were designed to extend Java's type system to allow "a type or method to operate on objects of various types while providing compile-time type safety". The aspect compile-time type safety required that parametrically polymorphic functions are not implemented in the Java virtual machine, since type safety is impossible in this case.

In certain computer programming languages, data types are classified as either value types or reference types, where reference types are always implicitly accessed via references, whereas value type variables directly contain the values themselves.

This article compares a large number of programming languages by tabulating their data types, their expression, statement, and declaration syntax, and some common operating-system interfaces.

In computer programming, a constant is a value that is not altered by the program during normal execution. When associated with an identifier, a constant is said to be "named," although the terms "constant" and "named constant" are often used interchangeably. This is contrasted with a variable, which is an identifier with a value that can be changed during normal execution. To simplify, constants' values remains, while the values of variables varies, hence both their names.

References

  1. "7.2. Unboxed types and primitive operations". downloads.haskell.org. Retrieved 10 August 2022.
  2. oracle.com Java language guide entry on autoboxing
  3. "Box in std::boxed - Rust". doc.rust-lang.org. Retrieved 18 January 2025.
  4. "Arc in std::sync - Rust". doc.rust-lang.org. Retrieved 18 January 2025.
  5. "Arc - Rust By Example". doc.rust-lang.org. Retrieved 18 January 2025.