Object copying

In object-oriented programming, object copying is the act of creating and initializing a new object based on an existing object's state. The various ways to implement copying have implications that a programmer needs to understand in order to write a computer program that is correct and performant.

Copying allows the state of the original object to be used, and even modified, through the copy without affecting the original object.

Strategies

Generally, an object resembles a monolithic concept while having an internal structure that is composite data: a tree of state. Several strategies have been developed to copy this internal state based on program needs and runtime cost.

The earliest discussed are shallow and deep copy with the terminology dating back to Smalltalk-80. [1]

A similar distinction holds for comparing objects for equality. For two objects to be equal, their state must be the same in a meaningful way. Two objects could be considered equal if their fields are equal without traversing into sub-objects (shallow), or only if the state is equal throughout the object tree (deep).

If two variables contain the same reference value, then they refer to the same object, a condition (identity) that is stronger than equality.
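The three notions of sameness described above can be sketched in Python. The classes `Point` and `Segment` and the helpers `shallow_equal` and `deep_equal` are hypothetical names introduced only for this illustration; here "shallow" is taken to mean that the fields refer to the very same sub-objects.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Segment:
    def __init__(self, start, end):
        self.start, self.end = start, end


def shallow_equal(s, t):
    # Compare fields without traversing into sub-objects:
    # the fields must reference the same Point objects.
    return s.start is t.start and s.end is t.end


def deep_equal(s, t):
    # Compare state throughout the object tree.
    return ((s.start.x, s.start.y) == (t.start.x, t.start.y)
            and (s.end.x, s.end.y) == (t.end.x, t.end.y))


p, q = Point(0, 0), Point(1, 1)
s1 = Segment(p, q)
s2 = Segment(p, q)                      # distinct object sharing p and q
s3 = Segment(Point(0, 0), Point(1, 1))  # distinct object, distinct sub-objects
alias = s1                              # same reference value as s1

print(alias is s1)            # identity
print(shallow_equal(s1, s2))  # shallow equality
print(deep_equal(s1, s3))     # deep equality
```

Under these definitions identity implies shallow equality, which in turn implies deep equality, but not the other way around.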

Reference copy

Even shallower than shallow copy, copying a reference is a form of object copying. This strategy is commonly employed when passing an object to a method. The reference is passed by value: the method receives a copy of the reference value (typically an address).
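A minimal Python sketch of reference copying follows. The function `rename` and the dictionary used as a stand-in object are assumptions made for illustration.

```python
def rename(account, new_name):
    # `account` holds a copy of the caller's reference; both refer to one object.
    account["name"] = new_name        # mutation is visible to the caller
    account = {"name": "discarded"}   # rebinding changes only the local reference


original = {"name": "alice"}
alias = original            # reference copy: no new object is created
rename(original, "bob")

print(alias is original)    # both names refer to the same object
print(original["name"])
```

Rebinding the parameter inside the function does not affect the caller, because only the copied reference value is changed.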

Shallow copy

Figures: variables referring to different areas of memory; the assignment of variable B to A; variables referring to the same area of memory.

Shallow copy involves creating a new, uninitialized object, B, and copying each field value from the original, A. [2] [3] [4] Due to this procedure, this is also known as a field-by-field copy, [5] [6] [7] field-for-field copy, or field copy. [8] If the field value is a primitive type (such as int), the value is copied such that changes to the value in B do not affect the value in A. If the field value is a reference to an object (e.g., a memory address) the reference is copied, hence referring to the same object that A does. Changing the state of the inner object affects the emergent state of both A and B since the objects are shared. In a language without primitive types (where everything is an object), all fields of the copy reference the same objects as the fields of the original.

A shallow copy is often relatively simple to implement and computationally cheap to perform. It can usually be implemented by simply copying a contiguous block of memory.
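The sharing behaviour of a shallow copy can be observed with `copy.copy` from Python's standard library. The classes `Engine` and `Car` are hypothetical and exist only for this sketch.

```python
import copy


class Engine:
    def __init__(self, power):
        self.power = power


class Car:
    def __init__(self, seats, engine):
        self.seats = seats      # plain value field
        self.engine = engine    # reference to another object


a = Car(4, Engine(100))
b = copy.copy(a)            # new Car object, fields copied one by one

b.seats = 2                 # changes only B's field
b.engine.power = 150        # mutates the Engine shared by A and B

print(a.seats, a.engine.power)
print(b.engine is a.engine)
```

Assigning to a field of `b` leaves `a` untouched, while mutating the shared `Engine` is visible through both objects.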

Deep copy

Figures: a deep copy in progress; a completed deep copy.

Deep copy involves copying the state of all subordinate objects: recursively dereferencing object references at each level of the tree that is the state of the original object, creating new objects, and copying fields. A modification of either the original or copied object, including their inner objects, does not affect the other since they share none of the content.
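The recursive procedure can be sketched by hand for a tree built from Python dicts and lists. The function `deep_copy` is a hypothetical helper written for this illustration; it assumes the structure is a true tree and does not handle cycles or sub-objects shared between branches.

```python
def deep_copy(value):
    """Recursively copy nested dicts and lists.

    Any other value is treated as an immutable leaf and reused as-is.
    Simplification: cycles and shared sub-objects are not detected.
    """
    if isinstance(value, dict):
        return {key: deep_copy(item) for key, item in value.items()}
    if isinstance(value, list):
        return [deep_copy(item) for item in value]
    return value


original = {"name": "root", "children": [{"name": "leaf", "children": []}]}
duplicate = deep_copy(original)

duplicate["children"][0]["name"] = "renamed"
duplicate["children"].append({"name": "extra", "children": []})

print(original["children"][0]["name"])
print(len(original["children"]))
```

In practice, `copy.deepcopy` from the standard library does the same job and also copes with cyclic structures.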

Hybrid

In more complex cases, some fields in a copy should have shared values with the original object (as in a shallow copy), corresponding to an association relationship; and some fields should have copies (as in a deep copy), corresponding to an aggregation relationship. In these cases a custom implementation of copying is generally required; this issue and solution date to Smalltalk-80. [9] Alternatively, fields can be marked as requiring a shallow copy or deep copy, and copy operations automatically generated (likewise for comparison operations). [10] This is not implemented in most object-oriented languages, although there is partial support in Eiffel. [10]
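A custom hybrid copy might look like the following Python sketch. The classes `Customer` and `Order` and the method name `copy` are assumptions chosen for illustration: the customer is an associated object that stays shared, while the item list is owned by the order and is duplicated.

```python
class Customer:
    def __init__(self, name):
        self.name = name


class Order:
    def __init__(self, customer, items):
        self.customer = customer   # association: shared with other orders
        self.items = items         # aggregation: owned by this order

    def copy(self):
        # Share the customer (as in a shallow copy) but duplicate the
        # item list (as in a deep copy; the items here are plain strings).
        return Order(self.customer, list(self.items))


first = Order(Customer("alice"), ["book"])
second = first.copy()
second.items.append("pen")

print(first.items, second.items)
print(second.customer is first.customer)
```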

Lazy copy

Lazy copy, related to copy-on-write, is an implementation of a deep copy. When initially copying an object, a relatively fast shallow copy is performed. A counter is also used to track how many objects share the data. When the program wants to modify an object, it can determine if the data is shared (by examining the counter) and can perform a deep copy if needed.

Lazy copy provides the semantics of a deep copy, but takes advantage of the speed of a shallow copy when possible. The downside is a rather high but constant base cost because of the counter. Circular references can cause problems.
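The counter-based scheme can be sketched as follows in Python. The classes `SharedBuffer` and `LazyList` and their members are hypothetical names for this illustration. As a simplification, the counter is not decremented when a handle is garbage collected, which a real implementation would need to address.

```python
import copy


class SharedBuffer:
    """The payload plus a count of how many handles share it."""

    def __init__(self, data):
        self.data = data
        self.owners = 1


class LazyList:
    def __init__(self, items=()):
        self._buf = SharedBuffer(list(items))

    def copy(self):
        # Cheap step: share the buffer and bump the counter.
        other = LazyList.__new__(LazyList)
        other._buf = self._buf
        self._buf.owners += 1
        return other

    def _detach(self):
        # Before a write, make a private deep copy if the data is shared.
        if self._buf.owners > 1:
            self._buf.owners -= 1
            self._buf = SharedBuffer(copy.deepcopy(self._buf.data))

    def append(self, item):
        self._detach()
        self._buf.data.append(item)

    def to_list(self):
        return list(self._buf.data)


a = LazyList([1, 2])
b = a.copy()
shared_before_write = a._buf is b._buf   # still one buffer
b.append(3)                              # triggers the deferred copy

print(a.to_list(), b.to_list())
```

Reads never trigger a copy; only the first write through a handle that shares its buffer pays the cost of the deep copy.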

Examples

Generally, an object-oriented programming language provides a way to copy an object. A programmer must define how a custom-defined object is copied, just as they must define if two objects are equal, comparable and so on.

Some languages support one or both of the shallow and deep strategies, defining either one copy operation or separate shallow and deep operations. [10] Many languages provide some default behavior.

Java

In Java, an object is always accessed indirectly via a reference. An object is never copied implicitly; passing or assigning a reference variable copies only the reference.

Parameters are passed by value; however, it is the value of the reference that is passed. [11]

The Java Virtual Machine manages garbage collection so that objects are cleaned up after they are no longer reachable.

The language provides no automatic way to copy an object.

Copying is usually performed by a clone() method. This method usually calls the clone() method of its parent class to obtain a copy, and then does any custom copying procedures. Eventually, this gets to the clone() method of the uppermost class (Object), which creates a new instance of the same class as the object and copies all the fields to the new instance (a shallow copy). If this method is used, the class must implement the Cloneable interface, or else Object's clone() throws a CloneNotSupportedException. After obtaining a copy from the parent class, a class's own clone() method may then provide custom cloning capability, like deep copying (i.e. duplicating some of the structures referred to by the object) or giving the new instance a new unique ID.

The return type of clone() is Object, but implementers of a clone method could write the type of the object being cloned instead due to Java's support for covariant return types. One advantage of using clone() is that, since it is an overridable method, calling clone() on an object uses the clone() method of that object's class, without the calling code needing to know what that class is (which would be needed with a copy constructor).

A disadvantage is that one often cannot access the clone() method on an abstract type. Most interfaces and abstract classes in Java do not specify a public clone() method. Thus, often the only way to use the clone() method is if the class of an object is known, which is contrary to the abstraction principle of using the most generic type possible. For example, if one has a List reference in Java, one cannot invoke clone() on that reference because List specifies no public clone() method. Implementations of List like ArrayList and LinkedList generally have clone() methods, but it is inconvenient and bad abstraction to carry around the class type of an object.

Another way to copy objects in Java is to serialize them through the Serializable interface. This is typically used for persistence and wire protocol purposes, but it does create copies of objects and, unlike clone, a deep copy that gracefully handles cyclic graphs of objects is readily available with minimal effort from a programmer.

Both of these methods suffer from a notable problem: the constructor is not used for objects copied with clone or serialization. This can lead to bugs with improperly initialized data, prevents the use of final member fields, and makes maintenance challenging. Some utilities attempt to overcome these issues by using reflection to deep copy objects, such as the deep-cloning library. [12]

Eiffel

Runtime objects in Eiffel are accessible either indirectly through references or as expanded objects whose fields are embedded within the objects that use them. That is, fields of an object are stored either externally or internally.

The Eiffel class ANY contains features for shallow and deep copying and cloning of objects. All Eiffel classes inherit from ANY, so these features are available within all classes, and are applicable both to reference and expanded objects.

The copy feature effects a shallow, field-by-field copy from one object to another. In this case no new object is created. If y were copied to x, then the same objects referenced by y before the application of copy will also be referenced by x after the copy feature completes.

To effect the creation of a new object which is a shallow duplicate of y, the feature twin is used. In this case, one new object is created with its fields identical to those of the source.

The feature twin relies on the feature copy, which can be redefined in descendants of ANY, if needed. The result of twin is of the anchored type like Current.

Deep copying and creating deep twins can be done using the features deep_copy and deep_twin, again inherited from class ANY. These features have the potential to create many new objects, because they duplicate all the objects in an entire object structure. Because new duplicate objects are created instead of simply copying references to existing objects, deep operations will become a source of performance issues more readily than shallow operations.

C#

In C#, rather than using the interface ICloneable, a generic extension method can be used to create a deep copy using reflection. This has two advantages. First, it provides the flexibility to copy every object without having to specify each property and variable to be copied manually. Second, because the type is generic, the compiler ensures that the destination object and the source object have the same type.

Objective-C

In Objective-C, the methods copy and mutableCopy are inherited by all objects and intended for performing copies; the latter is for creating a mutable type of the original object. These methods in turn call the copyWithZone and mutableCopyWithZone methods, respectively, to perform the copying. An object must implement the corresponding copyWithZone method to be copyable.

OCaml

In OCaml, the library function Oo.copy performs shallow copying of an object.

Python

In Python, the standard library's copy module provides shallow copy and deep copy of objects through the copy() and deepcopy() functions, respectively. [13] Programmers may define the special methods __copy__() and __deepcopy__() in a class to provide a custom copying implementation.
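A minimal sketch of these hooks follows. The class `Document` and its fields are hypothetical; the choice to give each copy an empty cache is an assumption made to show custom behaviour.

```python
import copy


class Document:
    def __init__(self, title, pages):
        self.title = title
        self.pages = pages     # nested list of strings
        self.cache = {}        # derived data, not worth copying

    def __copy__(self):
        # Shallow copy: share the pages, start with an empty cache.
        return Document(self.title, self.pages)

    def __deepcopy__(self, memo):
        new = Document.__new__(Document)
        memo[id(self)] = new   # register first so cycles resolve to `new`
        new.title = self.title
        new.pages = copy.deepcopy(self.pages, memo)
        new.cache = {}
        return new


doc = Document("notes", [["page one"], ["page two"]])
doc.cache["word_count"] = 4

shallow = copy.copy(doc)       # calls Document.__copy__
deep = copy.deepcopy(doc)      # calls Document.__deepcopy__

print(shallow.pages is doc.pages)
print(deep.pages is doc.pages, deep.pages == doc.pages)
```

Passing `memo` on to nested `copy.deepcopy` calls lets the module keep track of objects it has already copied.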

Ruby

In Ruby, all objects inherit two methods for performing shallow copies, clone and dup. The two methods differ in that clone copies an object's tainted state, frozen state, and any singleton methods it may have, whereas dup copies only its tainted state. Deep copies may be achieved by dumping and loading an object's byte stream or YAML serialization. Alternatively, the deep_dive gem can be used to do a controlled deep copy of object graphs.

Perl

In Perl, nested structures are stored by the use of references, so a developer can either loop over the entire structure and re-reference the data or use the dclone() function from the Storable module.

VBA

In VBA, an assignment of variables of type Object is a shallow copy, whereas an assignment for all other types (numeric types, String, user-defined types, arrays) is a deep copy. The keyword Set for an assignment therefore signals a shallow copy, and the (optional) keyword Let signals a deep copy. There is no built-in method for deep copies of Objects in VBA.

Notes

  1. Goldberg & Robson 1983, pp. 97–99. "There are two ways to make copies of an object. The distinction is whether or not the values of the object's variables are copied. If the values are not copied, then they are shared (shallowCopy); if the values are copied, then they are not shared (deepCopy)."
  2. "C++ Shallow vs Deep Copy Explanation".
  3. ".NET Shallow vs Deep Copy Explanation".
  4. "Generic Shallow vs Deep Copy Explanation". Archived from the original on 2016-03-04. Retrieved 2013-04-10.
  5. Core Java: Fundamentals, Volume 1, p. 295
  6. Effective Java, Second Edition, p. 54
  7. "What is this field-by-field copy done by Object.clone()?", Stack Overflow
  8. "Josh Bloch on Design: A Conversation with Effective Java Author, Josh Bloch", by Bill Venners, JavaWorld, January 4, 2002, p. 13
  9. Goldberg & Robson 1983, p. 97. "The default implementation of copy is shallowCopy. In subclasses in which copying must result in a special combination of shared and unshared variables, the method associated with copy is usually reimplemented, rather than the method associated with shallowCopy or deepCopy."
  10. Grogono & Sakkinen 2000.
  11. "Passing Information to a Method or a Constructor". Retrieved 8 October 2013.
  12. Java deep-cloning library
  13. Python copy module

References