Type conversion

Last updated

In computer science, type conversion, [1] [2] type casting, [1] [3] type coercion, [3] and type juggling [4] [5] are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value or its textual representation as a string, and vice versa. Type conversions can take advantage of certain features of type hierarchies or data representations. Two important aspects of a type conversion are whether it happens implicitly (automatically) or explicitly, [1] [6] and whether the underlying data representation is converted from one representation into another, or a given representation is merely reinterpreted as the representation of another data type. [6] [7] In general, both primitive and compound data types can be converted.

Contents

Each programming language has its own rules on how types can be converted. Languages with strong typing typically do little implicit conversion and discourage the reinterpretation of representations, while languages with weak typing perform many implicit conversions between data types. Weak typing language often allow forcing the compiler to arbitrarily interpret a data item as having different representations—this can be a non-obvious programming error, or a technical method to directly deal with underlying hardware.

In most languages, the word coercion is used to denote an implicit conversion, either during compilation or during run time. For example, in an expression mixing integer and floating point numbers (like 5 + 0.1), the compiler will automatically convert integer representation into floating point representation so fractions are not lost. Explicit type conversions are either indicated by writing additional code (e.g. adding type identifiers or calling built-in routines) or by coding conversion routines for the compiler to use when it otherwise would halt with a type mismatch.

In most ALGOL-like languages, such as Pascal, Modula-2, Ada and Delphi, conversion and casting are distinctly different concepts. In these languages, conversion refers to either implicitly or explicitly changing a value from one data type storage format to another, e.g. a 16-bit integer to a 32-bit integer. The storage needs may change as a result of the conversion, including a possible loss of precision or truncation. The word cast, on the other hand, refers to explicitly changing the interpretation of the bit pattern representing a value from one type to another. For example, 32 contiguous bits may be treated as an array of 32 booleans, a 4-byte string, an unsigned 32-bit integer or an IEEE single precision floating point value. Because the stored bits are never changed, the programmer must know low level details such as representation format, byte order, and alignment needs, to meaningfully cast.

In the C family of languages and ALGOL 68, the word cast typically refers to an explicit type conversion (as opposed to an implicit conversion), causing some ambiguity about whether this is a re-interpretation of a bit-pattern or a real data representation conversion. More important is the multitude of ways and rules that apply to what data type (or class) is located by a pointer and how a pointer may be adjusted by the compiler in cases like object (class) inheritance.

Explicit casting in various languages

Ada

Ada provides a generic library function Unchecked_Conversion. [8]

C-like languages

Implicit type conversion

Implicit type conversion, also known as coercion or type juggling, is an automatic type conversion by the compiler. Some programming languages allow compilers to provide coercion; others require it.

In a mixed-type expression, data of one or more subtypes can be converted to a supertype as needed at runtime so that the program will run correctly. For example, the following is legal C language code:

doubled;longl;inti;if(d>i)d=i;if(i>l)l=i;if(d==l)d*=2;

Although d, l, and i belong to different data types, they will be automatically converted to equal data types each time a comparison or assignment is executed. This behavior should be used with caution, as unintended consequences can arise. Data can be lost when converting representations from floating-point to integer, as the fractional components of the floating-point values will be truncated (rounded toward zero). Conversely, precision can be lost when converting representations from integer to floating-point, since a floating-point type may be unable to exactly represent all possible values of some integer type. For example, float might be an IEEE 754 single precision type, which cannot represent the integer 16777217 exactly, while a 32-bit integer type can. This can lead to unintuitive behavior, as demonstrated by the following code:

#include<stdio.h>intmain(void){inti_value=16777217;floatf_value=16777216.0;printf("The integer is: %d\n",i_value);printf("The float is:   %f\n",f_value);printf("Their equality: %d\n",i_value==f_value);}

On compilers that implement floats as IEEE single precision, and ints as at least 32 bits, this code will give this peculiar print-out:

The integer is: 16777217 The float is: 16777216.000000 Their equality: 1

Note that 1 represents equality in the last line above. This odd behavior is caused by an implicit conversion of i_value to float when it is compared with f_value. The conversion causes loss of precision, which makes the values equal before the comparison.

Important takeaways:

  1. float to int causes truncation, i.e., removal of the fractional part.
  2. double to float causes rounding of digit.
  3. long to int causes dropping of excess higher order bits.
Type promotion

One special case of implicit type conversion is type promotion, where an object is automatically converted into another data type representing a superset of the original type. Promotions are commonly used with types smaller than the native type of the target platform's arithmetic logic unit (ALU), before arithmetic and logical operations, to make such operations possible, or more efficient if the ALU can work with more than one type. C and C++ perform such promotion for objects of boolean, character, wide character, enumeration, and short integer types which are promoted to int, and for objects of type float, which are promoted to double. Unlike some other type conversions, promotions never lose precision or modify the value stored in the object.

In Java:

intx=3;doubley=3.5;System.out.println(x+y);// The output will be 6.5

Explicit type conversion

Explicit type conversion, also called type casting, is a type conversion which is explicitly defined within a program (instead of being done automatically according to the rules of the language for implicit type conversion). It is requested by the user in the program.

doubleda=3.3;doubledb=3.3;doubledc=3.4;intresult=(int)da+(int)db+(int)dc;// result == 9// if implicit conversion would be used (as with "result = da + db + dc"), result would be equal to 10

There are several kinds of explicit conversion.

checked
Before the conversion is performed, a runtime check is done to see if the destination type can hold the source value. If not, an error condition is raised.
unchecked
No check is performed. If the destination type cannot hold the source value, the result is undefined.
bit pattern
The raw bit representation of the source is copied verbatim, and it is re-interpreted according to the destination type. This can also be achieved via aliasing.

In object-oriented programming languages, objects can also be downcast  : a reference of a base class is cast to one of its derived classes.

C# and C++

In C#, type conversion can be made in a safe or unsafe (i.e., C-like) manner, the former called checked type cast. [9]

Animalanimal=newCat();Bulldogb=(Bulldog)animal;// if (animal is Bulldog), stat.type(animal) is Bulldog, else an exceptionb=animalasBulldog;// if (animal is Bulldog), b = (Bulldog) animal, else b = nullanimal=null;b=animalasBulldog;// b == null

In C++ a similar effect can be achieved using C++-style cast syntax.

Animal*animal=newCat;Bulldog*b=static_cast<Bulldog*>(animal);// compiles only if either Animal or Bulldog is derived from the other (or same)b=dynamic_cast<Bulldog*>(animal);// if (animal is Bulldog), b = (Bulldog*) animal, else b = nullptrBulldog&br=static_cast<Bulldog&>(*animal);// same as above, but an exception will be thrown if a nullptr was to be returned// this is not seen in code where exception handling is avoideddeleteanimal;// always free resourcesanimal=nullptr;b=dynamic_cast<Bulldog*>(animal);// b == nullptr

Eiffel

In Eiffel the notion of type conversion is integrated into the rules of the type system. The Assignment Rule says that an assignment, such as:

x:=y

is valid if and only if the type of its source expression, y in this case, is compatible with the type of its target entity, x in this case. In this rule, compatible with means that the type of the source expression either conforms to or converts to that of the target. Conformance of types is defined by the familiar rules for polymorphism in object-oriented programming. For example, in the assignment above, the type of y conforms to the type of x if the class upon which y is based is a descendant of that upon which x is based.

Definition of type conversion in Eiffel

The actions of type conversion in Eiffel, specifically converts to and converts from are defined as:

A type based on a class CU converts to a type T based on a class CT (and T converts from U) if either

CT has a conversion procedure using U as a conversion type, or
CU has a conversion query listing T as a conversion type

Example

Eiffel is a fully compliant language for Microsoft .NET Framework. Before development of .NET, Eiffel already had extensive class libraries. Using the .NET type libraries, particularly with commonly used types such as strings, poses a conversion problem. Existing Eiffel software uses the string classes (such as STRING_8) from the Eiffel libraries, but Eiffel software written for .NET must use the .NET string class (System.String) in many cases, for example when calling .NET methods which expect items of the .NET type to be passed as arguments. So, the conversion of these types back and forth needs to be as seamless as possible.

my_string:STRING_8-- Native Eiffel stringmy_system_string:SYSTEM_STRING-- Native .NET string...my_string:=my_system_string

In the code above, two strings are declared, one of each different type (SYSTEM_STRING is the Eiffel compliant alias for System.String). Because System.String does not conform to STRING_8, then the assignment above is valid only if System.String converts to STRING_8.

The Eiffel class STRING_8 has a conversion procedure make_from_cil for objects of type System.String. Conversion procedures are also always designated as creation procedures (similar to constructors). The following is an excerpt from the STRING_8 class:

classSTRING_8...createmake_from_cil...convertmake_from_cil({SYSTEM_STRING})...

The presence of the conversion procedure makes the assignment:

my_string:=my_system_string

semantically equivalent to:

createmy_string.make_from_cil(my_system_string)

in which my_string is constructed as a new object of type STRING_8 with content equivalent to that of my_system_string.

To handle an assignment with original source and target reversed:

my_system_string:=my_string

the class STRING_8 also contains a conversion query to_cil which will produce a System.String from an instance of STRING_8.

classSTRING_8...createmake_from_cil...convertmake_from_cil({SYSTEM_STRING})to_cil:{SYSTEM_STRING}...

The assignment:

my_system_string:=my_string

then, becomes equivalent to:

my_system_string:=my_string.to_cil

In Eiffel, the setup for type conversion is included in the class code, but then appears to happen as automatically as explicit type conversion in client code. The includes not just assignments but other types of attachments as well, such as argument (parameter) substitution.

Rust

Rust provides no implicit type conversion (coercion) between primitive types. But, explicit type conversion (casting) can be performed using the as keyword. [10]

letx=1000;println!("1000 as a u16 is: {}",xasu16);

Implicit casting using untagged unions

Many programming languages support union types which can hold a value of multiple types. Untagged unions are provided in some languages with loose type-checking, such as C and PL/I, but also in the original Pascal. These can be used to interpret the bit pattern of one type as a value of another type.

Security issues

In hacking, typecasting is the misuse of type conversion to temporarily change a variable's data type from how it was originally defined. [11] This provides opportunities for hackers since in type conversion after a variable is "typecast" to become a different data type, the compiler will treat that hacked variable as the new data type for that specific operation. [12]

See also

Related Research Articles

Eiffel is an object-oriented programming language designed by Bertrand Meyer and Eiffel Software. Meyer conceived the language in 1985 with the goal of increasing the reliability of commercial software development; the first version becoming available in 1986. In 2005, Eiffel became an ISO-standardized language.

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other components.

In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. More generally, "primitive data types" may refer to the standard data types built into a programming language. Data types which are not primitive are referred to as derived or composite.

In computer science, boxing is the transformation of placing a primitive type within an object so that the value can be used as a reference. Unboxing is the reverse transformation of extracting the primitive value from its wrapper object. Autoboxing is the term for automatically applying boxing and/or unboxing transformations as needed.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, a union is a value that may have any of several representations or formats within the same position in memory; that consists of a variable that may hold such a data structure. Some programming languages support special data types, called union types, to describe such values and variables. In other words, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". In contrast with a record, which could be defined to contain both a float and an integer; in a union, there is only one value at any given time.

In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is, some facilities are type-safe and their usage will not result in type errors, while other facilities in the same language may be type-unsafe and a program using them may encounter type errors. The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values that are not of the appropriate data type, e.g., adding a string to an integer when there's no definition on how to handle this case. This classification is partly based on opinion.

In class-based, object-oriented programming, a constructor is a special type of function called to create an object. It prepares the new object for use, often accepting arguments that the constructor uses to set required member variables.

In computer science, the Boolean is a data type that has one of two possible values which is intended to represent the two truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century. The Boolean data type is primarily associated with conditional statements, which allow different actions by changing control flow depending on whether a programmer-specified Boolean condition evaluates to true or false. It is a special case of a more general logical data type—logic does not always need to be Boolean.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

<span class="mw-page-title-main">C data types</span> Data types supported by the C programming language

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

<span class="mw-page-title-main">C Sharp (programming language)</span> Programming language

C# is a general-purpose high-level programming language supporting multiple paradigms. C# encompasses static typing, strong typing, lexically scoped, imperative, declarative, functional, generic, object-oriented (class-based), and component-oriented programming disciplines.

typedef is a reserved keyword in the programming languages C, C++, and Objective-C. It is used to create an additional name (alias) for another data type, but does not create a new type, except in the obscure case of a qualified typedef of an array type where the typedef qualifiers are transferred to the array element type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, although it is also commonly used to provide specific descriptive type names for integer data types of varying sizes.

A class in C++ is a user-defined type or data structure declared with keyword class that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class is private. The private members are not accessible outside the class; they can be accessed only through methods of the class. The public members form an interface to the class and are accessible outside the class.

In computer programming, an enumerated type is a data type consisting of a set of named values called elements, members, enumeral, or enumerators of the type. The enumerator names are usually identifiers that behave as constants in the language. An enumerated type can be seen as a degenerate tagged union of unit type. A variable that has been declared as having an enumerated type can be assigned any of the enumerators as a value. In other words, an enumerated type has values that are different from each other, and that can be compared and assigned, but are not specified by the programmer as having any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily.

Haxe is a high-level cross-platform programming language and compiler that can produce applications and source code for many different computing platforms from one code-base. It is free and open-source software, released under the MIT License. The compiler, written in OCaml, is released under the GNU General Public License (GPL) version 2.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In computer science, a type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.

This article describes the syntax of the C# programming language. The features described are compatible with .NET Framework and Mono.

References

  1. 1 2 3 Mehrotra, Dheeraj (2008). S. Chand's Computer Science. S. Chand. pp. 81–83. ISBN   978-8121929844.
  2. Programming Languages - Design and Constructs. Laxmi Publications. 2013. p. 35. ISBN   978-9381159415.
  3. 1 2 Reilly, Edwin (2004). Concise Encyclopedia of Computer Science. John Wiley & Sons. pp.  82, 110. ISBN   0470090952.
  4. Fenton, Steve (2017). Pro TypeScript: Application-Scale JavaScript Development. Apress. pp. xxiii. ISBN   978-1484232491.
  5. "PHP: Type Juggling - Manual". php.net. Retrieved 27 January 2019.
  6. 1 2 Olsson, Mikael (2013). C++ Quick Syntax Reference. Apress. pp. 87–89. ISBN   978-1430262770.
  7. Kruse, Rudolf; Borgelt, Christian; Braune, Christian; Mostaghim, Sanaz; Steinbrecher, Matthias (16 September 2016). Computational Intelligence: A Methodological Introduction. Springer. p. 269. ISBN   978-1447172963.
  8. "Unchecked Type Conversions". Ada Information Clearinghouse. Retrieved 11 March 2023.
  9. Mössenböck, Hanspeter (25 March 2002). "Advanced C#: Checked Type Casts" (PDF). Institut für Systemsoftware, Johannes Kepler Universität Linz, Fachbereich Informatik. p. 5. Retrieved 4 August 2011. at C# Tutorial
  10. "Casting - Rust By Example". doc.rust-lang.org.
  11. Jon Erickson Hacking, 2nd Edition: The Art of Exploitation 2008 1593271441 p51 "Typecasting is simply a way to temporarily change a variable's data type, despite how it was originally defined. When a variable is typecast into a different type, the compiler is basically told to treat that variable as if it were the new data type, but only for that operation. The syntax for typecasting is as follows: (typecast_data_type) variable ..."
  12. Arpita Gopal Magnifying C 2009 8120338618 p. 59 "From the above, it is clear that the usage of typecasting is to make a variable of one type, act like another type for one single operation. So by using this ability of typecasting it is possible for create ASCII characters by typecasting integer to its ..."