Primitive data type

Last updated

In computer science, primitive data types are a set of basic data types from which all other data types are constructed. [1] Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. [2] More generally, primitive data types may refer to the standard data types built into a programming language (built-in types). [3] [4] Data types which are not primitive are referred to as derived or composite. [3]

Contents

Primitive types are almost always value types, but composite types may also be value types. [5]

Common primitive data types

The most common primitive types are those used and supported by computer hardware, such as integers of various sizes, floating-point numbers, and Boolean logical values. Operations on such types are usually quite efficient. Primitive data types which are native to the processor have a one-to-one correspondence with objects in the computer's memory, and operations on these types are often the fastest possible in most cases. [6] Integer addition, for example, can be performed as a single machine instruction, and some offer specific instructions to process sequences of characters with a single instruction. [7] But the choice of primitive data type may affect performance, for example it is faster using SIMD operations and data types to operate on an array of floats. [6] :113

Integer numbers

An integer data type represents some range of mathematical integers. Integers may be either signed (allowing negative values) or unsigned (non-negative integers only). Common ranges are:

Size (bytes)Size (bits)Names Signed range (two's complement representation)Unsigned range
1 byte8 bits Byte, octet, minimum size of char in C99( see limits.h CHAR_BIT)−128 to +1270 to 255
2 bytes16 bits x86 word, minimum size of short and int in C−32,768 to +32,7670 to 65,535
4 bytes32 bitsx86 double word, minimum size of long in C, actual size of int for most modern C compilers, [8] pointer for IA-32-compatible processors−2,147,483,648 to +2,147,483,6470 to 4,294,967,295
8 bytes64 bitsx86 quadruple word, minimum size of long long in C, actual size of long for most modern C compilers, [8] pointer for x86-64-compatible processors−9,223,372,036,854,775,808 to +9,223,372,036,854,775,8070 to 18,446,744,073,709,551,615

Floating-point numbers

A floating-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a format equivalent to scientific notation, typically in binary but sometimes in decimal. Because floating-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately. Many languages have both a single precision (often called float) and a double precision type (often called double).

Booleans

A Boolean type, typically denoted bool or boolean, is typically a logical type that can have either the value true or the value false. Although only one bit is necessary to accommodate the value set true and false, programming languages typically implement Boolean types as one or more bytes.

Many languages (e.g. Java, Pascal and Ada) implement Booleans adhering to the concept of Boolean as a distinct logical type. Some languages, though, may implicitly convert Booleans to numeric types at times to give extended semantics to Booleans and Boolean expressions or to achieve backwards compatibility with earlier versions of the language. For example, early versions of the C programming language that followed ANSI C and its former standards did not have a dedicated Boolean type. Instead, numeric values of zero are interpreted as false, and any other value is interpreted as true. [9] The newer C99 added a distinct Boolean type _Bool (the more intuitive name bool as well as the macros true and false can be included with stdbool.h), [10] and C++ supports bool as a built-in type and true and false as reserved words. [11]

Specific languages

Java

The Java virtual machine's set of primitive data types consists of: [12]

C basic types

The set of basic C data types is similar to Java's. Minimally, there are four types, char, int, float, and double, but the qualifiers short, long, signed, and unsigned mean that C contains numerous target-dependent integer and floating-point primitive types. [15] C99 extended this set by adding the Boolean type _Bool and allowing the modifier long to be used twice in combination with int (e.g. long long int). [16]

XML Schema

The XML Schema Definition language provides a set of 19 primitive data types: [17]

JavaScript

In JavaScript, there are 7 primitive data types: string, number, bigint, boolean, symbol, undefined, and null. [19] Their values are considered immutable. These are not objects and have no methods or properties; however, all primitives except undefined and null have object wrappers. [20]

Visual Basic .NET

In Visual Basic .NET, the primitive data types consist of 4 integral types, 2 floating-point types, a 16-byte decimal type, a Boolean type, a date/time type, a Unicode character type, and a Unicode string type. [21]

Rust

Rust has primitive unsigned and signed fixed width integers in the format u or i respectively followed by any bit width that is a power of two between 8 and 128 giving the types u8, u16, u32, u64, u128, i8, i16, i32, i64 and i128. [22] Also available are the types usize and isize which are unsigned and signed integers that are the same bit width as a reference with the usize type being used for indices into arrays and indexable collection types. [22]

Rust also has:

Built-in types

Built-in types are distinguished from others by having specific support in the compiler or runtime, to the extent that it would not be possible to simply define them in a header file or standard library module. [23] Besides integers, floating-point numbers, and Booleans, other built-in types include:

Characters and strings

A character type is a type that can represent all Unicode characters, hence must be at least 21 bits wide. Some languages such as Julia include a true 32-bit Unicode character type as primitive. [24] Other languages such as JavaScript, Python, Ruby, and many dialects of BASIC do not have a primitive character type but instead add strings as a primitive data type, typically using the UTF-8 encoding. Strings with a length of one are normally used to represent single characters.

Some languages have character types that are too small to represent all Unicode characters. These are more properly categorized as integer types that have been given a misleading name. For example C includes a char type, but it is defined to be the smallest addressable unit of memory, which several standards (such as POSIX) require to be 8 bits. Recent versions of these standards refer to char as a numeric type. char is also used for a 16-bit integer type in Java, but again this is not a Unicode character type. [25]

The term string also does not always refer to a sequence of Unicode characters, instead referring to a sequence of bytes. For example, x86-64 has string instructions to move, set, search, or compare a sequence of items, where an item could be 1, 2, 4, or 8 bytes long. [26]

See also

Related Research Articles

In computer science, an integer is a datum of integral data type, a data type that represents some range of mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware nearly always provides a way to represent a processor register or memory address as an integer.

A computer number format is the internal representation of numeric values in digital device hardware and software, such as in programmable computers and calculators. Numerical values are stored as groupings of bits, such as bytes and words. The encoding between numerical values and bit patterns is chosen for convenience of the operation of the computer; the encoding used by the computer's instruction set generally requires conversion for external use, such as for printing and display. Different types of processors may have different internal representations of numerical values and different conventions are used for integer and real numbers. Most calculations are carried out with number formats that fit into a processor register, but some software systems allow representation of arbitrarily large numbers using multiple words of memory.

<span class="mw-page-title-main">Data type</span> Attribute of data

In computer science and computer programming, a data type is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these values as machine types. A data type specification in a program constrains the possible values that an expression, such as a variable or a function call, might take. On literal data, it tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support basic data types of integer numbers, floating-point numbers, characters and Booleans.

External Data Representation (XDR) is a standard data serialization format, for uses such as computer network protocols. It allows data to be transferred between different kinds of computer systems. Converting from the local representation to XDR is called encoding. Converting from XDR to the local representation is called decoding. XDR is implemented as a software library of functions which is portable between different operating systems and is also independent of the transport layer.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

In computer science, a union is a value that may have any of multiple representations or formats within the same area of memory; that consists of a variable that may hold such a data structure. Some programming languages support a union type for such a data type. In other words, a union type specifies the permitted types that may be stored in its instances, e.g., float and integer. In contrast with a record, which could be defined to contain both a float and an integer; a union would hold only one at a time.

In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value or its textual representation as a string, and vice versa. Type conversions can take advantage of certain features of type hierarchies or data representations. Two important aspects of a type conversion are whether it happens implicitly (automatically) or explicitly, and whether the underlying data representation is converted from one representation into another, or a given representation is merely reinterpreted as the representation of another data type. In general, both primitive and compound data types can be converted.

A Java class file is a file containing Java bytecode that can be executed on the Java Virtual Machine (JVM). A Java class file is usually produced by a Java compiler from Java programming language source files containing Java classes. If a source file has more than one class, each class is compiled into a separate class file. Thus, it is called a .class file because it contains the bytecode for a single class.

<span class="mw-page-title-main">Boolean data type</span> Data having only values "true" or "false"

In computer science, the Boolean is a data type that has one of two possible values which is intended to represent the two truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century. The Boolean data type is primarily associated with conditional statements, which allow different actions by changing control flow depending on whether a programmer-specified Boolean condition evaluates to true or false. It is a special case of a more general logical data type—logic does not always need to be Boolean.

In computer programming, a sigil is a symbol affixed to a variable name, showing the variable's datatype or scope, usually a prefix, as in $foo, where $ is the sigil.

IEC 61131-3 is the third part of the international standard IEC 61131 for programmable logic controllers. It was first published in December 1993 by the IEC; the current (third) edition was published in February 2013.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

<span class="mw-page-title-main">Integer overflow</span> Computer arithmetic error

In computer programming, an integer overflow occurs when an arithmetic operation on integers attempts to create a numeric value that is outside of the range that can be represented with a given number of digits – either higher than the maximum or lower than the minimum representable value.

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

A bit field is a data structure that maps to one or more adjacent bits which have been allocated for specific purposes, so that any single bit or group of bits within the structure can be set or inspected. A bit field is most commonly used to represent integral types of known, fixed bit-width, such as single-bit Booleans.

Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.

Some programming languages provide a complex data type for complex number storage and arithmetic as a built-in (primitive) data type.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

MessagePack is a computer data interchange format. It is a binary form for representing simple data structures like arrays and associative arrays. MessagePack aims to be as compact and simple as possible. The official implementation is available in a variety of languages, some official libraries and others community created, such as C, C++, C#, D, Erlang, Go, Haskell, Java, JavaScript (NodeJS), Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

Ion is a data serialization language developed by Amazon. It may be represented by either a human-readable text form or a compact binary form. The text form is a superset of JSON; thus, any valid JSON document is also a valid Ion document.

References

  1. Stone, R. G.; Cooke, D. J. (5 February 1987). Program Construction. Cambridge University Press. p. 18. ISBN   978-0-521-31883-9.
  2. Wikander, Jan; Svensson, Bertil (31 May 1998). Real-Time Systems in Mechatronic Applications. Springer Science & Business Media. p. 101. ISBN   978-0-7923-8159-4.
  3. 1 2 Khurana, Rohit. Data and File Structure (For GTU), 2nd Edition. Vikas Publishing House. p. 2. ISBN   978-93-259-6005-3.
  4. Chun, Wesley (2001). Core Python Programming. Prentice Hall Professional. p. 77. ISBN   978-0-13-026036-9.
  5. Olsen, Geir; Allison, Damon; Speer, James (1 January 2008). Visual Basic .NET Class Design Handbook: Coding Effective Classes. Apress. p. 80. ISBN   978-1-4302-0780-1.
  6. 1 2 Fog, Agner. "Optimizing software in C++" (PDF). p. 29. Retrieved 28 January 2022. Integer operations are fast in most cases, [...]
  7. "Single Instruction Single Data - an overview | ScienceDirect Topics".
  8. 1 2 Fog, Agner (2010-02-16). "Calling conventions for different C++ compilers and operating systems: Chapter 3, Data Representation" (PDF). Retrieved 2010-08-30.
  9. Kernighan, Brian W; Ritchie, Dennis M (1978). The C Programming Language (1st ed.). Englewood Cliffs, NJ: Prentice Hall. p.  41. ISBN   0-13-110163-3.
  10. "Boolean type support library". devdocs.io. Retrieved October 15, 2020.
  11. "Bool data type in C++". GeeksforGeeks. 5 June 2017. Retrieved October 15, 2020.
  12. Lindholm, Tim; Yellin, Frank; Bracha, Gilad; Buckley, Alex (13 February 2015). "Chapter 2. The Structure of the Java Virtual Machine". The Java® Virtual Machine Specification.
  13. Cowell, John (18 February 1997). Essential Java Fast: How to write object oriented software for the Internet. Springer Science & Business Media. p. 27. ISBN   978-3-540-76052-8.
  14. Rakshit, Sandip; Panigrahi, Goutam (December 1995). A Hand Book of Objected Oriented Programming With Java. S. Chand Publishing. p. 11. ISBN   978-81-219-3001-7.
  15. Kernighan, Brian W.; Ritchie, Dennis M. (1988). "2.2 Data Types and Sizes". The C programming language (Second ed.). Englewood Cliffs, N.J. p. 36. ISBN   0131103709.{{cite book}}: CS1 maint: location missing publisher (link)
  16. ISO/IEC 9899:1999 specification, TC3 (PDF). p. 255, § 6.2.5 Types.
  17. Biron, Paul V.; Malhotra, Ashok. "XML Schema Part 2: Datatypes". www.w3.org (Second ed.). Retrieved 29 January 2022.
  18. Phillips, Lee Anne (18 January 2002). "Declaring a NOTATION | Understanding XML Document Type Definitions". www.informit.com. Retrieved 29 January 2022.
  19. "Primitive - MDN Web Docs Glossary: Definitions of Web-related terms". MDN. 8 June 2023.
  20. "JavaScript data types and data structures". MDN. 9 July 2024.
  21. "Types in Visual Basic". Microsoft Docs. 18 September 2021. Retrieved 18 May 2022.
  22. 1 2 3 4 5 "Data Types - The Rust Programming Language". doc.rust-lang.org. Retrieved 2023-10-17.
  23. "Built-in types (C++)". learn.microsoft.com. 17 August 2021.
  24. "Strings · The Julia Language". docs.julialang.org. Retrieved 29 January 2022.
  25. Mansoor, Umer (8 May 2016). "The char Type in Java is Broken". CodeAhoy. Retrieved 10 February 2020.
  26. "I/O and string instructions" . Retrieved 29 January 2022.