Primitive data type

Last updated

In computer science, primitive data type is either of the following:[ citation needed ]

Contents

In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types.

Depending on the language and its implementation, primitive data types may or may not have a one-to-one correspondence with objects in the computer's memory. However, one usually expects operations on basic primitive data types to be the fastest language constructs there are.[ citation needed ] Integer addition, for example, can be performed as a single machine instruction, and some processors offer specific instructions to process sequences of characters with a single instruction.[ citation needed ] In particular, the C standard mentions that "a 'plain' int object has the natural size suggested by the architecture of the execution environment."[ citation needed ] This means that int is likely to be 32 bits long on a 32-bit architecture. Basic primitive types are almost always value types.

Most languages do not allow the behavior or capabilities of primitive (either built-in or basic) data types to be modified by programs. Exceptions include Smalltalk, which permits all data types to be extended within a program, adding to the operations that can be performed on them or even redefining the built-in operations.[ citation needed ]

Overview

The actual range of primitive data types that is available is dependent upon the specific programming language that is being used. For example, in C#, strings are a composite but built-in data type, whereas in modern dialects of BASIC and in JavaScript, they are assimilated to a primitive data type that is both basic and built-in. [1] [2]

Classic basic primitive types may include:

The above primitives are generally supported more or less directly by computer hardware, except possibly for floating point, so operations on such primitives are usually fairly efficient. Some programming languages support text strings as a primitive (e.g. BASIC) while others treat a text string as an array of characters (e.g. C). Some computer hardware (e.g. x86) has instructions which help in dealing with text strings, but complete hardware support for text strings is rare.

Strings could be any series of characters in the used encoding. To separate strings from code, most languages enclose them by single or double quotes. For example "Hello World" or 'Hello World'. Note that "200" could be mistaken for an integer type but is actually a string type because it is contained in double quotes.

More sophisticated types which can be built-in include:

Specific primitive data types

Integer numbers

An integer data type represents some range of mathematical integers. Integers may be either signed (allowing negative values) or unsigned (non-negative integers only). Common ranges are:

Size (bytes)Size (bits)NamesSigned range (assuming two's complement for signed)Unsigned range
1 byte8 bits Byte, octet, minimum size of char in C99( see limits.h CHAR_BIT)−128 to +1270 to 255
2 bytes16 bits x86 word, minimum size of short and int in C−32,768 to +32,7670 to 65,535
4 bytes32 bitsx86 double word, minimum size of long in C, actual size of int for most modern C compilers, [3] pointer for IA-32-compatible processors−2,147,483,648 to +2,147,483,6470 to 4,294,967,295
8 bytes64 bitsx86 quadruple word, minimum size of long long in C, actual size of long for most modern C compilers, [3] pointer for x86-64-compatible processors−9,223,372,036,854,775,808 to +9,223,372,036,854,775,8070 to 18,446,744,073,709,551,615
unlimited/8unlimited Bignum –2unlimited/2 to +(2unlimited/2 − 1)0 to 2unlimited − 1

Literals for integers can be written as regular Arabic numerals, consisting of a sequence of digits and with negation indicated by a minus sign before the value. However, most programming languages disallow use of commas or spaces for digit grouping. Examples of integer literals are:

There are several alternate methods for writing integer literals in many programming languages:

Floating-point numbers

A floating-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a format equivalent to scientific notation, typically in binary but sometimes in decimal. Because floating-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately.

Many languages have both a single precision (often called "float") and a double precision type.

Literals for floating point numbers include a decimal point, and typically use e or E to denote scientific notation. Examples of floating-point literals are:

Some languages (e.g., Fortran, Python, D) also have a complex number type comprising two floating-point numbers: a real part and an imaginary part.

Fixed-point numbers

A fixed-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a scaled-integer form, typically in binary but sometimes in decimal. Because fixed-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately. Fixed-point numbers also tend to have a more limited range of values than floating point, and so the programmer must be careful to avoid overflow in intermediate calculations as well as the final result.

Booleans

A boolean type, typically denoted "bool" or "boolean", is typically a logical type that can have either the value "true" or the value "false". Although only one bit is necessary to accommodate the value set "true" and "false", programming languages typically implement boolean types as one or more bytes.

Many languages (e.g. Java, Pascal and Ada) implement booleans adhering to the concept of boolean as a distinct logical type. Languages, though, may implicitly convert booleans to numeric types at times to give extended semantics to booleans and boolean expressions or to achieve backwards compatibility with earlier versions of the language. For example, early versions of the C programming language that followed ANSI C and its former standards did not have a dedicated boolean type. Instead, numeric values of zero are interpreted as "false", and any other value is interpreted as "true". [5] The newer C99 added a distinct boolean type that can be included with stdbool.h, [6] and C++ supports bool as a built-in type and "true" and "false" as reserved words. [7]

Characters and strings

A character type (typically called "char") may contain a single letter, digit, punctuation mark, symbol, formatting code, control code, or some other specialized code (e.g., a byte order mark). In C, char is defined as the smallest addressable unit of memory. On most systems, this is 8 bits; Several standards, such as POSIX, require it to be this size. Some languages have two or more character types, for example a single-byte type for ASCII characters and a multi-byte type for Unicode characters. The term "character type" is normally used even for types whose values more precisely represent code units, for example a UTF-16 code unit as in Java (support limited to 16-bit characters only [8] )and JavaScript.

Characters may be combined into strings. The string data can include numbers and other numerical symbols but is treated as text. For example, the mathematical operations that can be performed on a numerical value (e.g. 200) generally cannot be performed on that same value written as a string (e.g. "200").

Strings are implemented in various ways, depending on the programming language. The simplest way to implement strings is to create them as an array of characters, followed by a delimiting character used to signal the end of the string, usually NUL. These are referred to as null-terminated strings, and are usually found in languages with a low amount of hardware abstraction, such as C and Assembly. While easy to implement, null terminated strings have been criticized for causing buffer overflows. Most high-level scripting languages, such as Python, Ruby, and many dialects of BASIC, have no separate character type; strings with a length of one are normally used to represent single characters. Some languages, such as C++ and Java, have the capability to use null-terminated strings (usually for backwards-compatibility measures), but additionally provide their own class for string handling ( std::string and java.lang.String, respectively) in the standard library.

There is also a difference on whether or not strings are mutable or immutable in a language. Mutable strings may be altered after their creation, whereas immutable strings maintain a constant size and content. In the latter, the only way to alter strings are to create new ones. There are both advantages and disadvantages to each approach: although immutable strings are much less flexible, they are simpler and completely thread-safe. Some examples of languages that use mutable strings include C++, Perl and Ruby, whereas languages that do not include JavaScript, Lua, Python and Go. A few languages, such as Objective-C, provide different types for mutable and immutable strings.

Literals for characters and strings are usually surrounded by quotation marks: sometimes, single quotes (') are used for characters and double quotes (") are used for strings. Python accepts either variant for its string notation.

Examples of character literals in C syntax are:

Examples of string literals in C syntax are:

Numeric data type ranges

Each numeric data type has its maximum and minimum value known as the range. Attempting to store a number outside the range may lead to compiler/runtime errors, or to incorrect calculations (due to truncation) depending on the language being used.

The range of a variable is based on the number of bytes used to save the value, and an integer data type is usually able to store 2n values (where n is the number of bits that contribute to the value). For other data types (e.g. floating-point values) the range is more complicated and will vary depending on the method used to store it. There are also some types that do not use entire bytes, e.g. a boolean that requires a single bit, and represents a binary value (although in practice a byte is often used, with the remaining 7 bits being redundant). Some programming languages (such as Ada and Pascal) also allow the opposite direction, that is, the programmer defines the range and precision needed to solve a given problem and the compiler chooses the most appropriate integer or floating-point type automatically.

See also

Related Research Articles

String (computer science) Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

Data type

In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support basic data types of integer numbers, floating-point numbers, characters and Booleans. A data type constrains the values that an expression, such as a variable or a function, might take. This data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored. A data type provides a set of values from which an expression may take its values.

In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value or its textual representation as a string, and vice versa. Type conversions can take advantage of certain features of type hierarchies or data representations. Two important aspects of a type conversion are whether it happens implicitly (automatically) or explicitly, and whether the underlying data representation is converted from one representation into another, or a given representation is merely reinterpreted as the representation of another data type. In general, both primitive and compound data types can be converted.

This article compares two programming languages: C# with Java. While the focus of this article is mainly the languages and their features, such a comparison will necessarily also consider some features of platforms and libraries. For a more detailed comparison of the platforms, please see Comparison of the Java and .NET platforms.

A Java class file is a file containing Java bytecode that can be executed on the Java Virtual Machine (JVM). A Java class file is usually produced by a Java compiler from Java programming language source files containing Java classes. If a source file has more than one class, each class is compiled into a separate class file.

In computer science, a relational operator is a programming language construct or operator that tests or defines some kind of relation between two entities. These include numerical equality and inequalities.

In computer programming, a sigil is a symbol affixed to a variable name, showing the variable's datatype or scope, usually a prefix, as in $foo, where $ is the sigil.

In computer science, a literal is a notation for representing a fixed value in source code. Almost all programming languages have notations for atomic values such as integers, floating-point numbers, and strings, and usually for booleans and characters; some also have notations for elements of enumerated types and compound values such as arrays, records, and objects. An anonymous function is a literal for the function type.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

The syntax of JavaScript is the set of rules that define a correctly structured JavaScript program.

The syntax of the Python programming language is the set of rules that defines how a Python program will be written and interpreted. The Python language has many similarities to Perl, C, and Java. However, there are some definite differences between the languages.

Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.

This article compares a large number of programming languages by tabulating their data types, their expression, statement, and declaration syntax, and some common operating-system interfaces.

Some programming languages provide a complex data type for complex number storage and arithmetic as a built-in (primitive) data type.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

DG/L is a programming language developed by Data General Corp for the Nova, Eclipse, and Eclipse/MV families of minicomputers in the 1970s and early 1980s.

MessagePack is a computer data interchange format. It is a binary form for representing simple data structures like arrays and associative arrays. MessagePack aims to be as compact and simple as possible. The official implementation is available in a variety of languages such as C, C++, C#, D, Erlang, Go, Haskell, Java, JavaScript (NodeJS), Lua, OCaml, Perl, PHP, Python, Ruby, Scala, Smalltalk, and Swift.

A character literal is a type of literal in programming for the representation of a single character's value within the source code of a computer program.

JSON→URL is a language-independent data interchange format for the JSON data model suitable for use within a URL/URI query string. It is defined by an open specification , though not through a standards body.

References

  1. "Primitive Data Types (The Java™ Tutorials > Learning the Java Language > Language Basics)". docs.oracle.com. Retrieved 2020-05-01.
  2. "Data Types in C". GeeksforGeeks. 2015-06-30. Retrieved 2020-05-01.
  3. 1 2 Fog, Agner (2010-02-16). "Calling conventions for different C++ compilers and operating systems: Chapter 3, Data Representation" (PDF). Retrieved 2010-08-30.
  4. ECMAScript 6th Edition draft: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-literals-numeric-literals Archived 2013-12-16 at the Wayback Machine
  5. Kernighan, Brian W; Ritchie, Dennis M (1978). The C Programming Language (1st ed.). Englewood Cliffs, NJ: Prentice Hall. p.  41. ISBN   0-13-110163-3.
  6. "Boolean type support library". devdocs.io. Retrieved October 15, 2020.
  7. "Bool data type in C++". GeeksforGeeks. Retrieved October 15, 2020.
  8. Mansoor, Umer. "The char Type in Java is Broken". CodeAhoy. Retrieved 10 February 2020.