Record (computer science)

Last updated

In computer science, a record (also called a structure, struct , or compound data) is a basic data structure. Records in a database or spreadsheet are usually called "rows". [1] [2] [3] [4]

Contents

A record is a collection of fields , possibly of different data types, typically in fixed number and sequence. [5] The fields of a record may also be called members, particularly in object-oriented programming; fields may also be called elements, though these risk confusion with the elements of a collection.

For example, a date could be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field. A personnel record might contain a name, a salary, and a rank. A Circle record might contain a center and a radius—in this instance, the center itself might be represented as a point record containing x and y coordinates.

Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and that each field may have a different type.

A record type is a data type that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed. In type theory, product types (with no field names) are generally preferred due to their simplicity, but proper record types are studied in languages such as System F-sub. Since type-theoretical records may contain first-class function-typed fields in addition to data, they can express many features of object-oriented programming.

Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tapes or hard disks. Records are a fundamental component of most data structures, especially linked data structures. Many computer files are organized as arrays of logical records, often grouped into larger physical records or blocks for efficiency.

The parameters of a function or procedure can often be viewed as the fields of a record variable; and the arguments passed to that function can be viewed as a record value that gets assigned to that variable at the time of the call. Also, in the call stack that is often used to implement procedure calls, each entry is an activation record or call frame, containing the procedure parameters and local variables, the return address, and other internal fields.

An object in object-oriented language is essentially a record that contains procedures specialized to handle that record; and object types are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects, and are known as plain old data structures (PODSs), to contrast with objects that use OO features.

A record can be viewed as the computer analog of a mathematical tuple, although a tuple may or may not be considered a record, and vice versa, depending on conventions and the specific programming language. In the same vein, a record type can be viewed as the computer language analog of the Cartesian product of two or more mathematical sets, or the implementation of an abstract product type in a specific language.

Keys

A record may have zero or more keys. A key is a field or set of fields in the record that serves as an identifier. A unique key is often called the primary key, or simply the record key. For example an employee file might contain employee number, name, department, and salary. The employee number will be unique in the organization and would be the primary key. Depending on the storage medium and file organization the employee number might be indexed —that is also stored in a separate file to make lookup faster. The department code may not be unique; it may also be indexed, in which case it would be considered a secondary key, or alternate key. If it is not indexed the entire employee file would have to be scanned to produce a listing of all employees in a specific department. The salary field would not normally be considered usable as a key. Indexing is one factor considered when designing a file.

History

Journal sheet from 1880 United States Census, showing tabular data with rows of data, each a record corresponding to a single person. 1880 census Edison.gif
Journal sheet from 1880 United States Census, showing tabular data with rows of data, each a record corresponding to a single person.

The concept of record can be traced to various types of tables and ledgers used in accounting since remote times. The modern notion of records in computer science, with fields of well-defined type and size, was already implicit in 19th century mechanical calculators, such as Babbage's Analytical Engine. [6] [7]

Hollerith punched card (1895) Hollerith Punched Card.jpg
Hollerith punched card (1895)

The original machine-readable medium used for data (as opposed to control) was punch card used for records in the 1890 United States Census: each punch card was a single record. Compare the journal entry from 1880 and the punch card from 1895. Records were well established in the first half of the 20th century, when most data processing was done using punched cards. Typically each record of a data file would be recorded in one punched card, with specific columns assigned to specific fields. Generally, a record was the smallest unit that could be read in from external storage (e.g. card reader, tape or disk).

Most machine language implementations and early assembly languages did not have special syntax for records, but the concept was available (and extensively used) through the use of index registers, indirect addressing, and self-modifying code. Some early computers, such as the IBM 1620, had hardware support for delimiting records and fields, and special instructions for copying such records.

The concept of records and fields was central in some early file sorting and tabulating utilities, such as IBM's Report Program Generator (RPG).

COBOL was the first widespread programming language to support record types, [8] and its record definition facilities were quite sophisticated at the time. The language allows for the definition of nested records with alphanumeric, integer, and fractional fields of arbitrary size and precision, as well as fields that automatically format any value assigned to them (e.g., insertion of currency signs, decimal points, and digit group separators). Each file is associated with a record variable where data is read into or written from. COBOL also provides a MOVECORRESPONDING statement that assigns corresponding fields of two records according to their names.

The early languages developed for numeric computing, such as FORTRAN (up to FORTRAN IV) and Algol 60, did not have support for record types; but later versions of those languages, such as Fortran 77 and Algol 68 did add them. The original Lisp programming language too was lacking records (except for the built-in cons cell), but its S-expressions provided an adequate surrogate. The Pascal programming language was one of the first languages to fully integrate record types with other basic types into a logically consistent type system. The PL/I programming language provided for COBOL-style records. The C programming language initially provided the record concept as a kind of template ( struct ) that could be laid on top of a memory area, rather than a true record data type. The latter were provided eventually (by the typedef declaration), but the two concepts are still distinct in the language. Most languages designed after Pascal (such as Ada, Modula, and Java) also supported records.

Operations

The selection of a field from a record value yields a value.

Some languages may provide facilities that enumerate all fields of a record, or at least the fields that are references. This facility is needed to implement certain services such as debuggers, garbage collectors, and serialization. It requires some degree of type polymorphism.

In systems with record subtyping, operations on values of record type may also include:

In such settings, a specific record type implies that a specific set of fields are present, but values of that type may contain additional fields. A record with fields x, y, and z would thus belong to the type of records with fields x and y, as would a record with fields x, y, and r. The rationale is that passing an (x,y,z) record to a function that expects an (x,y) record as argument should work, since that function will find all the fields it requires within the record. Many ways of practically implementing records in programming languages would have trouble with allowing such variability, but the matter is a central characteristic of record types in more theoretical contexts.

Assignment and comparison

Most languages allow assignment between records that have exactly the same record type (including same field types and names, in the same order). Depending on the language, however, two record data types defined separately may be regarded as distinct types even if they have exactly the same fields.

Some languages may also allow assignment between records whose fields have different names, matching each field value with the corresponding field variable by their positions within the record; so that, for example, a complex number with fields called real and imag can be assigned to a 2D point record variable with fields X and Y. In this alternative, the two operands are still required to have the same sequence of field types. Some languages may also require that corresponding types have the same size and encoding as well, so that the whole record can be assigned as an uninterpreted bit string. Other languages may be more flexible in this regard, and require only that each value field can be legally assigned to the corresponding variable field; so that, for example, a short integer field can be assigned to a long integer field, or vice versa.

Other languages (such as COBOL) may match fields and values by their names, rather than positions.

These same possibilities apply to the comparison of two record values for equality. Some languages may also allow order comparisons ('<'and '>'), using the lexicographic order based on the comparison of individual fields.[ citation needed ]

PL/I allows both of the preceding types of assignment, and also allows structure expressions, such as a = a+1; where "a" is a record, or structure in PL/I terminology.

Algol 68's distributive field selection

In Algol 68, if Pts was an array of records, each with integer fields X and Y, one could write Y of Pts to obtain an array of integers, consisting of the Y fields of all the elements of Pts. As a result, the statements Y of Pts[3] := 7 and (Y of Pts)[3] := 7 would have the same effect.

Pascal's "with" statement

In the Pascal programming language, the command with R do S would execute the command sequence S as if all the fields of record R had been declared as variables. So, instead of writing Pt.X := 5; Pt.Y := Pt.X + 3 one could write with Pt do begin X := 5; Y := X + 3 end.

Representation in memory

The representation of records in memory varies depending on the programming languages. Usually the fields are stored in consecutive positions in memory, in the same order as they are declared in the record type. This may result in two or more fields stored into the same word of memory; indeed, this feature is often used in systems programming to access specific bits of a word. On the other hand, most compilers will add padding fields, mostly invisible to the programmer, in order to comply with alignment constraints imposed by the machine—say, that a floating point field must occupy a single word.

Some languages may implement a record as an array of addresses pointing to the fields (and, possibly, to their names and/or types). Objects in object-oriented languages are often implemented in rather complicated ways, especially in languages that allow multiple class inheritance.

Self-defining records

A self-defining record is a type of record which contains information to identify the record type and to locate information within the record. It may contain the offsets of elements; the elements can therefore be stored in any order or may be omitted. [9] Alternatively, various elements of the record, each including an element identifier, can simply follow one another in any order.

Examples

The following show examples of record definitions:

  modedate = struct (int year, int month, int day);

See also

Related Research Articles

C (programming language) general-purpose programming language

C is a general-purpose, procedural computer programming language supporting structured programming, lexical variable scope, and recursion, with a static type system. By design, C provides constructs that map efficiently to typical machine instructions. It has found lasting use in applications previously coded in assembly language. Such applications include operating systems and various application software for computers architectures that range from supercomputers to PLCs and embedded systems.

In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer science, a composite data type or compound data type is any data type which can be constructed in a program using the programming language's primitive data types and other composite types. It is sometimes called a structure or aggregate data type, although the latter term may also refer to arrays, lists, etc. The act of constructing a composite type is known as composition. Composite data types are often contrasted with scalar variables.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leafs. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

The syntax of the C programming language is the set of rules governing writing of software in the language. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

Pointer (computer programming) programming language data type

In computer science, a pointer is a programming language object that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, a union is a value that may have any of several representations or formats within the same position in memory; that consists of a variable that may hold such a data structure. Some programming languages support special data types, called union types, to describe such values and variables. In other words, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". In contrast with a record, which could be defined to contain a float and an integer; in a union, there is only one value at any given time.

ALGOL 68 programming language

ALGOL 68 is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously defined syntax and semantics.

A struct in the C programming language is a composite data type declaration that defines a physically grouped list of variables under one name in a block of memory, allowing the different variables to be accessed via a single pointer or by the struct declared name which returns the same address. The struct data type can contain other data types so is used for mixed-data-type records such as a hard-drive directory entry, or other mixed-type records.

In computer science, object composition is a way to combine objects or data types into more complex ones. Compositions relate to, but are not the same as, data structures, and common ones are the tagged union, set, sequence, and various graph structures, as well as the object used in object-oriented programming.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

In computer programming, a forward declaration is a declaration of an identifier for which the programmer has not yet given a complete definition.

A class in C++ is a user-defined type or data structure declared with keyword class that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class is private. The private members are not accessible outside the class; they can be accessed only through methods of the class. The public members form an interface to the class and are accessible outside the class.

C++11 is a version of the standard for the programming language C++. It was approved by International Organization for Standardization (ISO) on 12 August 2011, replacing C++03, superseded by C++14 on 18 August 2014 and later, by C++17. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In computer science, type punning is a common term for any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.

This article describes the syntax of the C# programming language. The features described are compatible with .NET Framework and Mono.

In computer programming, a variable-length array (VLA), also called variable-sized, runtime-sized, is an array data structure whose length is determined at run time . In C, the VLA is said to have a variably modified type that depends on a value.

Comparison of programming languages is a common topic of discussion among software engineers. Basic instructions of several programming languages are compared here.

In computer programming, initialization is the assignment of an initial value for a data object or variable. The manner in which initialization is performed depends on programming language, as well as type, storage class, etc., of an object to be initialized. Programming constructs which perform initialization are typically called initializers and initializer lists. Initialization is distinct from declaration, although the two can sometimes be conflated in practice. The complement of initialization is finalization, which is primarily used for objects, but not variables.

References

  1. "Computer Science Dictionary Definitions". Computing Students. Retrieved Jan 22, 2018.
  2. Radványi, Tibor (2014). Database Management Systems. Eszterházy Károly College. p. 19. Retrieved 23 September 2018.
  3. Kahate, Atul (2006). Introduction to Database Management Systems. Pearson. p. 3. ISBN   978-81-317-0078-5 . Retrieved 23 September 2018.
  4. Connolly, Thomas (2004). Database Solutions: A Step by Step Guide to Building Databases (2nd ed.). Pearson. p.  7. ISBN   978-0-321-17350-8.
  5. Felleisen, Matthias (2001). How To Design Programs . MIT Press. pp.  53, 60. ISBN   978-0262062183.
  6. Bromley, Allan (October 1998). "Charles Babbage's Analytical Engine, 1838". IEEE Annals of the History of Computing. 20 (4): 29–45. doi:10.1109/85.728228 . Retrieved 23 September 2018.
  7. Swade, Doron. "Automatic Computation: Charles Babbage and Computational Method". The Rutherford Journal. The Rutherford Journal. Retrieved 23 September 2018.
  8. Sebesta, Robert W. Concepts of Programming Languages (Third ed.). Addison-Wesley Publishing Company, Inc. p.  218. ISBN   0-8053-7133-8.
  9. Kraimer, Martin R. "EPICS Input / Output Controller (IOC) Application Developer's Guide". Argonne National Laboratory. Retrieved November 25, 2015.