Linked data structure

In computer science, a linked data structure is a data structure which consists of a set of data records (nodes) linked together and organized by references (links or pointers). The link between data can also be called a connector.

In linked data structures, the links are usually treated as special data types that can only be dereferenced or compared for equality. Linked data structures are thus contrasted with arrays and other data structures that require performing arithmetic operations on pointers. This distinction holds even when the nodes are actually implemented as elements of a single array, and the references are actually array indices: as long as no arithmetic is done on those indices, the data structure is essentially a linked one.

Linking can be done in two ways: using dynamic allocation and using array index linking.
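As a minimal sketch of array index linking in C, the nodes below live in one fixed-size array and each link is an index into that array. The names `pool`, `NIL`, `push_front`, and `list_sum` are illustrative assumptions, not part of any library. The link indices are only stored, compared with `NIL`, and used to look up a node. The only counter that is incremented is the allocation cursor `pool_used`, which is not a link.

```c
#include <assert.h>

#define POOL_SIZE 8
#define NIL (-1)            /* plays the role of a null link */

struct inode {
    int value;
    int next;               /* index of the successor in pool, or NIL */
};

static struct inode pool[POOL_SIZE];
static int pool_used = 0;   /* number of slots handed out so far */

/* Take the next unused slot and link it in front of `head`.
   Returns the index of the new node, which becomes the new head. */
int push_front(int head, int value)
{
    assert(pool_used < POOL_SIZE);   /* sketch only: no growth, no freeing */
    int idx = pool_used++;
    pool[idx].value = value;
    pool[idx].next = head;
    return idx;
}

/* Follow the index links from `head` and add up the values. */
int list_sum(int head)
{
    int sum = 0;
    for (int i = head; i != NIL; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}
```

An empty list is represented by `NIL`, so a caller would start with `int head = NIL;` and reassign `head` on every `push_front` call.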

Linked data structures include linked lists, search trees, expression trees, and many other widely used data structures. They are also key building blocks for many efficient algorithms, such as topological sort [1] and set union-find. [2]

Common types of linked data structures

Linked lists

A linked list is a collection of structures ordered not by their physical placement in memory but by logical links that are stored as part of the data in the structure itself. The structures need not be stored in adjacent memory locations. Every structure has a data field and an address field; the address field contains the address of its successor.

A linked list can be singly, doubly, or multiply linked, and can be either linear or circular.

Basic properties
Figure: A linked list with three nodes, each containing two fields: an integer value and a link to the next node.
Figure: A linked list with a single node.

Example in Java

This is an example of the node class used to store integers in a Java implementation of a linked list:

public class IntNode {
    public int value;
    public IntNode link;

    public IntNode(int v) {
        value = v;
    }
}

Example in C

This is an example of the structure used to implement a linked list in C:

struct node {
    int val;
    struct node *next;
};

This is an example using typedefs:

typedef struct node node;

struct node {
    int val;
    node *next;
};

Note: A structure like this which contains a member that points to the same structure is called a self-referential structure.
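The following is a hedged sketch of how such a self-referential structure might be used with dynamic allocation. It reuses the `struct node` layout from the C example; the helper names `list_push_front`, `list_length`, and `list_free` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int val;
    struct node *next;
};

/* Allocate a node holding `val` and link it in front of *head.
   Returns 0 on success, -1 if malloc fails (list left unchanged). */
int list_push_front(struct node **head, int val)
{
    assert(head != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->val = val;
    n->next = *head;
    *head = n;
    return 0;
}

/* Count the nodes by following next links until the null link. */
size_t list_length(const struct node *head)
{
    size_t count = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        count++;
    return count;
}

/* Release every node; the successor is saved before each free. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

The list grows one node at a time, so no size has to be chosen in advance; a null pointer marks both the empty list and the end of a non-empty one.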

Example in C++

This is an example of the node class used to implement a linked list in C++:

class Node {
    int val;
    Node *next;
};

Search trees

A search tree is a tree data structure whose nodes store data values from some ordered set, arranged so that an in-order traversal of the tree visits the nodes in ascending order of the stored values.
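The ordering property can be illustrated with a minimal, unbalanced binary search tree in C. This is a sketch under stated assumptions: the names `tnode`, `tree_insert`, `tree_inorder`, and `tree_free` are hypothetical, keys are plain `int` values, and duplicates are sent to the right subtree.

```c
#include <assert.h>
#include <stdlib.h>

struct tnode {
    int key;
    struct tnode *left;    /* subtree with keys smaller than key */
    struct tnode *right;   /* subtree with keys greater than or equal to key */
};

/* Insert `key` below `root` and return the (possibly new) root.
   If malloc fails, the tree is returned unchanged. */
struct tnode *tree_insert(struct tnode *root, int key)
{
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = NULL;
            n->right = NULL;
        }
        return n;
    }
    if (key < root->key)
        root->left = tree_insert(root->left, key);
    else
        root->right = tree_insert(root->right, key);
    return root;
}

/* In-order traversal: left subtree, then the node, then right subtree.
   Writes at most `cap` keys into `out`; returns how many are written so far. */
size_t tree_inorder(const struct tnode *root, int *out, size_t cap, size_t count)
{
    assert(out != NULL);
    if (root == NULL)
        return count;
    count = tree_inorder(root->left, out, cap, count);
    if (count < cap)
        out[count++] = root->key;
    return tree_inorder(root->right, out, cap, count);
}

/* Free the whole tree, children before parent. */
void tree_free(struct tnode *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```

Whatever order the keys are inserted in, `tree_inorder` started with `count` equal to 0 fills `out` in ascending order, which is the defining property described above.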

Basic properties

Advantages and disadvantages

Linked lists versus arrays

Compared to arrays, linked data structures allow more flexibility in organizing the data and in allocating space for it. The size of an array must be specified when it is created, which can waste memory if the estimate is too large, or impose an arbitrary limit that later hinders functionality if it is too small. A linked data structure is built dynamically and never needs to be bigger than the program requires, and no guess about the required space is needed at creation time. This is key to avoiding wasted memory.

In an array, the elements have to be in a contiguous (connected and sequential) portion of memory. In a linked data structure, by contrast, the reference stored in each node gives the information needed to find the next one. The nodes of a linked data structure can also be moved individually to different locations within physical memory without affecting the logical connections between them, unlike array elements. With due care, a process or thread can add or delete nodes in one part of a data structure even while other processes or threads are working on other parts.
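A brief C sketch shows why adding or deleting a node is a local operation: only the links of the neighboring nodes change, and no other node is moved or touched. The function names `insert_after` and `remove_after` are assumptions for illustration, and the sketch does no locking, so it does not by itself address the concurrent case.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int val;
    struct node *next;
};

/* Link `fresh` directly after `pos`; only these two nodes are modified. */
void insert_after(struct node *pos, struct node *fresh)
{
    assert(pos != NULL && fresh != NULL);
    fresh->next = pos->next;
    pos->next = fresh;
}

/* Unlink and return the node after `pos`, or NULL if there is none.
   The caller keeps ownership of the returned node. */
struct node *remove_after(struct node *pos)
{
    assert(pos != NULL);
    struct node *victim = pos->next;
    if (victim != NULL) {
        pos->next = victim->next;
        victim->next = NULL;
    }
    return victim;
}
```

Both operations take a constant number of steps once `pos` is known, whereas inserting into the middle of an array requires shifting every later element.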

On the other hand, access to any particular node in a linked data structure requires following a chain of references that are stored in each node. If the structure has n nodes, and each node contains at most b links, there will be some nodes that cannot be reached in fewer than log_b n steps. This can be a considerable slowdown, especially for structures containing large numbers of nodes. For many structures, some nodes may require up to n − 1 steps in the worst case. In contrast, many array data structures allow access to any element with a constant number of operations, independent of the number of entries.
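The cost of positional access can be made concrete with a small C sketch that counts how many links are followed. The function name `list_at` and its `steps` output parameter are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int val;
    struct node *next;
};

/* Return the node at zero-based position `index`, or NULL if the list
   is shorter than that. *steps receives the number of links followed. */
const struct node *list_at(const struct node *head, size_t index, size_t *steps)
{
    assert(steps != NULL);
    const struct node *p = head;
    *steps = 0;
    while (p != NULL && *steps < index) {
        p = p->next;
        (*steps)++;
    }
    return p;
}
```

For a singly linked list of n nodes, reaching the last node follows n − 1 links, while the array expression `a[n - 1]` reaches the last element of an array `a` in one indexing operation.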

Broadly, linked data structures are implemented as dynamic data structures. Memory is allocated as needed and deallocated when it is no longer needed, so the same space can be reused and memory can be used more efficiently.

General disadvantages

Linked data structures may also incur substantial memory allocation overhead (if nodes are allocated individually) and frustrate memory paging and processor caching algorithms, since they generally have poor locality of reference: unlike the elements of an array, the nodes are not contiguous and can be scattered throughout memory. In some cases, linked data structures may also use more memory (for the link fields) than competing array structures.

In an array, the nth element can be accessed immediately, while in a linked data structure multiple pointers must be followed, so element access time varies according to where in the structure the element is.

In some theoretical models of computation that enforce the constraints of linked structures, such as the pointer machine, many problems require more steps than in the unconstrained random-access machine model.

References

  1. Donald Knuth, The Art of Computer Programming
  2. Bernard A. Galler and Michael J. Fischer. An improved equivalence algorithm. Communications of the ACM, Volume 7, Issue 5 (May 1964), pages 301–303. The paper originating disjoint-set forests. ACM Digital Library
  3. http://www.cs.toronto.edu/~hojjat/148s07/lectures/week5/07linked.pdf