Rope (data structure)

Last updated
A simple rope built on the string of "Hello_my_name_is_Simon". Vector Rope example.svg
A simple rope built on the string of "Hello_my_name_is_Simon".

In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently. [1]

Contents

Description

A rope is a type of binary tree where each leaf (end node) holds a string and a length (also known as a "weight"), and each node further up the tree holds the sum of the lengths of all the leaves in its left subtree. A node with two children thus divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part of the string, and a node's weight is the length of the first part.

For rope operations, the strings stored in nodes are assumed to be constant immutable objects in the typical nondestructive case, allowing for some copy-on-write behavior. Leaf nodes are usually implemented as basic fixed-length strings with a reference count attached for deallocation when no longer needed, although other garbage collection methods can be used as well.

Operations

In the following definitions, N is the length of the rope, that is, the weight of the root node.

Collect leaves

Definition: Create a stack S and a list L. Traverse down the left-most spine of the tree until you reach a leaf l', adding each node n to S. Add l' to L. The parent of l' (p) is at the top of the stack. Repeat the procedure for p's right subtree.
finalclassInOrderRopeIteratorimplementsIterator<RopeLike>{privatefinalDeque<RopeLike>stack;InOrderRopeIterator(@NonNullRopeLikeroot){stack=newArrayDeque<>();varc=root;while(c!=null){stack.push(c);c=c.getLeft();}}@OverridepublicbooleanhasNext(){returnstack.size()>0;}@OverridepublicRopeLikenext(){valresult=stack.pop();if(!stack.isEmpty()){valparent=stack.pop();valright=parent.getRight();if(right!=null){stack.push(right);varcleft=right.getLeft();while(cleft!=null){stack.push(cleft);cleft=cleft.getLeft();}}}returnresult;}}

Rebalance

Definition: Collect the set of leaves L and rebuild the tree from the bottom-up.
staticbooleanisBalanced(RopeLiker){valdepth=r.depth();if(depth>=FIBONACCI_SEQUENCE.length-2){returnfalse;}returnFIBONACCI_SEQUENCE[depth+2]<=r.weight();}staticRopeLikerebalance(RopeLiker){if(!isBalanced(r)){valleaves=Ropes.collectLeaves(r);returnmerge(leaves,0,leaves.size());}returnr;}staticRopeLikemerge(List<RopeLike>leaves){returnmerge(leaves,0,leaves.size());}staticRopeLikemerge(List<RopeLike>leaves,intstart,intend){intrange=end-start;if(range==1){returnleaves.get(start);}if(range==2){returnnewRopeLikeTree(leaves.get(start),leaves.get(start+1));}intmid=start+(range/2);returnnewRopeLikeTree(merge(leaves,start,mid),merge(leaves,mid,end));}

Insert

Definition:Insert(i, S’): insert the string S’ beginning at position i in the string s, to form a new string C1, ..., Ci, S', Ci + 1, ..., Cm.
Time complexity:.

This operation can be done by a Split() and two Concat() operations. The cost is the sum of the three.

publicRopeinsert(intidx,CharSequencesequence){if(idx==0){returnprepend(sequence);}if(idx==length()){returnappend(sequence);}vallhs=base.split(idx);returnnewRope(Ropes.concat(lhs.fst.append(sequence),lhs.snd));}

Index

Figure 2.1: Example of index lookup on a rope. Vector Rope index.svg
Figure 2.1: Example of index lookup on a rope.
Definition:Index(i): return the character at position i
Time complexity:

To retrieve the i-th character, we begin a recursive search from the root node:

@OverridepublicintindexOf(charch,intstartIndex){if(startIndex>weight){returnright.indexOf(ch,startIndex-weight);}returnleft.indexOf(ch,startIndex);}

For example, to find the character at i=10 in Figure 2.1 shown on the right, start at the root node (A), find that 22 is greater than 10 and there is a left child, so go to the left child (B). 9 is less than 10, so subtract 9 from 10 (leaving i=1) and go to the right child (D). Then because 6 is greater than 1 and there's a left child, go to the left child (G). 2 is greater than 1 and there's a left child, so go to the left child again (J). Finally 2 is greater than 1 but there is no left child, so the character at index 1 of the short string "na" (ie "n") is the answer. (1-based index)

Concat

Figure 2.2: Concatenating two child ropes into a single rope. Vector Rope concat.svg
Figure 2.2: Concatenating two child ropes into a single rope.
Definition:Concat(S1, S2): concatenate two ropes, S1 and S2, into a single rope.
Time complexity: (or time to compute the root weight)

A concatenation can be performed simply by creating a new root node with left = S1 and right = S2, which is constant time. The weight of the parent node is set to the length of the left child S1, which would take time, if the tree is balanced.

As most rope operations require balanced trees, the tree may need to be re-balanced after concatenation.

Split

Figure 2.3: Splitting a rope in half. Vector Rope split.svg
Figure 2.3: Splitting a rope in half.
Definition:Split (i, S): split the string S into two new strings S1 and S2, S1 = C1, ..., Ci and S2 = Ci + 1, ..., Cm.
Time complexity:

There are two cases that must be dealt with:

  1. The split point is at the end of a string (i.e. after the last character of a leaf node)
  2. The split point is in the middle of a string.

The second case reduces to the first by splitting the string at the split point to create two new leaf nodes, then creating a new node that is the parent of the two component strings.

For example, to split the 22-character rope pictured in Figure 2.3 into two equal component ropes of length 11, query the 12th character to locate the node K at the bottom level. Remove the link between K and G. Go to the parent of G and subtract the weight of K from the weight of D. Travel up the tree and remove any right links to subtrees covering characters past position 11, subtracting the weight of K from their parent nodes (only node D and A, in this case). Finally, build up the newly orphaned nodes K and H by concatenating them together and creating a new parent P with weight equal to the length of the left node K.

As most rope operations require balanced trees, the tree may need to be re-balanced after splitting.

publicPair<RopeLike,RopeLike>split(intindex){if(index<weight){valsplit=left.split(index);returnPair.of(rebalance(split.fst),rebalance(newRopeLikeTree(split.snd,right)));}elseif(index>weight){valsplit=right.split(index-weight);returnPair.of(rebalance(newRopeLikeTree(left,split.fst)),rebalance(split.snd));}else{returnPair.of(left,right);}}

Delete

Definition:Delete(i, j): delete the substring Ci, …, Ci + j − 1, from s to form a new string C1, …, Ci − 1, Ci + j, …, Cm.
Time complexity:.

This operation can be done by two Split() and one Concat() operation. First, split the rope in three, divided by i-th and i+j-th character respectively, which extracts the string to delete in a separate node. Then concatenate the other two nodes.

@OverridepublicRopeLikedelete(intstart,intlength){vallhs=split(start);valrhs=split(start+length);returnrebalance(newRopeLikeTree(lhs.fst,rhs.snd));}

Report

Definition:Report(i, j): output the string Ci, …, Ci + j − 1.
Time complexity:

To report the string Ci, …, Ci + j − 1, find the node u that contains Ci and weight(u) >= j, and then traverse T starting at node u. Output Ci, …, Ci + j − 1 by doing an in-order traversal of T starting at node u.

Comparison with monolithic arrays

Complexity[ citation needed ]
OperationRopeString
Index [1] O(log n)O(1)
Split [1] O(log n)O(1)
ConcatenateO(1) amortized, O(log n) worst case[ citation needed ]O(n)
Iterate over each character [1] O(n)O(n)
Insert [2] [ failed verification ]O(log n)O(n)
Append [2] [ failed verification ]O(1) amortized, O(log n) worst caseO(1) amortized, O(n) worst case
DeleteO(log n)O(n)
ReportO(j + log n)O(j)
BuildO(n)O(n)

Advantages:

Disadvantages:

This table compares the algorithmic traits of string and rope implementations, not their raw speed. Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for longer strings, time complexity and memory use for inserting and deleting characters becomes unacceptably large. In contrast, a rope data structure has stable performance regardless of data size. Further, the space complexity for ropes and arrays are both O(n). In summary, ropes are preferable when the data is large and modified often.

See also

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

<span class="mw-page-title-main">Binary tree</span> Limited form of tree data structure

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less than any keys in subtrees on the right.

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.

An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after their originator, Swedish computer scientist Arne Andersson.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

<i>m</i>-ary tree Tree data structure in which each node has at most m children.

In graph theory, an m-ary tree is an arborescence in which each node has no more than m children. A binary tree is the special case where m = 2, and a ternary tree is another case with m = 3 that limits its children to three.

In computer science, a longest common substring of two or more strings is a longest string that is a substring of all of them. There may be more than one longest common substring. Applications include data deduplication and plagiarism detection.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

In computer science, a ternary search tree is a type of trie where nodes are arranged in a manner similar to a binary search tree, but with up to three children rather than the binary tree's limit of two. Like other prefix trees, a ternary search tree can be used as an associative map structure with the ability for incremental string search. However, ternary search trees are more space efficient compared to standard prefix trees, at the cost of speed. Common applications for ternary search trees include spell-checking and auto-completion.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.

In computer science, an x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.

In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree Bε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.

References

  1. 1 2 3 4 5 Boehm, Hans-J; Atkinson, Russ; Plass, Michael (December 1995). "Ropes: an Alternative to Strings" (PDF). Software: Practice and Experience. New York, NY, USA: John Wiley & Sons, Inc. 25 (12): 1315–1330. doi:10.1002/spe.4380251203. Archived from the original on 2020-03-08.
  2. 1 2 "Rope Implementation Overview". www.sgi.com. Archived from the original on 2017-12-19. Retrieved 2017-03-01.