Rope (data structure)

Last updated November 15, 2024

In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate longer strings or entire texts. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.^[1]

Description

A rope is a type of binary tree where each leaf (end node) holds a string of manageable size and length (also known as a weight), and each node further up the tree holds the sum of the lengths of all the leaves in its left subtree. A node with two children thus divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part of the string, and a node's weight is the length of the first part.

For rope operations, the strings stored in nodes are assumed to be constant immutable objects in the typical nondestructive case, allowing for some copy-on-write behavior. Leaf nodes are usually implemented as basic fixed-length strings with a reference count attached for deallocation when no longer needed, although other garbage collection methods can be used as well.

Operations

In the following definitions, N is the length of the rope, that is, the weight of the root node.

Collect leaves

Definition: Create a stack S and a list L. Traverse down the left-most spine of the tree until you reach a leaf l', adding each node n to S. Add l' to L. The parent of l' (p) is at the top of the stack. Repeat the procedure for p's right subtree.

finalclassInOrderRopeIteratorimplementsIterator<RopeLike>{privatefinalDeque<RopeLike>stack;InOrderRopeIterator(@NonNullRopeLikeroot){stack=newArrayDeque<>();varc=root;while(c!=null){stack.push(c);c=c.getLeft();}}@OverridepublicbooleanhasNext(){returnstack.size()>0;}@OverridepublicRopeLikenext(){valresult=stack.pop();if(!stack.isEmpty()){varparent=stack.pop();varright=parent.getRight();if(right!=null){stack.push(right);varcleft=right.getLeft();while(cleft!=null){stack.push(cleft);cleft=cleft.getLeft();}}}returnresult;}}

Rebalance

Definition: Collect the set of leaves L and rebuild the tree from the bottom-up.

staticbooleanisBalanced(RopeLiker){valdepth=r.depth();if(depth>=FIBONACCI_SEQUENCE.length-2){returnfalse;}returnFIBONACCI_SEQUENCE[depth+2]<=r.weight();}staticRopeLikerebalance(RopeLiker){if(!isBalanced(r)){valleaves=Ropes.collectLeaves(r);returnmerge(leaves,0,leaves.size());}returnr;}staticRopeLikemerge(List<RopeLike>leaves){returnmerge(leaves,0,leaves.size());}staticRopeLikemerge(List<RopeLike>leaves,intstart,intend){intrange=end-start;if(range==1){returnleaves.get(start);}if(range==2){returnnewRopeLikeTree(leaves.get(start),leaves.get(start+1));}intmid=start+(range/2);returnnewRopeLikeTree(merge(leaves,start,mid),merge(leaves,mid,end));}

Insert

Definition:Insert(i, S’): insert the string S’ beginning at position i in the string s, to form a new string

C 1, ..., C i, S', C i + 1, ..., C m

.

Time complexity:⁠

O(\log N)

⁠.

This operation can be done by a Split() and two Concat() operations. The cost is the sum of the three.

publicRopeinsert(intidx,CharSequencesequence){if(idx==0){returnprepend(sequence);}if(idx==length()){returnappend(sequence);}vallhs=base.split(idx);returnnewRope(Ropes.concat(lhs.fst.append(sequence),lhs.snd));}

Index

Definition:Index(i): return the character at position i

Time complexity:⁠

O(\log N)

⁠

To retrieve the i-th character, we begin a recursive search from the root node:

@OverridepublicintindexOf(charch,intstartIndex){if(startIndex>weight){returnright.indexOf(ch,startIndex-weight);}returnleft.indexOf(ch,startIndex);}

For example, to find the character at i=10 in Figure 2.1 shown on the right, start at the root node (A), find that 22 is greater than 10 and there is a left child, so go to the left child (B). 9 is less than 10, so subtract 9 from 10 (leaving i=1) and go to the right child (D). Then because 6 is greater than 1 and there's a left child, go to the left child (G). 2 is greater than 1 and there's a left child, so go to the left child again (J). Finally 2 is greater than 1 but there is no left child, so the character at index 1 of the short string "na" (ie "n") is the answer. (1-based index)

Concat

Definition:Concat(S1, S2): concatenate two ropes, S₁ and S₂, into a single rope.

Time complexity:⁠

O(1)

⁠ (or ⁠

O(\log N)

⁠ time to compute the root weight)

A concatenation can be performed simply by creating a new root node with left = S1 and right = S2, which is constant time. The weight of the parent node is set to the length of the left child S₁, which would take ⁠ $O(\log N)$ ⁠ time, if the tree is balanced.

As most rope operations require balanced trees, the tree may need to be re-balanced after concatenation.

Split

Definition:Split (i, S): split the string S into two new strings S₁ and S₂,

S 1 = C 1, ..., C i

and

S 2 = C i + 1, ..., C m

.

Time complexity:⁠

O(\log N)

⁠

There are two cases that must be dealt with:

The split point is at the end of a string (i.e. after the last character of a leaf node)
The split point is in the middle of a string.

The second case reduces to the first by splitting the string at the split point to create two new leaf nodes, then creating a new node that is the parent of the two component strings.

For example, to split the 22-character rope pictured in Figure 2.3 into two equal component ropes of length 11, query the 12th character to locate the node K at the bottom level. Remove the link between K and G. Go to the parent of G and subtract the weight of K from the weight of D. Travel up the tree and remove any right links to subtrees covering characters past position 11, subtracting the weight of K from their parent nodes (only node D and A, in this case). Finally, build up the newly orphaned nodes K and H by concatenating them together and creating a new parent P with weight equal to the length of the left node K.

As most rope operations require balanced trees, the tree may need to be re-balanced after splitting.

publicPair<RopeLike,RopeLike>split(intindex){if(index<weight){valsplit=left.split(index);returnPair.of(rebalance(split.fst),rebalance(newRopeLikeTree(split.snd,right)));}elseif(index>weight){valsplit=right.split(index-weight);returnPair.of(rebalance(newRopeLikeTree(left,split.fst)),rebalance(split.snd));}else{returnPair.of(left,right);}}

Delete

Definition:Delete(i, j): delete the substring

C i, \dots, C i + j - 1

, from s to form a new string

C 1, \dots, C i - 1, C i + j, \dots, C m

.

Time complexity:⁠

O(\log N)

⁠.

This operation can be done by two Split() and one Concat() operation. First, split the rope in three, divided by i-th and i+j-th character respectively, which extracts the string to delete in a separate node. Then concatenate the other two nodes.

@OverridepublicRopeLikedelete(intstart,intlength){vallhs=split(start);valrhs=split(start+length);returnrebalance(newRopeLikeTree(lhs.fst,rhs.snd));}

Report

Definition:Report(i, j): output the string

C i, \dots, C i + j - 1

.

Time complexity:⁠

O(j+\log N)

⁠

To report the string $C i, \dots, C i + j - 1$ , find the node u that contains C_i and weight(u) >= j, and then traverse T starting at node u. Output $C i, \dots, C i + j - 1$ by doing an in-order traversal of T starting at node u.

Comparison with monolithic arrays

Complexity^{[ citation needed ]}
Operation	Rope	String
Index^[1]	O(log n)	O(1)
Split^[1]	O(log n)	O(1)
Concatenate	O(1) amortized, O(log n) worst case^{[ citation needed ]}	O(n)
Iterate over each character^[1]	O(n)	O(n)
Insert^[2]^{[ failed verification ]}	O(log n)	O(n)
Append^[2]^{[ failed verification ]}	O(1) amortized, O(log n) worst case	O(1) amortized, O(n) worst case
Delete	O(log n)	O(n)
Report	O(j + log n)	O(j)
Build	O(n)	O(n)

Advantages:

Ropes enable much faster insertion and deletion of text than monolithic string arrays, on which operations have time complexity O(n).
Ropes do not require O(n) extra memory when operated upon (arrays need that for copying operations).
Ropes do not require large contiguous memory spaces.
If only nondestructive versions of operations are used, rope is a persistent data structure. For the text editing program example, this leads to an easy support for multiple undo levels.

Disadvantages:

Greater overall space use when not being operated on, mainly to store parent nodes. There is a trade-off between how much of the total memory is such overhead and how long pieces of data are being processed as strings. The strings in example figures above are unrealistically short for modern architectures. The overhead is always O(n), but the constant can be made arbitrarily small.
Increase in time to manage the extra storage
Increased complexity of source code; greater risk of bugs

This table compares the algorithmic traits of string and rope implementations, not their raw speed. Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for longer strings, time complexity and memory use for inserting and deleting characters becomes unacceptably large. In contrast, a rope data structure has stable performance regardless of data size. Further, the space complexity for ropes and arrays are both O(n). In summary, ropes are preferable when the data is large and modified often.

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take $O(log n)$ time in both the average and worst cases, where $is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.$

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with $k = 2$ . A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Treap</span> Random search tree data structure

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less than any keys in subtrees on the right.

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and $amortized insertion and deletion time.$

An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after their originator, Swedish computer scientist Arne Andersson.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

<i>m</i>-ary tree Tree data structure in which each node has at most m children.

In graph theory, an m-ary tree is an arborescence in which each node has no more than m children. A binary tree is an important case where m = 2; similarly, a ternary tree is one where m = 3.

In computer science, a longest common substring of two or more strings is a longest string that is a substring of all of them. There may be more than one longest common substring. Applications include data deduplication and plagiarism detection.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.

A Fenwick tree or binary indexed tree(BIT) is a data structure that can efficiently update values and calculate prefix sums in an array of values.

In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the B^ε tree. The B^ε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the B^ε tree has degree B^ε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.

References

1 2 3 4 5 Boehm, Hans-J; Atkinson, Russ; Plass, Michael (December 1995). "Ropes: an Alternative to Strings" (PDF). Software: Practice and Experience. 25 (12). New York, NY, USA: John Wiley & Sons, Inc.: 1315–1330. doi:10.1002/spe.4380251203. Archived from the original on 2020-03-08.
1 2 "Rope Implementation Overview". www.sgi.com. Archived from the original on 2017-12-19. Retrieved 2017-03-01.

External links

"absl::Cord" implementation of ropes within The Abseil library
"C cords" implementation of ropes within the Boehm Garbage Collector library
SGI C++ specification for ropes (supported by STLPort and libstdc++)
Ropes for C#
ropes for Common Lisp
Ropes for Java
String-Like Ropes for Java
Ropes for JavaScript
Ropes for Limbo
ropes for Nim
Ropes for OCaml
pyropes for Python
Ropes for Smalltalk
SwiftRope for Swift
"Ropey" for Rust

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Boehm-1] 1 2 3 4 5 Boehm, Hans-J; Atkinson, Russ; Plass, Michael (December 1995). "Ropes: an Alternative to Strings" (PDF). Software: Practice and Experience. 25 (12). New York, NY, USA: John Wiley & Sons, Inc.: 1315–1330. doi:10.1002/spe.4380251203. Archived from the original on 2020-03-08.

[:0-2] 1 2 "Rope Implementation Overview". www.sgi.com. Archived from the original on 2017-12-19. Retrieved 2017-03-01.

[1]

[2]

v t e Strings
String metric	Approximate string matching Bitap algorithm Damerau–Levenshtein distance Edit distance Gestalt pattern matching Hamming distance Jaro–Winkler distance Lee distance Levenshtein automaton Levenshtein distance Wagner–Fischer algorithm
String-searching algorithm	Apostolico–Giancarlo algorithm Boyer–Moore string-search algorithm Boyer–Moore–Horspool algorithm Knuth–Morris–Pratt algorithm Rabin–Karp algorithm Raita algorithm Trigram search Two-way string-matching algorithm Zhu–Takaoka string matching algorithm
Multiple string searching	Aho–Corasick Commentz-Walter algorithm
Regular expression	Comparison of regular-expression engines Regular grammar Thompson's construction Nondeterministic finite automaton
Sequence alignment	BLAST Hirschberg's algorithm Needleman–Wunsch algorithm Smith–Waterman algorithm
Data structure	DAFSA Suffix array Suffix automaton Suffix tree Generalized suffix tree Rope Ternary search tree Trie
Other	Parsing Pattern matching Compressed pattern matching Longest common subsequence Longest common substring Sequential pattern mining Sorting String rewriting systems String operations

Rope (data structure)

Contents

Description

Operations

Collect leaves

Rebalance

Insert

Index

Concat

Split

Delete

Report

Comparison with monolithic arrays

See also

Related Research Articles

References

External links