In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate longer strings or entire texts. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently. [1]
A rope is a type of binary tree where each leaf (end node) holds a string of manageable size and length (also known as a weight), and each node further up the tree holds the sum of the lengths of all the leaves in its left subtree. A node with two children thus divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part of the string, and a node's weight is the length of the first part.
For rope operations, the strings stored in nodes are assumed to be constant immutable objects in the typical nondestructive case, allowing for some copy-on-write behavior. Leaf nodes are usually implemented as basic fixed-length strings with a reference count attached for deallocation when no longer needed, although other garbage collection methods can be used as well.
In the following definitions, N is the length of the rope, that is, the weight of the root node.
finalclassInOrderRopeIteratorimplementsIterator<RopeLike>{privatefinalDeque<RopeLike>stack;InOrderRopeIterator(@NonNullRopeLikeroot){stack=newArrayDeque<>();varc=root;while(c!=null){stack.push(c);c=c.getLeft();}}@OverridepublicbooleanhasNext(){returnstack.size()>0;}@OverridepublicRopeLikenext(){valresult=stack.pop();if(!stack.isEmpty()){varparent=stack.pop();varright=parent.getRight();if(right!=null){stack.push(right);varcleft=right.getLeft();while(cleft!=null){stack.push(cleft);cleft=cleft.getLeft();}}}returnresult;}}
staticbooleanisBalanced(RopeLiker){valdepth=r.depth();if(depth>=FIBONACCI_SEQUENCE.length-2){returnfalse;}returnFIBONACCI_SEQUENCE[depth+2]<=r.weight();}staticRopeLikerebalance(RopeLiker){if(!isBalanced(r)){valleaves=Ropes.collectLeaves(r);returnmerge(leaves,0,leaves.size());}returnr;}staticRopeLikemerge(List<RopeLike>leaves){returnmerge(leaves,0,leaves.size());}staticRopeLikemerge(List<RopeLike>leaves,intstart,intend){intrange=end-start;if(range==1){returnleaves.get(start);}if(range==2){returnnewRopeLikeTree(leaves.get(start),leaves.get(start+1));}intmid=start+(range/2);returnnewRopeLikeTree(merge(leaves,start,mid),merge(leaves,mid,end));}
Insert(i, S’)
: insert the string S’ beginning at position i in the string s, to form a new string C1, ..., Ci, S', Ci + 1, ..., Cm.This operation can be done by a Split()
and two Concat()
operations. The cost is the sum of the three.
publicRopeinsert(intidx,CharSequencesequence){if(idx==0){returnprepend(sequence);}if(idx==length()){returnappend(sequence);}vallhs=base.split(idx);returnnewRope(Ropes.concat(lhs.fst.append(sequence),lhs.snd));}
Index(i)
: return the character at position iTo retrieve the i-th character, we begin a recursive search from the root node:
@OverridepublicintindexOf(charch,intstartIndex){if(startIndex>weight){returnright.indexOf(ch,startIndex-weight);}returnleft.indexOf(ch,startIndex);}
For example, to find the character at i=10
in Figure 2.1 shown on the right, start at the root node (A), find that 22 is greater than 10 and there is a left child, so go to the left child (B). 9 is less than 10, so subtract 9 from 10 (leaving i=1
) and go to the right child (D). Then because 6 is greater than 1 and there's a left child, go to the left child (G). 2 is greater than 1 and there's a left child, so go to the left child again (J). Finally 2 is greater than 1 but there is no left child, so the character at index 1 of the short string "na" (ie "n") is the answer. (1-based index)
Concat(S1, S2)
: concatenate two ropes, S1 and S2, into a single rope.A concatenation can be performed simply by creating a new root node with left = S1 and right = S2, which is constant time. The weight of the parent node is set to the length of the left child S1, which would take time, if the tree is balanced.
As most rope operations require balanced trees, the tree may need to be re-balanced after concatenation.
Split (i, S)
: split the string S into two new strings S1 and S2, S1 = C1, ..., Ci and S2 = Ci + 1, ..., Cm.There are two cases that must be dealt with:
The second case reduces to the first by splitting the string at the split point to create two new leaf nodes, then creating a new node that is the parent of the two component strings.
For example, to split the 22-character rope pictured in Figure 2.3 into two equal component ropes of length 11, query the 12th character to locate the node K at the bottom level. Remove the link between K and G. Go to the parent of G and subtract the weight of K from the weight of D. Travel up the tree and remove any right links to subtrees covering characters past position 11, subtracting the weight of K from their parent nodes (only node D and A, in this case). Finally, build up the newly orphaned nodes K and H by concatenating them together and creating a new parent P with weight equal to the length of the left node K.
As most rope operations require balanced trees, the tree may need to be re-balanced after splitting.
publicPair<RopeLike,RopeLike>split(intindex){if(index<weight){valsplit=left.split(index);returnPair.of(rebalance(split.fst),rebalance(newRopeLikeTree(split.snd,right)));}elseif(index>weight){valsplit=right.split(index-weight);returnPair.of(rebalance(newRopeLikeTree(left,split.fst)),rebalance(split.snd));}else{returnPair.of(left,right);}}
Delete(i, j)
: delete the substring Ci, …, Ci + j − 1, from s to form a new string C1, …, Ci − 1, Ci + j, …, Cm.This operation can be done by two Split()
and one Concat()
operation. First, split the rope in three, divided by i-th and i+j-th character respectively, which extracts the string to delete in a separate node. Then concatenate the other two nodes.
@OverridepublicRopeLikedelete(intstart,intlength){vallhs=split(start);valrhs=split(start+length);returnrebalance(newRopeLikeTree(lhs.fst,rhs.snd));}
Report(i, j)
: output the string Ci, …, Ci + j − 1.To report the string Ci, …, Ci + j − 1, find the node u that contains Ci and weight(u) >= j
, and then traverse T starting at node u. Output Ci, …, Ci + j − 1 by doing an in-order traversal of T starting at node u.
Operation | Rope | String |
---|---|---|
Index [1] | O(log n) | O(1) |
Split [1] | O(log n) | O(1) |
Concatenate | O(1) amortized, O(log n) worst case[ citation needed ] | O(n) |
Iterate over each character [1] | O(n) | O(n) |
Insert [2] [ failed verification ] | O(log n) | O(n) |
Append [2] [ failed verification ] | O(1) amortized, O(log n) worst case | O(1) amortized, O(n) worst case |
Delete | O(log n) | O(n) |
Report | O(j + log n) | O(j) |
Build | O(n) | O(n) |
Advantages:
Disadvantages:
This table compares the algorithmic traits of string and rope implementations, not their raw speed. Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for longer strings, time complexity and memory use for inserting and deleting characters becomes unacceptably large. In contrast, a rope data structure has stable performance regardless of data size. Further, the space complexity for ropes and arrays are both O(n). In summary, ropes are preferable when the data is large and modified often.
In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.
In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.
In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.
A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.
In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.
In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less than any keys in subtrees on the right.
In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.
An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after their originator, Swedish computer scientist Arne Andersson.
In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:
In graph theory, an m-ary tree is an arborescence in which each node has no more than m children. A binary tree is an important case where m = 2; similarly, a ternary tree is one where m = 3.
In computer science, a longest common substring of two or more strings is a longest string that is a substring of all of them. There may be more than one longest common substring. Applications include data deduplication and plagiarism detection.
In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.
In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.
The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.
In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.
A Fenwick tree or binary indexed tree(BIT) is a data structure that can efficiently update values and calculate prefix sums in an array of values.
In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree Bε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.
In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.
In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.
This article relies largely or entirely on a single source .(September 2011) |