Enfilade (Xanadu)

Last updated

Enfilades are a class of tree data structures invented by computer scientist Ted Nelson and used in Project Xanadu "Green" designs of the 1970s and 1980s. Enfilades allow quick editing, versioning, retrieval and inter-comparison operations in a large, cross-linked hypertext database. The Xanadu "Gold" design starting in the 1990s used a related data structure called the Ent.

Contents

Structure and properties

Although the principles of enfilades can be applied to any tree data structure, the particular structure used in the Xanadu system was much like a B-Tree. What distinguishes enfilades is the use of dsps and wids in the indexing information within tree nodes.

Dsps are displacements, offsets or relative keys. A dsp is the difference in key between a containing node and that of a subtree or leaf. For instance, the leaf for a grid square in a map might have a certain longitude and latitude offset relative to the larger grid represented by the subtree the leaf is part of. The key of any leaf of an enfilade is found by combining all the dsps on the path down the tree to that leaf. Dsps can also be used for other context information that is imposed top-down on entire subtrees or ranges of content at once.

Wids are widths, ranges, or bounding boxes. A wid is relative to the key of a subtree or leaf, but specifies a range of addresses covering all items within the subtree. Wids identify the interesting parts of sparsely populated address spaces. In some enfilades, the wids of subtrees under a given node can overlap, and in any case, a search for data within a range of addresses must visit any subtrees whose wids intersect the search range. Wids are combined from the leaves of the tree, upward through all layers to the root (although they are maintained incrementally). Wids can also contain other summaries such as totals or maxima of data.

The relative nature of wids and dsps allows subtrees to be rearranged within an enfilade. By changing the dsp at the top of a subtree, the keys of all the data underneath are implicitly changed. Edit operations in enfilades are performed by "cutting," or splitting the tree down relevant access paths, inserting, deleting or rearranging subtrees, and splicing the pieces back together. The cost of cutting and splicing operations is generally log-like in 1-D trees and between log-like and square-root-like in 2-D trees.

Subtrees can also be shared between trees, or be linked from multiple places within a tree. This makes the enfilade a fully persistent data structure with virtual copying and versioning of content. Each use of a subtree inherits a different context from the chain of dsps down to it. Changes to a copy create new nodes only along the cut paths, and leave the entire original in place. The overhead for a version is very small, a new version's tree is balanced and fast, and its storage cost is related only to changes from the original.

One-dimensional enfilades are intermediate between arrays' direct addressability and linked lists' ease of insertion, deletion and rearrangement. Multidimensional enfilades resemble loose, rearrangeable, versionable Quad trees, Oct trees or k-d trees.

Types of enfilades in Xanadu

The Model-T enfilade, used in Xanadu designs before 1979, is a data structure very much like the Rope. It stores linear sequences of characters, with easy insertion, deletion, rearrangement and versioning, but not with links or easy comparison between versions. Text is stored directly in the leaves of the enfilade.

Later Xanadu designs are more indirect: a growing pool of sharable content pieces, called the istream (invariant stream) is organized into the documents, links and versionsall with virtual addressesthat the users see and work on. A collection of enfilade types manages the bi-directional mapping between virtual and istream addresses. Tracing correspondences and links between documents is made possible by mapping from virtual, to invariant, and back to virtual addresses. Storing documents using shared content pieces that remember their identities and can trace back to all their appearances, is called Transclusion.

The POOMfilade (permutation of order matrix) is a 2D enfilade representing a Permutation matrix. This maps virtual position in a document to istream positions in the pooled content that the document is built from. The POOM starts out an identity matrix, then each edit to the document slices and rearranges horizontal strips of the map. The POOM can be queried in the V->I or I->V directions by searching in squat, wide address ranges or tall, narrow ones.

The Spanfilade collects the union of all spans of istream content used by a document or set of documents. Taking the intersection of span-sets from two documents or versions of a document speeds up the comparison of documents. This same mechanism is used to find links from or to a document.

The Granfilade organizes the storage of all this information on disks and a network of servers.

Trade secret until 1999

Enfilades (internal data structures) and istream addresses are not exposed to Xanadu's external interfaces. Enfilades were trade-secret information until the Xanadu code was made open-source in 1999, and were mentioned but not explained in some publications before that point. [1]

Client-server communications in the Xanadu system use vstream addresses in a format called tumblers.

Hence the term Enfilade is not mentioned explicitly in the FeBe (Front end - Back end protocol) document, but is instead noted indirectly in Xanalogical Structure and several other documents. In the aforementioned document, it is noted that xu88 was based on "General Enfilade Theory".

History

In 1972, xu72 introduced the concept of the Enfilade. This was called the "Model T Enfilade", and was used in a word-processing type interface. In 1976, xu76 implemented the "tightly coupled enfilade". In 1980, the xu80 system introduced the "ent", described as a versioning enfilade. In 1988, the xu88 system utilized the concept of "General Enfilade Theory" of Mark S. Miller, Stuart Greene and Roger Gregory, described as "generating data management trees with an upwardly propagating search property and simultaneously a downwardly imposable structural property". The xu88 also extended the concept of the Enfilade over a distributed network, introduced two-dimensional Enfilades, and implemented an algorithm for searching the entire docuverse for overlapping Enfilade spans. In 1992, xu92 implemented the modern concept of the ent. [2]

See also

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is directly proportional to the height of the tree.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

<span class="mw-page-title-main">Hypertext</span> Text with references (links) to other text that the reader can immediately access

Hypertext is text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. Hypertext documents are interconnected by hyperlinks, which are typically activated by a mouse click, keypress set, or screen touch. Apart from text, the term "hypertext" is also sometimes used to describe tables, images, and other presentational content formats with integrated hyperlinks. Hypertext is one of the key underlying concepts of the World Wide Web, where Web pages are often written in the Hypertext Markup Language (HTML). As implemented on the Web, hypertext enables the easy-to-use publication of information over the Internet.

<span class="mw-page-title-main">Tree (data structure)</span> Abstract data type simulating a hierarchical tree structure and represented as a set of linked nodes

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.

Project Xanadu was the first hypertext project, founded in 1960 by Ted Nelson. Administrators of Project Xanadu have declared it superior to the World Wide Web, with the mission statement: "Today's popular software simulates paper. The World Wide Web trivialises our original hypertext model with one-way ever-breaking links and no management of version or contents."

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

Ents are beings from Anglo-Saxon mythology, more widely known today through J.R.R Tolkien's interpretation of them as a fictional race of tree shepherds in The Lord of the Rings fantasy novels.

In the design of the Xanadu computer system, a tumbler is an address of any range of content or link or a set of ranges or links. According to Gary Wolf in Wired, the idea of tumblers was that "the address would not only point the reader to the correct machine, it would also indicate the author of the document, the version of the document, the correct span of bytes, and the links associated with these bytes." Tumblers were created by Roger Gregory and Mark Miller.

An exponential tree is a type of search tree where the number of children of its nodes decreases doubly-exponentially with increasing depth. Values are stored only in the leaf nodes. Each node contains a splitter, a value less than or equal to all values in the subtree which is used during search. Exponential trees use another data structure in inner nodes containing the splitters from children, allowing fast lookup.

<span class="mw-page-title-main">Quadtree</span> Tree data structure in which each internal node has exactly four children, to partition a 2D area

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less than any keys in subtrees on the right.

<span class="mw-page-title-main">R-tree</span> Data structures used in spatial indexing

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" or "find the nearest gas station". The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance.

<span class="mw-page-title-main">Rope (data structure)</span> Data structure for storing strings

In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.

An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after their originator, Swedish computer scientist Arne Andersson.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key and creating point clouds. k-d trees are a special case of binary space partitioning trees.

In computer science, an x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

In computer science, a priority search tree is a tree data structure for storing points in two dimensions. It was originally introduced by Edward M. McCreight. It is effectively an extension of the priority queue with the purpose of improving the search time from O(n) to O(s + log n) time, where n is the number of points in the tree and s is the number of points returned by the search.

References

  1. Literary Machines: The report on, and of, Project Xanadu concerning word processing, electronic publishing, hypertext, thinkertoys, tomorrow's intellectual revolution, and certain other topics including knowledge, education and freedom (1981), Mindful Press, Sausalito, California.
  2. Xanalogical Structure