Interleaved deltas

Last updated

Interleaved deltas, or SCCS weave is a method used by the Source Code Control System to store all revisions of a file. All lines from all revisions are "woven" together in a single block of data, with interspersed control instructions indicating which lines are included in which revisions of the file. Interleaved deltas are traditionally implemented with line oriented text files in mind, although nothing prevents the method from being applied to binary files as well.

Contents

Interleaved deltas were first implemented by Marc Rochkind in the SCCS in 1975. Its design makes all versions available at the same time, so that it takes the same time to retrieve any revision. It also contains sufficient information to identify the author of each line (blaming) in one block. [1] On the other hand, because all revisions for a file are parsed, every operation grows slower as more revisions are added. The term interleaved delta was coined later in 1982 by Walter F. Tichy, author of the Revision Control System, which compares the SCCS weave to his new reverse delta mechanism in RCS. [2]

Implementation in SCCS

In SCCS, the following weave block

 ^AI 1  ^AD 2  foo  ^AE 2  bar  ^AI 2  baz  ^AE 2  ^AE 1

represents a file that contains the lines "foo" and "bar" in the first release and the lines "bar" and "baz" in the second revision. The string "^A" denotes a control-A character.

The control lines in the interleaved delta block have the following meaning: [3]

Advantages

The time it takes to extract any revision from such an interleaved delta block is proportional to the size of the archive. The size of the archive is the sum of the size of all different lines in all revisions.

In order to extract a specific revision, an array of structures needs to be constructed, telling whether a specific block, tagged by a serial number in the interleaved deltas, will be copied to the output or not. The original SCCS implementation needs approx. 100 bytes of storage for each different serial number in the deltas in order to know how to extract a specific revision. A SCCS history file with one million deltas would thus need 100 MBytes of virtual memory to unpack. The size could be reduced by approx. 32 bytes per delta if no annotated file retrieval is needed.

The advantages of the weave method are the following:

Software using interleaved deltas

Bazaar intended to use interleaved deltas in 2006, [5] but it was ditched due to poor performance after it was actually implemented in bzr 0.1. It still provides a weave-style merge algorithm. [6]

See also

Related Research Articles

Advanced Encryption Standard Standard for the encryption of electronic data

The Advanced Encryption Standard (AES), also known by its original name Rijndael, is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001.

Trivial File Transfer Protocol (TFTP) is a simple lockstep File Transfer Protocol which allows a client to get a file from or put a file onto a remote host. One of its primary uses is in the early stages of nodes booting from a local area network. TFTP has been used for this application because it is very simple to implement.

In software engineering, version control is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections of information. Version control is a component of software configuration management.

Revision Control System(RCS) is an early implementation of a version control system (VCS). It is a set of UNIX commands that allow multiple users to develop and maintain program code or documents. With RCS, users can make their own revisions of a document, commit changes, and merge them. RCS was originally developed for programs but is also useful for text documents or configuration files that are frequently revised.

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.

ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed. The ZIP file format permits a number of compression algorithms, though DEFLATE is the most common. This format was originally created in 1989 and was first implemented in PKWARE, Inc.'s PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP. Microsoft has included built-in ZIP support in versions of Microsoft Windows since 1998 via the "Windows Plus!" addon for Windows 98. Native support was added as of the year 2000 in Windows ME. Apple has included built-in ZIP support in Mac OS X 10.3 and later. Most free operating systems have built in support for ZIP in similar manners to Windows and Mac OS X.

Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process that performs basic medium preparation is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting" most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities allow distinguishing between a quick format, which does not erase all existing data and a long option that does erase all existing data.

Delta encoding is a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required.

The history of software configuration management (SCM) in computing can be traced back as early as the 1950s, when CM, originally for hardware development and production control, was being applied to software development. The first software configuration management was most likely done manually. Eventually, software tools were written to manage software changes. History records tend to be based on tools and companies, and lend concepts to a secondary plane.

In computing, cut is a command line utility on Unix and Unix-like operating systems which is used to extract sections from each line of input — usually from a file. It is currently part of the GNU coreutils package and the BSD Base System.

GNU arch

GNU arch software is a distributed revision control system that is part of the GNU Project and licensed under the GNU General Public License. It is used to keep track of the changes made to a source tree and to help programmers combine and otherwise manipulate changes made by multiple people or at different times.

Source Code Control System (SCCS) is a version control system designed to track changes in source code and other text files during the development of a piece of software. This allows the user to retrieve any of the previous versions of the original source code and the changes which are stored. It was originally developed at Bell Labs beginning in late 1972 by Marc Rochkind for an IBM System/370 computer running OS/360.

Real-Time Messaging Protocol (RTMP) is a communication protocol for streaming audio, video, and data over the Internet. Originally developed as a proprietary protocol by Macromedia for streaming between Flash Player and the Flash Communication Server, Adobe has released an incomplete version of the specification of the protocol for public use.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

A Serializer/Deserializer (SerDes) is a pair of functional blocks commonly used in high speed communications to compensate for limited input/output. These blocks convert data between serial data and parallel interfaces in each direction. The term "SerDes" generically refers to interfaces used in various technologies and applications. The primary use of a SerDes is to provide data transmission over a single line or a differential pair in order to minimize the number of I/O pins and interconnects.

The following is a comparison of version-control software. The following tables include general and technical information on notable version control and software configuration management (SCM) software. For SCM software not suitable for source code, see Comparison of open-source configuration-management software.

GNU Bazaar

GNU Bazaar is a distributed and client–server revision control system sponsored by Canonical.

In Unix-like operating systems, a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow an application program to interact with a device by using its device driver via standard input/output system calls. Using standard system calls simplifies many programming tasks, and leads to consistent user-space I/O mechanisms regardless of device features and functions.

There are various implementations of the Advanced Encryption Standard, also known as Rijndael.

References

  1. http://www.basepath.com/aup/talks/SCCS-Slideshow.pdf Rochkind, Marc. “The source code control system (SCCS).” IEEE Transactions on Software Engineering 1, no. 4 (1975)
  2. Tichy, Walter (1982). "Design, implementation, and evaluation of a Revision Control System". ICSE '82 Proceedings of the 6th International Conference on Software Engineering: 58–67. Retrieved 12 June 2012.
  3. http://sccs.sourceforge.net/man/sccsfile.4.html sccsfile(4) manual page
  4. "Intro to binary weave". www.bitkeeper.org.
  5. "BZR weaving its way to the front". blog.fxa.org. Archived from the original on 6 October 2006. Retrieved 12 January 2022.
  6. "BzrWeaveFormat". Bzr Wiki. Retrieved 16 January 2020.