Interleaved deltas

Last updated

Interleaved deltas, or SCCS weave is a method used by the Source Code Control System to store all revisions of a file. All lines from all revisions are "woven" together in a single block of data, with interspersed control instructions indicating which lines are included in which revisions of the file. Interleaved deltas are traditionally implemented with line oriented text files in mind, although nothing prevents the method from being applied to binary files as well.

Contents

Interleaved deltas were first implemented by Marc Rochkind in the SCCS in 1975. Its design makes all versions available at the same time, so that it takes the same time to retrieve any revision. It also contains sufficient information to identify the author of each line (blaming) in one block. [1] On the other hand, because all revisions for a file are parsed, every operation grows slower as more revisions are added. The term interleaved delta was coined later in 1982 by Walter F. Tichy, author of the Revision Control System, which compares the SCCS weave to his new reverse delta mechanism in RCS. [2]

Implementation in SCCS

In SCCS, the following weave block

 ^AI 1  ^AD 2  foo  ^AE 2  bar  ^AI 2  baz  ^AE 2  ^AE 1

represents a file that contains the lines "foo" and "bar" in the first release and the lines "bar" and "baz" in the second revision. The string "^A" denotes a control-A character.

The control lines in the interleaved delta block have the following meaning: [3]

Advantages

The time it takes to extract any revision from such an interleaved delta block is proportional to the size of the archive. The size of the archive is the sum of the size of all different lines in all revisions.

In order to extract a specific revision, an array of structures needs to be constructed, telling whether a specific block, tagged by a serial number in the interleaved deltas, will be copied to the output or not. The original SCCS implementation needs approx. 100 bytes of storage for each different serial number in the deltas in order to know how to extract a specific revision. A SCCS history file with one million deltas would thus need 100 MB of virtual memory to unpack. The size could be reduced by approx. 32 bytes per delta if no annotated file retrieval is needed.

The advantages of the weave method are the following:

Software using interleaved deltas

Bazaar intended to use interleaved deltas in 2006, [5] but it was ditched due to poor performance after it was actually implemented in bzr 0.1. It still provides a weave-style merge algorithm. [6]

See also

Related Research Articles

<span class="mw-page-title-main">Advanced Encryption Standard</span> Standard for the encryption of electronic data

The Advanced Encryption Standard (AES), also known by its original name Rijndael, is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001.

Trivial File Transfer Protocol (TFTP) is a simple lockstep File Transfer Protocol which allows a client to get a file from or put a file onto a remote host. One of its primary uses is in the early stages of nodes booting from a local area network. TFTP has been used for this application because it is very simple to implement.

Revision Control System (RCS) is an early implementation of a version control system (VCS). It is a set of UNIX commands that allow multiple users to develop and maintain program code or documents. With RCS, users can make their own revisions of a document, commit changes, and merge them. RCS was originally developed for programs but is also useful for text documents or configuration files that are frequently revised.

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own, such as devices that use magnetic tape. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.

ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed. The ZIP file format permits a number of compression algorithms, though DEFLATE is the most common. This format was originally created in 1989 and was first implemented in PKWARE, Inc.'s PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP. Microsoft has included built-in ZIP support in versions of Microsoft Windows since 1998 via the "Plus! 98" addon for Windows 98. Native support was added as of the year 2000 in Windows ME. Apple has included built-in ZIP support in Mac OS X 10.3 and later. Most free operating systems have built in support for ZIP in similar manners to Windows and macOS.

Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process that performs basic medium preparation is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting" most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities allow distinguishing between a quick format, which does not erase all existing data and a long option that does erase all existing data.

Delta encoding is a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required.

<span class="mw-page-title-main">History of software configuration management</span>

The history of software configuration management (SCM) can be traced back as early as the 1950s, when CM, originally for hardware development and production control, was being applied to software development. Early software had a physical footprint, such as cards, tapes, and other media. The first software configuration management was a manual operation. With the advances in language and complexity, software engineering, involving configuration management and other methods, became a major concern due to issues like schedule, budget, and quality. Practical lessons, over the years, had led to the definition, and establishment, of procedures and tools. Eventually, the tools became systems to manage software changes. Industry-wide practices were offered as solutions, either in an open or proprietary manner. With the growing use of computers, systems emerged that handled a broader scope, including requirements management, design alternatives, quality control, and more; later tools followed the guidelines of organizations, such as the Capability Maturity Model of the Software Engineering Institute.

In computing, cut is a command line utility on Unix and Unix-like operating systems which is used to extract sections from each line of input — usually from a file. It is currently part of the GNU coreutils package and the BSD Base System.

<span class="mw-page-title-main">GNU arch</span> Distributed revision control system

GNU arch software is a distributed revision control system that is part of the GNU Project and licensed under the GNU General Public License. It is used to keep track of the changes made to a source tree and to help programmers combine and otherwise manipulate changes made by multiple people or at different times.

Source Code Control System (SCCS) is a version control system designed to track changes in source code and other text files during the development of a piece of software. This allows the user to retrieve any of the previous versions of the original source code and the changes which are stored. It was originally developed at Bell Labs beginning in late 1972 by Marc Rochkind for an IBM System/370 computer running OS/360.

In computing, a shebang is the character sequence #!, consisting of the characters number sign and exclamation mark, at the beginning of a script. It is also called sharp-exclamation, sha-bang, hashbang, pound-bang, or hash-pling.

The original Macintosh was a relatively simple machine, now of interest for its simplicity and for the fact that it was the first computer produced by Apple under the name Macintosh. The Macintosh used standard off-the-shelf components to the greatest extent possible, achieving a moderate price point by mixing complex LSI chips, readily customizable programmable array logic, and off-the-shelf components.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

A Serializer/Deserializer (SerDes) is a pair of functional blocks commonly used in high speed communications to compensate for limited input/output. These blocks convert data between serial data and parallel interfaces in each direction. The term "SerDes" generically refers to interfaces used in various technologies and applications. The primary use of a SerDes is to provide data transmission over a single line or a differential pair in order to minimize the number of I/O pins and interconnects.

The following tables describe attributes of notable version control and software configuration management (SCM) systems that can be used to compare and contrast the various systems.

<span class="mw-page-title-main">GNU Bazaar</span> Version control system

GNU Bazaar is a distributed and client–server revision control system sponsored by Canonical.

<span class="mw-page-title-main">Disk sector</span> Logical or physical division of storage media

In computer disk storage, a sector is a subdivision of a track on a magnetic disk or optical disc. For most disks, each sector stores a fixed amount of user-accessible data, traditionally 512 bytes for hard disk drives (HDDs), and 2048 bytes for CD-ROMs, DVD-ROMs and BD-ROMs. Newer HDDs and SSDs use 4096 byte (4 KiB) sectors, which are known as the Advanced Format (AF).

Apple II graphics debuted on the original Apple II in 1977 and were used throughout the computer series of the same name. The graphics consist of a 16 color low-resolution mode and a high-resolution mode where visuals are dependent on artifact color. The Apple IIe added "double" versions of each of these, most prominently "double high-resolution" with twice the horizontal resolution in 16 colors. Internally, Apple II graphics modes are idiosyncratic and do not use a linear frame buffer.

References

  1. http://mrochkind.com/aup/talks/SCCS-Slideshow.pdf Rochkind, Marc. "The source code control system (SCCS)." IEEE Transactions on Software Engineering 1, no. 4 (1975)
  2. Tichy, Walter (1982). "Design, implementation, and evaluation of a Revision Control System". ICSE '82 Proceedings of the 6th International Conference on Software Engineering: 58–67. Retrieved 12 June 2012.
  3. http://sccs.sourceforge.net/man/sccsfile.4.html sccsfile(4) manual page
  4. "Intro to binary weave". www.bitkeeper.org.
  5. "BZR weaving its way to the front". blog.fxa.org. Archived from the original on 6 October 2006. Retrieved 12 January 2022.
  6. "BzrWeaveFormat". Bzr Wiki. Archived from the original on 4 September 2019. Retrieved 16 January 2020.