Archive file

In computing, an archive file is a computer file composed of one or more files along with metadata. Many archive formats also support compression of member files. Archive files are used to collect multiple data files into a single file for easier portability and storage, or simply to compress files so that they use less storage space. Archive files often store directory structures, error detection and correction information, and comments; some also use built-in encryption. [1] [2] [3]

Applications

Portability

Archive files are particularly useful because they store file system data and metadata within the contents of a single file. They can therefore be stored on systems, or sent over channels, that support only file contents and not the file system in question. Examples include sending a directory structure over email, transferring files whose names are unsupported on the target file system because of their length or characters, and retaining files' date and time information. [4]

A single archive file may contain multiple member files; this can speed up file transfers and other operations that incur a processing overhead for each file, [5] [6] in addition to the gains from compression.
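As a minimal sketch of this idea, the following Python example uses the standard-library tarfile module to pack several members, each with a path and a modification time, into one byte string and then read the metadata back. The helper names build_archive and list_archive and the sample member are illustrative assumptions, not part of any archive format.

```python
import io
import tarfile


def build_archive(members):
    """Pack {path: (data, mtime)} into a single in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, (data, mtime) in members.items():
            info = tarfile.TarInfo(name=path)  # the path carries the directory structure
            info.size = len(data)
            info.mtime = mtime                 # date and time metadata travels with the file
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_archive(blob):
    """Return (path, size, mtime) for every member stored in the archive bytes."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return [(m.name, m.size, m.mtime) for m in tar.getmembers()]


# Hypothetical sample data: one file inside a nested directory.
archive_bytes = build_archive({"docs/notes/readme.txt": (b"hello", 1_000_000_000)})
print(list_archive(archive_bytes))
```

Because the result is ordinary file content, it can be attached to an email or copied to a file system that could not represent the original paths directly.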

Software distribution

Beyond archival purposes, archive files are frequently used for packaging software for distribution, as software contents are often naturally spread across several files; the archive is then known as a package. While the archival file format is the same, there are additional conventions about contents, such as requiring a manifest file, and the resulting format is known as a package format. [7] Examples include deb for Debian, JAR for Java, APK for Android, and self-extracting Windows Installer executables.
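The manifest convention can be illustrated with a short Python sketch. It assumes a JAR-style layout, in which the package is a ZIP archive with a manifest stored at META-INF/MANIFEST.MF. The helpers build_package and read_manifest are hypothetical names, and the parser handles only simple "Key: Value" lines, not continuation lines or per-entry sections.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"  # location used by the JAR convention


def build_package(files, manifest=None):
    """Write files, plus an optional manifest, into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        if manifest is not None:
            text = "".join(f"{key}: {value}\r\n" for key, value in manifest.items())
            zf.writestr(MANIFEST_PATH, text)
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_manifest(blob):
    """Return the manifest attributes as a dict, or None for a plain archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        if MANIFEST_PATH not in zf.namelist():
            return None
        text = zf.read(MANIFEST_PATH).decode("utf-8")
    pairs = (line.split(": ", 1) for line in text.splitlines() if ": " in line)
    return {key: value for key, value in pairs}


# Hypothetical package contents, for illustration only.
package = build_package(
    {"app/Main.class": b"\xca\xfe\xba\xbe"},
    {"Manifest-Version": "1.0", "Main-Class": "app.Main"},
)
print(read_manifest(package))
```

The same bytes remain a valid ZIP archive; only the presence and content of the manifest distinguish the package from a plain archive.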

Features

Features supported by various kinds of archives include:

Some archive programs have self-extraction, self-installation, source volume and medium information, and package notes/description.

The file extension or file header of an archive file indicates the file format used. Computer archive files are created by file archiver software, optical disc authoring software, and disk image software. [8]
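A rough sketch of header-based detection in Python follows. The signature table lists well-known leading bytes for a few formats and is deliberately incomplete; the function name sniff_format is an assumption made for this example.

```python
# (offset, magic bytes, format name); a small, non-exhaustive table.
SIGNATURES = [
    (0, b"PK\x03\x04", "zip"),         # local file header of the first member
    (0, b"PK\x05\x06", "zip"),         # end-of-central-directory record of an empty zip
    (0, b"Rar!\x1a\x07", "rar"),
    (0, b"7z\xbc\xaf\x27\x1c", "7z"),
    (257, b"ustar", "tar"),            # magic field inside the first tar header block
]


def sniff_format(header):
    """Guess the archive format from the first bytes of a file, or return None."""
    for offset, magic, name in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return name
    return None


print(sniff_format(b"PK\x03\x04" + bytes(26)))  # zip
print(sniff_format(b"plain text"))              # None
```

Reading the first 512 bytes of a file is enough for every signature in this table. Unlike a file extension, the header still identifies the format after a file has been renamed.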

Archive formats

An archive format is the file format of an archive file. Some formats are well-defined by their authors and have become conventions supported by multiple vendors and communities. [9]

Types

Examples

Filename extensions used to distinguish different types of archives include zip, rar, 7z, and tar, the first of which is the most widely implemented. [10]

Java also introduced a whole family of archive extensions, such as jar and war (j stands for Java and w for web). They are used to exchange entire bytecode deployments, and sometimes also source code and other text, HTML, and XML files. By default they are all compressed. [11]

Error detection and recovery

Archive files often include parity checks and other checksums for error detection; for instance, zip files use a cyclic redundancy check (CRC). RAR archives may include additional error correction data (called recovery records). [12]
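The CRC check in a zip file can be observed with Python's standard zipfile and zlib modules. In this sketch the helper names and the sample payload are assumptions; the member is stored uncompressed so that a single payload byte can be altered to simulate corruption.

```python
import io
import zipfile
import zlib


def make_stored_zip(name, data):
    """Create an in-memory zip holding one uncompressed member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def stored_crc_matches(blob, name):
    """Compare the CRC-32 recorded in the archive with one computed from the data."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.getinfo(name).CRC == zlib.crc32(zf.read(name))


def first_corrupt_member(blob):
    """Return the name of the first member that fails its CRC check, else None."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.testzip()


good = make_stored_zip("a.txt", b"hello archive")
bad = good.replace(b"hello archive", b"hello archivf", 1)  # alter one payload byte

print(stored_crc_matches(good, "a.txt"))
print(first_corrupt_member(good))
print(first_corrupt_member(bad))
```

The checksum only detects the damage; repairing it requires redundant data such as recovery records.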

Archive files that do not natively support recovery records can use separate parchive (PAR) files, which allow for additional error correction and the recovery of missing files in a multi-file archive. [13]
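The principle of rebuilding a missing file from redundant data can be sketched with plain XOR parity. This is a deliberate simplification and not the Reed-Solomon coding that PAR2 uses: a single parity block can restore exactly one missing block. The function names and sample blocks are assumptions for this example.

```python
def xor_parity(blocks):
    """XOR all blocks together, treating shorter blocks as zero-padded."""
    size = max(len(block) for block in blocks)
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(surviving, parity, missing_len):
    """Rebuild the one missing block from the surviving blocks and the parity."""
    return xor_parity(list(surviving) + [parity])[:missing_len]


# Hypothetical volumes of a multi-file archive.
volumes = [b"alpha", b"bravo!", b"c"]
parity = xor_parity(volumes)

# Suppose the second volume is lost; its length must be known to trim the padding.
restored = recover_missing([volumes[0], volumes[2]], parity, len(volumes[1]))
print(restored == volumes[1])
```

Schemes that tolerate several missing volumes need more parity data and a stronger code than XOR.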

References

  1. "Archive File: What it's Used For". Lifewire. Retrieved 2022-06-17.
  2. "Archive files". www.ibm.com. 2015-02-07. Retrieved 2022-06-17.
  3. "What is Archiving And Why is it Important?". Secure Data MGT. 2015-03-23. Retrieved 2022-06-17.
  4. "Data Portability and Platform Competition | Is User Data Exported From Facebook Actually Useful to Competitors?" (PDF). Archive.org. p. 22. Retrieved 2022-06-17.
  5. "Why file transfer speeds of small vs large files could be different". NetApp Knowledge Base. 2020-06-17. Retrieved 2022-06-17.
  6. "Why Small Files Take Longer to Copy Than Large Files". Dataquest. 2018-10-10. Retrieved 2022-06-17.
  7. Ashbel, Amit. "Data Archiving: The Basics and 5 Best Practices". cloud.netapp.com. Retrieved 2022-06-17.
  8. "What Is a File Extension & Why Are They Important?". Lifewire. Retrieved 2022-06-17.
  9. "What are Archive Files?". www.exefiles.com. Retrieved 2022-06-17.
  10. "Common file name extensions in Windows". support.microsoft.com. Retrieved 2022-06-17.
  11. Malefanem, Moses. "Learning Java Network Programming".
  12. Drummond, James R. (1997). Parity, Checksums and CRC Checks (PDF) (1st ed.). Toronto. p. 13.
  13. "What are PAR and PAR2 Files?". Easynews. Retrieved 2022-06-17.