Libarc

Last updated
Libarc
Written inC++
Type library or framework

In programming, Libarc is a C++ library that accesses contents of GZIP compressed ARC files. These ARC files are generated by the Internet Archive's Heritrix web crawler. [1]

Overview

Libarc allows users to open and scan contents of GZIP compressed ARC Files. It also allows users to get an iterator that walks over the contents of said ARC files, member by member.

Users are able to specify the media type in order to limit the types seen. This allows them to access information in the member’s URL record and response headers from http servers and access to the member’s data in a single API call.

Additionally to the API reference documentation there are two other sources: Programming with libarc - This describes the libarac API, and the license and copyright policies held by the Basis Technology Corp.

Related Research Articles

A file archiver is a computer program that combines a number of files together into one archive file, or a series of archive files, for easier transportation or storage. File archivers may employ lossless data compression in their archive formats to reduce the size of the archive.

gzip GNU file compression/decompression tool

gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by GNU. Version 0.1 was first publicly released on 31 October 1992, and version 1.0 followed in February 1993.

zlib DEFLATE codec library

zlib is a software library used for data compression. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

A filename extension, file name extension or file extension is a suffix to the name of a computer file. The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with a full stop (period), but in some systems it is separated with spaces. Other extension formats include dashes and/or underscores on early versions of GNU Linux and some versions of IBM AIX.

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization.

compress is a Unix shell compression program based on the LZW compression algorithm. Compared to more modern compression utilities such as gzip and bzip2, compress performs faster and with less memory usage, at the cost of a significantly lower compression ratio.

In object-oriented programming (OOP), encapsulation refers to the bundling of data with the methods that operate on that data, or the restricting of direct access to some of an object's components. Encapsulation is used to hide the values or state of a structured data object inside a class, preventing direct access to them by clients in a way that could expose hidden implementation details or violate state invariance maintained by the methods.

7-Zip Open-source file archiver

7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own archive format called 7z, but can read and write several others.

In computing, a named pipe is an extension to the traditional pipe concept on Unix and Unix-like systems, and is one of the methods of inter-process communication (IPC). The concept is also found in OS/2 and Microsoft Windows, although the semantics differ substantially. A traditional pipe is "unnamed" and lasts only as long as the process. A named pipe, however, can last as long as the system is up, beyond the life of the process. It can be deleted if no longer used. Usually a named pipe appears as a file, and generally processes attach to it for IPC.

rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding (Huffman) on 900 kB output chunks.

The file command is a standard program of Unix and Unix-like operating systems for recognizing the type of data contained in a computer file.

The following tables compare general and technical information for a number of file archivers. Please see the individual products' articles for further information. They are neither all-inclusive nor are some entries necessarily up to date. Unless otherwise specified in the footnotes section, comparisons are based on the stable versions—without add-ons, extensions or external programs.

ARC is a lossless data compression and archival format by System Enhancement Associates (SEA). The file format and the program were both called ARC. The format is known as the subject of controversy in the 1980s, part of important debates over what would later be known as open formats.

A file system API is an application programming interface through which a utility or user program requests services of a file system. An operating system may provide abstractions for accessing different file systems transparently.

DriveSpace is a disk compression utility supplied with MS-DOS starting from version 6.0 in 1993 and ending in 2000 with the release of Windows Me. The purpose of DriveSpace is to increase the amount of data the user could store on disks by transparently compressing and decompressing data on-the-fly. It is primarily intended for use with hard drives, but use for floppy disks is also supported. This feature was removed in Windows XP and later.

XAR is an open source file archiver and the archiver’s file format. It was created within the OpenDarwin project and is used in macOS X 10.5 and up for software installation routines, as well as browser extensions in Safari 5.0 and up. Xar replaced the use of gzipped pax files.

XZ Utils is a set of free software command-line lossless data compressors, including lzma and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows.

lzip

lzip is a free, command-line tool for the compression of data; it employs the Lempel–Ziv–Markov chain algorithm (LZMA) with a user interface that is familiar to users of usual Unix compression tools, such as gzip and bzip2.

Brotli is a data format specification for data streams compressed with a specific combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd order context modelling. Brotli is a compression algorithm developed by Google and works best for text compression. Brotli is primarily used by web servers and content delivery networks to compress HTTP content, making internet websites load faster. A successor to gzip, it is supported by all major web browsers and is becoming increasingly popular, as it provides better compression than gzip.

References

  1. "libarc". Sourceforge.net. 2004-06-08. Retrieved 2010-03-27.