PetaBox

Internet Archive Petabox

PetaBox, also stylized Petabox, is a storage unit from Capricorn Technologies and the Internet Archive. [1] [2] It was designed by the staff of the Internet Archive and C. R. Saikley to store and process one petabyte (a million gigabytes) of information. [3]

Specifications

Design

Design goals of the Petabox included: [3]

History

The first rack, with a capacity of 100 terabytes, became operational in June 2004 in Amsterdam at the Internet Archive's European arm, the Stichting Internet Archive (SIA). The second rack, with a capacity of 80 terabytes, became operational at the Archive's main San Francisco location that same year. The Internet Archive then spun off its Petabox production to the newly formed company Capricorn Technologies. [3]

Between 2004 and 2007, Capricorn replicated the Internet Archive's deployment of the Petabox for academic institutions, digital preservationists, government agencies, high-performance computing (HPC) and research sites, medical imaging providers, digital image repositories, storage outsourcing sites, and other enterprises. At the time, its largest product used 750-gigabyte disks. In 2007, the Internet Archive data center housed approximately three petabytes of Petabox storage.

In 2010, the fourth version of the Petabox began operation. Each rack provided 480 TB of raw storage (240 disks of 2 TB each, arranged as 24 disks in each 4U rack unit with 10 units per rack) and ran Linux. [4] [5]
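The capacity figure above follows directly from the disk counts. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative only, and the conversion assumes decimal units (1 PB = 1,000 TB), consistent with the "million gigabytes" definition used earlier in this article.

```python
# Figures taken from the paragraph above (fourth-generation Petabox, 2010).
DISKS_PER_UNIT = 24   # disks in each 4U rack unit
UNITS_PER_RACK = 10   # 4U units per rack
TB_PER_DISK = 2       # raw capacity of each disk, in terabytes
TB_PER_PB = 1000      # assumption: decimal (SI) units


def raw_capacity_tb(disks_per_unit: int, units_per_rack: int, tb_per_disk: int) -> int:
    """Return the raw capacity of one rack in TB, before any redundancy or formatting."""
    return disks_per_unit * units_per_rack * tb_per_disk


disks = DISKS_PER_UNIT * UNITS_PER_RACK
capacity_tb = raw_capacity_tb(DISKS_PER_UNIT, UNITS_PER_RACK, TB_PER_DISK)
rack_height_u = 4 * UNITS_PER_RACK  # rack space taken by the storage units alone

print(f"{disks} disks, {capacity_tb} TB raw, {rack_height_u}U of rack space")
print(f"{capacity_tb / TB_PER_PB:.2f} PB per rack")
```

Under these assumptions a single fourth-generation rack holds a little under half a petabyte of raw storage, so slightly more than two such racks would be needed to reach the one-petabyte figure in the name.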

As of December 2021, the Internet Archive's Petabox storage system consists of four data centers, 745 nodes, and 28,000 spinning disks. The Wayback Machine contains 57 petabytes of information and the book, music, and video collections contain 42 petabytes, together making up 99 petabytes of "unique data"; total used storage is 212 petabytes. [3]

References

  1. "Big storage on the cheap". CNET.
  2. "PetaBox Product Family". Capricorn Technologies. Retrieved 2023-07-10.
  3. "Internet Archive: Petabox". Internet Archive. Retrieved 2023-07-10.
  4. Jeff Kaplan (27 July 2010). "The Fourth Generation Petabox". Internet Archive.
  5. "eWEEK Labs Walk-Through: the Internet Archive". PCMag UK. Archived from the original on 2022-04-27. Retrieved 2021-11-09.