Thin provisioning

In computing, thin provisioning involves using virtualization technology to give the appearance of having more physical resources than are actually available. If a system always has enough resources to simultaneously support all of the virtualized resources, then it is not thin provisioned. The term thin provisioning is applied to the disk layer in this article, but it could refer to an allocation scheme for any resource. For example, real memory in a computer is typically thin-provisioned to running tasks, with some form of address translation technology doing the virtualization. Each task acts as if it has real memory allocated, yet the sum of the virtual memory assigned to tasks typically exceeds the total real memory.

The efficiency of thin or thick/fat provisioning is a function of the use case, not of the technology. Thick provisioning is typically more efficient when the amount of resource used very closely approximates the amount of resource allocated. Thin provisioning is more efficient when the amount of resource used is much smaller than the amount allocated, so that the benefit of providing only the resource needed exceeds the cost of the virtualization technology used.
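The trade-off described above can be made concrete with a small calculation. The following Python sketch is illustrative only: the function name `physical_capacity_needed` and the assumption that virtualization costs a fixed fractional overhead on the space actually used are hypothetical and not taken from any particular product.

```python
def physical_capacity_needed(allocated_gb, used_gb, thin_overhead=0.05):
    """Return (thick_gb, thin_gb) physical capacity under a simple model.

    Assumption (hypothetical): thin provisioning costs a fixed fractional
    overhead, thin_overhead, on top of the space actually written.
    """
    if used_gb > allocated_gb:
        raise ValueError("used space cannot exceed allocated space")
    thick_gb = allocated_gb                  # everything is reserved up front
    thin_gb = used_gb * (1 + thin_overhead)  # only written data, plus overhead
    return thick_gb, thin_gb


# Lightly used volume: thin provisioning needs far less physical capacity.
thick_low, thin_low = physical_capacity_needed(allocated_gb=100, used_gb=10)

# Nearly full volume: the overhead outweighs the saving.
thick_high, thin_high = physical_capacity_needed(allocated_gb=100, used_gb=98)
```

Under this model, thin provisioning stops paying off once the used fraction exceeds 1 / (1 + thin_overhead) of the allocation.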

Just-in-time allocation differs from thin provisioning. Most file systems back files just-in-time but are not thin provisioned. Over-allocation also differs from thin provisioning: resources can be over-allocated (oversubscribed) without using virtualization technology. An airline, for example, can oversell seats on a flight without assigning actual seats at the time of sale, so that no customer holds a claim on a specific seat number.

Thin provisioning is a mechanism that applies to large-scale centralized computer disk-storage systems, SANs, and storage virtualization systems. Thin provisioning allows space to be easily allocated to servers, on a just-enough and just-in-time basis. Thin provisioning is called "sparse volumes" in some contexts.

Overview

Thin provisioning, in a shared-storage environment, provides a method for optimizing utilization of available storage. It relies on on-demand allocation of blocks of data, in contrast to the traditional method of allocating all the blocks in advance. This methodology eliminates almost all white space (capacity that is allocated but never written to), which helps avoid the poor utilization rates, often as low as 10%, that occur under traditional storage allocation, where large pools of storage capacity are allocated to individual servers but remain unused. This traditional model is often called "fat" or "thick" provisioning.
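On-demand block allocation can be illustrated with a toy model. The Python sketch below is a minimal illustration, not how any real storage array is implemented: the class name `ThinVolume`, the 4096-byte block size, and the use of a dictionary as the backing store are all assumptions made for the example.

```python
class ThinVolume:
    """Toy thin-provisioned volume: a block is backed only once it is written."""

    def __init__(self, virtual_blocks, block_size=4096):
        self.virtual_blocks = virtual_blocks  # size advertised to the server
        self.block_size = block_size
        self._backing = {}                    # block index -> bytes, filled on first write

    def _check(self, index):
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("block outside the advertised volume")

    def write(self, index, data):
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._backing[index] = bytes(data)    # allocation happens here, on write

    def read(self, index):
        self._check(index)
        # Blocks that were never written read back as zeros and consume no space.
        return self._backing.get(index, bytes(self.block_size))

    @property
    def allocated_blocks(self):
        return len(self._backing)


vol = ThinVolume(virtual_blocks=1000)
vol.write(5, b"x" * 4096)
```

After the single write, the volume still advertises 1000 blocks to its consumer, but only one block is backed. A thick volume in the same model would reserve all 1000 blocks at creation time.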

With thin provisioning, storage capacity utilization efficiency can be automatically driven up towards 100% with very little administrative overhead. Organizations can purchase less storage capacity up front, defer storage capacity upgrades in line with actual business usage, and save the operating costs (electricity and floorspace) associated with keeping unused disk capacity spinning.

Thin technology on a storage virtualization frame was first introduced by VMware as part of their VMware Workstation and VMware ESX products in early 2001. [1] Previous systems generally required large amounts of storage to be physically pre-allocated because of the complexity and impact of growing volume (LUN) space. Thin provisioning enables over-allocation or over-subscription.

Over-allocation

Over-allocation or over-subscription is a mechanism that allows a server to view more storage capacity than has been physically reserved on the storage array itself. This allows flexibility in the growth of storage volumes, without having to predict accurately how much a volume will grow. Instead, block growth becomes sequential. Physical storage capacity on the array is dedicated only when data is actually written by the application, not when the storage volume is initially allocated. The servers, and by extension the applications that reside on them, view a full-size volume from the storage, but the storage itself allocates blocks of data only when they are written.
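Over-subscription of a shared pool can be sketched in a few lines. The Python example below is a simplified illustration under stated assumptions: the `ThinPool` class, its method names, and the choice to raise an error when the pool is exhausted are hypothetical and do not describe any vendor's behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ThinPool:
    """Toy storage pool shared by several thin volumes."""

    physical_blocks: int
    volumes: dict = field(default_factory=dict)  # name -> advertised size in blocks
    written: dict = field(default_factory=dict)  # name -> set of backed block indices

    def create_volume(self, name, virtual_blocks):
        # Creating a volume reserves nothing; it only records the advertised size.
        self.volumes[name] = virtual_blocks
        self.written[name] = set()

    def blocks_in_use(self):
        return sum(len(blocks) for blocks in self.written.values())

    def oversubscription_ratio(self):
        return sum(self.volumes.values()) / self.physical_blocks

    def write(self, name, index):
        if not 0 <= index < self.volumes[name]:
            raise IndexError("block outside the advertised volume")
        if index in self.written[name]:
            return                               # block is already backed
        if self.blocks_in_use() >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.written[name].add(index)            # physical block dedicated on write


pool = ThinPool(physical_blocks=100)
pool.create_volume("a", 150)
pool.create_volume("b", 150)
pool.write("a", 0)
pool.write("b", 149)
```

Here the two servers together see 300 blocks backed by a 100-block pool, an over-subscription ratio of 3.0, while only two physical blocks are actually in use.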

As a practical consideration, a storage manager needs to monitor the storage actually used and add capacity (disks, tapes, solid-state drives (SSDs), etc.) as necessary to satisfy the write requests of the server and the applications residing on it.
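The monitoring task described above amounts to comparing physical usage against a threshold. The following Python sketch shows one way to express it; the function name `capacity_alert` and the 80% default threshold are assumptions chosen for illustration, not a recommended or standard value.

```python
def capacity_alert(used_blocks, physical_blocks, warn_at=0.80):
    """Return (needs_attention, utilization) for a thin-provisioned pool.

    warn_at is a hypothetical threshold: the fraction of physical capacity
    at which the storage manager should plan to add more.
    """
    if physical_blocks <= 0:
        raise ValueError("physical_blocks must be positive")
    utilization = used_blocks / physical_blocks
    return utilization >= warn_at, utilization


needs_more, utilization = capacity_alert(used_blocks=85, physical_blocks=100)
```

In practice the threshold would be chosen so that new capacity can be procured and installed before the pool fills, since writes to an exhausted thin pool cannot be satisfied.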

The over-allocation concept was first introduced when StorageTek (STK) announced their Iceberg product in 1991 (released later in 1994). [2] [3]

Banking analogy

There is an analogy between thin provisioning in computers and the keeping of cash reserve ratios in banks. Just as the processes running on a computer with thinly provisioned memory cannot all simultaneously use the sum total of their memory allotments, because that much memory does not exist in the computer at one time, the depositors of a bank generally cannot all simultaneously close their accounts by withdrawing cash, since their combined deposits usually exceed the cash kept by the bank.

References

  1. Mike Laverick. "Thin provisioning myth-busters: The benefits of thin virtual disks". "Since the days of VMware ESX 3, many IT folks have been wary of thin virtual disks..."
  2. "Iceberg finally thaws out". Computerworld. May 2, 1994.
  3. Jon William Toigo. "Thin Is In -- Or Is It?". "It was first offered by StorageTek, prior to its acquisition by Sun Microsystems, in its Iceberg (mainframe) and Shared Virtual Array (SVA) (open systems) arrays"