Independent Reference Model

The Independent Reference Model (I.R.M.) is a conceptual model used in the analysis of storage systems: disk drives, caches, etc.

Under this model, the references to stored objects are independent random variables. [1]

Description

The motivation for this model (and others like it) is to compensate for the scarcity of "traces" from such storage devices.

A "trace" is simply a logging of a set of data regarding the performance of the storage device, focusing on the I/O requests: how many read/write operations, the size of each request, the exact address (LUN-wise), and a time-stamp.

Accurate, valid and detailed traces of actual storage systems are very difficult to obtain for academic analysis (for reasons beyond the scope of this article), which is why such models are a necessity.

Usually, the available data is much coarser and lower in quality than a full trace: for example, it might record, for every unit of time T, the number of I/Os that took place at each LUN (or track), along with the total hit/miss ratios.

For example: for a disk drive with four tracks A, B, C and D, after 15 minutes of work the I/O request counts were 7600, 20, 50 and 6000 for A, B, C and D respectively.

It is easy to see why this data is insufficient to determine the actual workload. Consider a second, even simpler example: two tracks, A and B, each receiving 1,000 I/Os during 15 minutes.

To answer the simple question, "How hard did the disk work during those 15 minutes?", consider the following two scenarios:

  1. All 1,000 requests to track A arrive first, followed by all 1,000 requests to track B.
  2. The requests alternate strictly between A and B.

It is easy to see that the amount of work done by the disk is very different in these two scenarios: in the first, the disk does a minimal amount of work, not having to travel between tracks more than once, while in the second it does a maximal amount, traveling between the tracks on nearly every request.
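The difference is easy to check mechanically. Here is a minimal Python sketch (the traces and the track_switches helper are illustrative, not taken from any real trace format) that counts how often consecutive requests land on different tracks in each scenario:

    def track_switches(trace):
        """Count how many consecutive requests land on different tracks."""
        return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)

    # Scenario 1: all 1,000 requests to A, then all 1,000 requests to B.
    sequential = ["A"] * 1000 + ["B"] * 1000
    # Scenario 2: the 2,000 requests alternate strictly between A and B.
    alternating = ["A", "B"] * 1000

    print(track_switches(sequential))   # 1     (a single seek)
    print(track_switches(alternating))  # 1999  (nearly one seek per request)

Both traces contain exactly 1,000 I/Os per track, so the per-track counts alone cannot distinguish them.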

The I.R.M. was first introduced by E. Coffman and P. Denning, [2] and it is still in active use today. It is the simplest such model.

In this "memoryless" model, every I/O reference represents an i.i.d multinomial random variable, whose outcome is the location of the next reference track. Thus, arrivals to any given track must occur at a specific average rate, which is directly proportional to the probability of requesting the track. More specifically, the arrival rate of requests to a given track equals its access probability, of being referenced, multiplied by the overall rate of requests.

That is, we denote by N the total number of I/O requests (both read and write) and assign to each track a probability equal to the number of I/Os directed to it divided by N. In the case of our first example, N = 7600 + 20 + 50 + 6000 = 13,670, and we assign the following probabilities to each track:

A → 7600/N, B → 20/N, C → 50/N and D → 6000/N.
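A minimal Python sketch of this fit, assuming the four-track example above (the variable names and the use of random.choices for sampling are our illustration, not a standard tool):

    import random

    # Observed per-track I/O counts from the example.
    counts = {"A": 7600, "B": 20, "C": 50, "D": 6000}
    N = sum(counts.values())                       # N = 13,670
    probs = {t: c / N for t, c in counts.items()}  # A -> 7600/N, ..., D -> 6000/N

    # Under the I.R.M., a reference stream is just i.i.d. draws from this
    # distribution: each reference is sampled independently of all others.
    tracks, weights = zip(*probs.items())
    synthetic_trace = random.choices(tracks, weights=weights, k=10_000)

Every draw is independent of the previous ones, which is exactly the memoryless property the model assumes.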

The benefit of this model, other than being simple and easy to work with, is its conservative property: when analyzing the worst-case scenario, the model's result cannot be off by very much, as illustrated in the following example.

In the best-case scenario of the two-track example above, the disk traveled only once from A to B; in the worst case, it traveled back and forth about 2,000 times. Using the I.R.M. (a technical computation sketched below), the expected number of travels between tracks is about 1,000. That is, the model's result is off from the worst-case scenario by only a factor of two, while it exceeds the best-case scenario by a factor of 1,000!
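The computation alluded to above can be sketched as follows (our derivation, under the two-track example with p_A = p_B = 1/2). Under the I.R.M., two consecutive references land on different tracks with probability 1 - \sum_i p_i^2, and a stream of n = 2,000 requests contains n - 1 = 1,999 consecutive pairs, so

    \mathbb{E}[\text{travels}] = (n - 1)\Bigl(1 - \sum_i p_i^2\Bigr)
                               = 1999 \cdot \Bigl(1 - \tfrac{1}{4} - \tfrac{1}{4}\Bigr)
                               = \tfrac{1999}{2} \approx 1000.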

Indeed, it can be proven that the I.R.M.'s estimate is always "off" from the worst case by at most a factor of two.

References

  1. Flajolet, Philippe (1992). "Birthday paradox, coupon collectors, caching algorithms and self-organizing search". Discrete Applied Mathematics. 39 (3): 207–229. doi:10.1016/0166-218x(92)90177-c.
  2. Coffman, Edward G. Jr.; Denning, Peter J. (1973). Operating Systems Theory. Prentice Hall. ISBN 978-0136378686.