File size


File size is a measure of how much data a computer file contains or, alternatively, how much storage it consumes. Typically, file size is expressed in units of measurement based on the byte. By convention, file size units use either a metric prefix (as in megabyte and gigabyte) or a binary prefix (as in mebibyte and gibibyte).[1]


When a file is written to a file system, as is the case on most modern devices, it may consume slightly more disk space than its contents require. This is because the file system allocates space in whole sectors or blocks, so the space used is rounded up to include any unused space left over in the last sector or block used by the file. (A sector is the smallest amount of space addressable by the file system; sector sizes range from several hundred to several thousand bytes.) This unused space is called slack space or internal fragmentation.[2] Although smaller sector sizes allow for denser use of disk space, they decrease the operational efficiency of the file system.
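The rounding described above can be sketched in Python. This is a minimal illustration, assuming a simple allocator that gives every file a whole number of fixed-size clusters; the function names and the 4096-byte default cluster size are hypothetical choices, and real file systems may behave differently (for example by storing very small files inline in metadata, or by compressing data).

```python
def allocated_size(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes occupied on disk if space is handed out in whole clusters.

    Assumption: every file gets ceil(file_size / cluster_size) clusters.
    The 4096-byte default is a hypothetical, commonly seen cluster size.
    """
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    # Integer ceiling division avoids float rounding on very large sizes.
    clusters = (file_size + cluster_size - 1) // cluster_size
    return clusters * cluster_size


def slack_space(file_size: int, cluster_size: int = 4096) -> int:
    """Unused bytes at the end of the file's last cluster (internal fragmentation)."""
    return allocated_size(file_size, cluster_size) - file_size


print(allocated_size(10_000))  # 12288
print(slack_space(10_000))     # 2288
```

Under these assumptions, a 10,000-byte file stored in 4096-byte clusters occupies three clusters (12,288 bytes), leaving 2,288 bytes of slack.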

Maximum size

The maximum file size a file system supports depends not only on the capacity of the file system, but also on the number of bits reserved for storing the file size. In the FAT32 file system, for example, the maximum file size is 4,294,967,295 bytes (2³² − 1), which is one byte less than 4 GiB.[3] The table below lists the maximum file size for a number of common or historical file systems.

File system | Maximum size[note 1]
APFS | 8 EB
exFAT | 16 EB − 1 byte
FAT12 | 16 MB (4 KB clusters) or 32 MB (8 KB clusters)
FAT16B | 2 GB (without LFS) or 4 GB − 1 byte (with LFS)
FAT32 | 4 GB − 1 byte
HFS | 2 GB
HFS+ | 8 EB
HPFS | 2 GB
NTFS | 16 EB − 1 KB
Btrfs | 16 EB
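The dependence of the maximum file size on the width of the size field can be shown with a short calculation. This sketch assumes the limit comes only from an unsigned field that stores the file length in bytes; the function name is hypothetical, and other constraints (volume size, cluster counts, implementation choices) can impose lower limits.

```python
def max_file_size(size_field_bits: int) -> int:
    """Largest byte count an unsigned size field of the given width can hold."""
    if size_field_bits <= 0:
        raise ValueError("size_field_bits must be positive")
    return 2**size_field_bits - 1


print(max_file_size(32))  # 4294967295, the FAT32 figure given above
print(max_file_size(64))  # 18446744073709551615, i.e. 16 EiB minus 1 byte
```

A 32-bit field reproduces the FAT32 limit exactly, and a 64-bit field gives a value matching the 16 EB − 1 byte entry for exFAT.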

Units of information

Bytes are the typical base unit of information. The sizes of larger files are usually expressed in kilobytes, megabytes or gigabytes, depending on how large the file is. Although these larger units are less precise than a count of bytes, most operating systems show a file's exact size in bytes in its properties, and command-line tools can report it as well.
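As an example of reading the exact size programmatically, Python's standard library exposes the byte count recorded by the file system through `os.stat(path).st_size` (equivalently `os.path.getsize(path)`). The helper name below is hypothetical, and the temporary file exists only for the demonstration.

```python
import os
import tempfile


def exact_size(path: str) -> int:
    """Return the file's size in bytes, as recorded by the file system."""
    return os.stat(path).st_size


# Demonstration: create a throwaway file holding 1,500 bytes, then measure it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1500)
    demo_path = tmp.name

print(exact_size(demo_path))  # 1500
os.remove(demo_path)
```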

Some operating systems display all sizes using metric (decimal) prefixes, in which case only the lowercase symbol 'kB' on small files reveals it, since symbols such as 'MB' and 'GB' look the same in both systems. Others display all sizes using the binary multiples traditionally used on computers, written for example as 'KB'. Hard disk manufacturers use the metric system (e.g. 1 GB = 1,000,000,000 bytes and 1 TB = 1000 GB).

In the JEDEC convention, the kilobyte (KB) denotes 1024 bytes; the unambiguous IEC name for this unit is the kibibyte (KiB). The symbol kB, with the lowercase SI prefix k (kilo, 1000), always denotes 1000 bytes.
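The difference between the two conventions can be illustrated with a small formatter that renders the same byte count either with decimal prefixes (1 kB = 1000 bytes) or with binary prefixes (1 KiB = 1024 bytes). This is a hypothetical helper written for illustration; it uses the unambiguous IEC symbols for binary units, whereas some operating systems label the same binary values 'KB', 'MB' and 'GB'.

```python
def format_size(num_bytes: int, binary: bool = False) -> str:
    """Render a byte count with decimal (kB, MB, ...) or binary (KiB, MiB, ...) prefixes."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"] if binary
             else ["B", "kB", "MB", "GB", "TB", "PB", "EB"])
    if num_bytes < base:
        return f"{num_bytes} B"
    value = float(num_bytes)
    index = 0
    # Divide by the base until the value fits below it (or units run out).
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


print(format_size(10**12))               # 1.00 TB
print(format_size(10**12, binary=True))  # 931.32 GiB
```

Under these definitions, a drive sold as 1 TB (10¹² bytes) is reported as about 931 GiB by software that counts in binary units, which is why the same drive can appear smaller than advertised.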

File transfers (e.g. "downloads") may report rates in bytes per second (e.g. MB/s), often using binary rather than metric multiples, while networking hardware such as Wi-Fi always quotes rates in bits per second with metric prefixes (Mbit/s, Gbit/s, etc.). Because a network link must also carry protocol overhead in addition to the files themselves, these superficially similar figures are not directly comparable.[citation needed]
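To see why these figures are not interchangeable, a nominal link rate in megabits per second can be converted to bytes per second. The sketch below divides by 8 bits per byte and then expresses the result in both MB/s and MiB/s; it deliberately ignores protocol overhead, so real file-transfer rates would be lower. The function name is hypothetical.

```python
def link_rate_to_file_rates(mbit_per_s: float) -> tuple[float, float]:
    """Convert a nominal link rate (metric Mbit/s) to (MB/s, MiB/s).

    Protocol overhead is ignored, so these are upper bounds on throughput.
    """
    bytes_per_s = mbit_per_s * 1_000_000 / 8  # 8 bits per byte
    return bytes_per_s / 1_000_000, bytes_per_s / 2**20


mb_s, mib_s = link_rate_to_file_rates(100)
print(f"{mb_s} MB/s, {mib_s:.2f} MiB/s")  # 12.5 MB/s, 11.92 MiB/s
```

For a 100 Mbit/s link this gives 12.5 MB/s, or about 11.92 MiB/s, before any overhead is subtracted.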


Notes

  1. Values are based on the format specification; individual implementations may have different limits. See the respective file system article for details.

Related Research Articles

The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures. To disambiguate arbitrarily sized bytes from the common 8-bit definition, network protocol documents such as the Internet Protocol refer to an 8-bit byte as an octet. The bits in an octet are usually numbered from 0 to 7 or from 7 to 0, depending on the bit endianness.

A binary prefix is a unit prefix that indicates a multiple of a unit of measurement by an integer power of two. The most commonly used binary prefixes are kibi (symbol Ki, meaning 2¹⁰ = 1024), mebi (Mi, 2²⁰ = 1048576), and gibi (Gi, 2³⁰ = 1073741824). They are most often used in information technology as multipliers of bit and byte, when expressing the capacity of storage devices or the size of computer files.

Gigabyte: unit of digital information

The gigabyte is a multiple of the unit byte for digital information. The prefix giga means 10⁹ in the International System of Units (SI). Therefore, one gigabyte is one billion bytes. The unit symbol for the gigabyte is GB.

The kilobyte is a multiple of the unit byte for digital information. The International System of Units (SI) defines the prefix kilo as a multiplication factor of 1000 (10³); therefore, one kilobyte is 1000 bytes. The internationally recommended unit symbol for the kilobyte is kB.

Kilo is a decimal unit prefix in the metric system denoting multiplication by one thousand (10³). It is used in the International System of Units, where it has the symbol k, in lowercase.

The kilobit is a multiple of the unit bit for digital information or computer storage. The prefix kilo- (symbol k) is defined in the International System of Units (SI) as a multiplier of 10³ (1 thousand), and therefore one kilobit is 1000 bits.

The megabyte is a multiple of the unit byte for digital information. Its recommended unit symbol is MB. The unit prefix mega is a multiplier of 1000000 (10⁶) in the International System of Units (SI). Therefore, one megabyte is one million bytes of information. This definition has been incorporated into the International System of Quantities.

New Technology File System (NTFS) is a proprietary journaling file system developed by Microsoft. Starting with Windows NT 3.1, it is the default file system of the Windows NT family. It superseded File Allocation Table (FAT) as the preferred filesystem on Windows and is supported in Linux and BSD as well. NTFS reading and writing support is provided using a free and open-source kernel implementation known as NTFS3 in Linux and the NTFS-3G driver in BSD. By using the convert command, Windows can convert FAT32/16/12 into NTFS without the need to rewrite all files. NTFS uses several files typically hidden from the user to store metadata about other files stored on the drive which can help improve speed and performance when reading data. Unlike FAT and High Performance File System (HPFS), NTFS supports access control lists (ACLs), filesystem encryption, transparent compression, sparse files and file system journaling. NTFS also supports shadow copy to allow backups of a system while it is running, but the functionality of the shadow copies varies between different versions of Windows.

File Allocation Table (FAT) is a file system developed for personal computers and was the default file system for MS-DOS and Windows 9x operating systems. Originally developed in 1977 for use on floppy disks, it was adapted for use on hard disks and other devices. Increases in disk drive capacity required three major variants: FAT12, FAT16 and FAT32. FAT was replaced with NTFS as the default file system on Microsoft operating systems starting with Windows XP. Nevertheless, FAT continues to be used on flash and other solid-state memory cards and modules, and on many portable and embedded devices, because of its compatibility and ease of implementation.

The megabit is a multiple of the unit bit for digital information. The prefix mega (symbol M) is defined in the International System of Units (SI) as a multiplier of 10⁶ (1 million), and therefore one megabit is 1,000,000 bits.

An order of magnitude is usually a factor of ten. Thus, four orders of magnitude is a factor of 10,000 or 10⁴.

du (Unix): standard Unix program

du is a standard Unix program used to estimate file space usage: the space used under a particular directory or by files on a file system. A Windows command-line version of this program is part of the Sysinternals suite by Mark Russinovich.
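The distinction between a file's apparent size and the space it actually occupies, which du measures, can be sketched with Python's `os.walk` and `os.lstat`. On Unix-like systems, `st_blocks` counts the 512-byte blocks allocated to a file, so summing `st_blocks * 512` approximates the allocation du reports by default, while summing `st_size` gives the apparent size. This is a simplified, hypothetical helper: unlike du, it does not count the directories themselves or deduplicate hard links, and `st_blocks` is not available on Windows.

```python
import os


def disk_usage(path: str) -> tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for all files under path.

    Unix-only sketch: st_blocks is not provided on Windows.
    Simplifications versus du: directories' own blocks are not counted,
    and hard-linked files are counted once per link.
    """
    apparent = 0
    allocated = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # lstat so that symbolic links are measured, not followed.
            st = os.lstat(os.path.join(root, name))
            apparent += st.st_size           # logical length in bytes
            allocated += st.st_blocks * 512  # space actually allocated
    return apparent, allocated
```

Calling `disk_usage(".")` on a directory of small files will typically show the allocated figure exceeding the apparent one because of slack space, while sparse or compressed files can make it smaller.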

A unit prefix is a specifier or mnemonic that is prepended to units of measurement to indicate multiples or fractions of the units. Units of various sizes are commonly formed by the use of such prefixes. The prefixes of the metric system, such as kilo and milli, represent multiplication by positive or negative powers of ten. In information technology it is common to use binary prefixes, which are based on powers of two. Historically, many prefixes have been used or proposed by various sources, but only a narrow set has been recognised by standards organisations.

The following tables compare general and technical information for a number of file systems.

Power Macintosh 5500: personal computer by Apple Computer

The Power Macintosh 5500 is a personal computer designed, manufactured, and sold by Apple Computer from February 1997 to March 1998. Like the Power Macintosh 5260 and 5400 that preceded it, the 5500 is an all-in-one design, built around a PowerPC 603ev processor operating at 225, 250 or 275 megahertz (MHz).

In telecommunications, data-transfer rate is the average number of bits (bitrate), characters or symbols (baudrate), or data blocks per unit time passing through a communication link in a data-transmission system. Common data rate units are multiples of bits per second (bit/s) and bytes per second (B/s). For example, the data rates of modern residential high-speed Internet connections are commonly expressed in megabits per second (Mbit/s).

This timeline of binary prefixes lists events in the history of the evolution, development, and use of units of measure which are germane to the definition of the binary prefixes by the International Electrotechnical Commission (IEC) in 1998, used primarily with units of information such as the bit and the byte.

In digital computing and telecommunications, a unit of information is the capacity of some standard data storage system or communication channel, used to measure the capacities of other systems and channels. In information theory, units of information are also used to measure information contained in messages and the entropy of random variables.

The maximum random access memory (RAM) installed in any computer system is limited by hardware, software and economic factors. The hardware may have a limited number of address bus bits, limited by the processor package or design of the system. Some of the address space may be shared between RAM, peripherals, and read-only memory. In the case of a microcontroller with no external RAM, the size of the RAM array is limited by the size of the integrated circuit die. In a packaged system, only enough RAM may be provided for the system's required functions, with no provision for addition of memory after manufacture.

The FAT file system is a file system used on MS-DOS and the Windows 9x family of operating systems. It continues to be used on mobile devices and embedded systems, and is therefore well suited for data exchange between computers and devices of almost any type and age, from 1981 through the present.

References

  1. JEDEC Solid State Technology Association (November 2019). "Terms, Definitions, and Letter Symbols for Microprocessors, and Memory Integrated Circuits". JESD 100B.01. p. 8. Retrieved 2009-04-05.
  2. "What is Slack Space?". IT Pro. 2010-01-19. Retrieved 2018-02-17.
  3. "Microsoft Extensible Firmware Initiative FAT32 File System Specification, FAT: General Overview of On-Disk Format". Microsoft. 2000-12-06. Retrieved 2011-07-03.