Hard link

In computing, a hard link is a directory entry (in a directory-based file system) that associates a name with a file. Thus, each file must have at least one hard link. Creating additional hard links for a file makes the contents of that file accessible via additional paths (i.e., via different names or in different directories). [1] This causes an alias effect: a process can open the file by any one of its paths and change its content. By contrast, a soft link or “shortcut” to a file is not a direct link to the data itself, but rather a reference to a hard link or another soft link.
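The contrast between the two kinds of link can be sketched in a few lines of Python. This is a minimal illustration, assuming a POSIX-like system and a file system that supports both link types; the function name `compare_links` and the file names are hypothetical.

```python
import os
import tempfile


def compare_links(directory):
    """Remove a file's first name and report which kind of link still works."""
    original = os.path.join(directory, "original.txt")
    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")

    with open(original, "w") as f:
        f.write("payload")
    os.link(original, hard)       # second name for the same file
    os.symlink(original, soft)    # separate object that stores the path of `original`

    os.remove(original)           # drop the first name

    hard_ok = os.path.exists(hard)   # the data still has a name
    soft_ok = os.path.exists(soft)   # the stored path no longer resolves
    return hard_ok, soft_ok


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(compare_links(d))
```

The hard link keeps the data reachable, whereas the soft link only remembers a path and dangles once that path is gone.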

Every directory is itself a special file on many systems, containing a list of file names instead of other data. Hence, multiple hard links to directories are possible, which could create a circular directory structure, rather than a branching structure like a tree. For that reason, some file systems forbid the creation of additional hard links to directories.

POSIX-compliant operating systems, such as Linux, Android, and macOS, as well as the non-POSIX-compliant Windows NT family, [2] support multiple hard links to the same file, depending on the file system. For instance, NTFS and ReFS support hard links, [3] while FAT does not.

Operation

An illustration of the concept of hard linking

Let two hard links, named "LINK A.TXT" and "LINK B.TXT", point to the same physical data. A text editor opens "LINK A.TXT", modifies it, and saves it. When the editor (or any other app) opens "LINK B.TXT", it sees the changes made through "LINK A.TXT", since both file names point to the same data. From a user's point of view, this is one file with several file names. Editing the file through any of its names changes what every name shows, whereas deleting any name except the last one leaves the file in place.
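The scenario can be reproduced with Python's `os.link`. The sketch below is illustrative only; it assumes a file system that supports hard links, and the function name `demo_alias` is hypothetical.

```python
import os
import tempfile


def demo_alias(directory):
    """Create two names for one file and show that they share content."""
    a = os.path.join(directory, "LINK A.TXT")
    b = os.path.join(directory, "LINK B.TXT")

    with open(a, "w") as f:
        f.write("first version\n")
    os.link(a, b)                 # second directory entry for the same data

    # Modify the file in place through the first name ...
    with open(a, "a") as f:
        f.write("edited via A\n")
    # ... and read it back through the second name.
    with open(b) as f:
        seen_via_b = f.read()

    # Removing one name leaves the data reachable through the other.
    os.remove(a)
    with open(b) as f:
        after_delete = f.read()
    return seen_via_b, after_delete


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_alias(d))
```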

However, some editors, such as GNU Emacs, break the hard link concept. When opening a file for editing, e.g., "LINK B.TXT", Emacs renames "LINK B.TXT" to "LINK B.TXT~", loads "LINK B.TXT~" into the editor, and saves the modified contents to a newly created "LINK B.TXT". Now, "LINK A.TXT" and "LINK B.TXT" no longer share the same data. (This behavior can be changed using the Emacs variable backup-by-copying.)
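The difference between the two saving strategies can be sketched as follows. This is not Emacs's actual implementation, only an imitation of the rename-then-rewrite sequence described above; the function names `save_by_rename` and `save_in_place` are hypothetical.

```python
import os
import tempfile


def save_by_rename(path, new_text):
    """Save by renaming the old file to a backup and writing a new one."""
    os.replace(path, path + "~")   # the old data now lives under the backup name
    with open(path, "w") as f:     # a brand-new file takes the original name
        f.write(new_text)


def save_in_place(path, new_text):
    """Overwrite the existing file, keeping every hard link attached."""
    with open(path, "w") as f:     # truncates the file but does not replace it
        f.write(new_text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "LINK A.TXT")
        b = os.path.join(d, "LINK B.TXT")
        with open(a, "w") as f:
            f.write("v1")
        os.link(a, b)
        save_in_place(b, "v2")
        print(os.path.samefile(a, b))   # still the same file
        save_by_rename(b, "v3")
        print(os.path.samefile(a, b))   # the names have parted ways
```

After `save_by_rename`, the old data is shared by "LINK A.TXT" and the backup "LINK B.TXT~", while "LINK B.TXT" is a separate file.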

Any number of hard links to the physical data may be created. To access the data, a user only needs to specify the name of any existing link; the operating system will resolve the location of the actual data. Even if the user deletes one of the hard links, the data is still accessible through any other link that remains. Once the user deletes all of the links, if no process has the file open, the operating system frees the disk space that the file once occupied.

Reference counting

Simplified illustration of hard links on a typical Unix filesystem. Note that files "A" and "D" both point to the same index entry in the filesystem's inode table, making its reference count 2.

Most file systems that support hard links use reference counting. The system stores an integer value with each logical data section that represents the total number of hard links that have been created to point to the data. When a new link is created, this value is increased by one. When a link is removed, the value is decreased by one. When the counter becomes zero, the operating system frees the logical data section. (The OS may not do so immediately, e.g., when there are outstanding file handles open, for performance reasons, or to enable the undelete command.)

This is a simple method for the file system to track the use of a given area of storage, as zero values indicate free space and nonzero values indicate used space. The maintenance of this value guarantees that there will be no dangling hard links. The data section and the associated inode are preserved as long as at least one hard link (directory reference) points to them or any process keeps the associated file open.
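The last point, that an open file outlives its final name, can be observed directly. The sketch below assumes POSIX semantics (on Windows, removing an open file typically fails instead); the function name `read_after_unlink` is hypothetical. On typical local file systems the link count reported after removal is expected to be 0, though this is not guaranteed everywhere.

```python
import os
import tempfile


def read_after_unlink(directory):
    """Remove a file's only name while it is open, then read it anyway."""
    path = os.path.join(directory, "scratch.txt")
    with open(path, "w") as f:
        f.write("still here")

    handle = open(path)            # keep the file open
    os.remove(path)                # the last directory entry is gone
    try:
        name_gone = not os.path.exists(path)
        links_left = os.fstat(handle.fileno()).st_nlink
        data = handle.read()       # the data is preserved while the handle is open
    finally:
        handle.close()             # now the system may reclaim the space
    return name_gone, links_left, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(read_after_unlink(d))
```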

On POSIX-compliant operating systems, the reference count for a file or directory is returned by the stat() or fstat() system calls in the st_nlink field of struct stat.
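Python exposes the same field through `os.stat`. The following sketch records `st_nlink` as a link is added and removed, mirroring files "A" and "D" from the illustration; the function name `link_counts` is hypothetical.

```python
import os
import tempfile


def link_counts(directory):
    """Return st_nlink after each step: create, add a link, remove a link."""
    a = os.path.join(directory, "A")
    d = os.path.join(directory, "D")
    counts = []

    with open(a, "w") as f:
        f.write("data")
    counts.append(os.stat(a).st_nlink)   # one name so far

    os.link(a, d)
    counts.append(os.stat(a).st_nlink)   # "A" and "D" share one inode

    os.remove(d)
    counts.append(os.stat(a).st_nlink)   # back to a single name
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(link_counts(tmp))   # [1, 2, 1]
```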

Limitations

To prevent loops in the filesystem, and to keep the interpretation of the ".." file (parent directory) consistent, operating systems do not generally allow hard links to directories. UNIX System V allowed them, but only the superuser had permission to make such links. [4] Mac OS X v10.5 (Leopard) and newer use hard links on directories for the Time Machine backup mechanism only. [5]
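On systems that forbid directory hard links, the attempt simply fails. The sketch below tries it and returns the resulting error. The function name `try_link_directory` is hypothetical, and the exact error code is platform-dependent (Linux, for example, reports a permission error).

```python
import os
import tempfile


def try_link_directory(parent):
    """Attempt to hard-link a directory; return the error, or None on success."""
    target = os.path.join(parent, "real_dir")
    alias = os.path.join(parent, "alias_dir")
    os.mkdir(target)
    try:
        os.link(target, alias)
    except OSError as exc:
        return exc                 # the system refused to create the link
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(repr(try_link_directory(d)))
```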

Hard links can be created to files only on the same volume, i.e., within the same file system. (Different volumes may have different file systems. There is no guarantee that the target volume's file system is compatible with hard linking.)
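Programs that prefer hard links often fall back to copying when the source and destination are on different volumes. A minimal sketch, assuming the platform reports the cross-volume case as `errno.EXDEV` (as POSIX systems do); the function name `link_or_copy` is hypothetical.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link src to dst; copy instead if they are on different file systems."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EXDEV:   # only the cross-device case is handled here
            raise
    shutil.copy2(src, dst)             # an independent copy, not an alias
    return "copied"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.txt")
        with open(src, "w") as f:
            f.write("x")
        print(link_or_copy(src, os.path.join(d, "dst.txt")))
```

Note that the fallback changes the semantics: later edits to a copy are not visible through the original name.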

The maximum number of hard links to a single file is limited by the size of the reference counter. On Unix-like systems, the counter can hold values up to 4,294,967,295 (on 32-bit machines) or 18,446,744,073,709,551,615 (on 64-bit machines). In some file systems, the number of hard links is limited more strictly by their on-disk format. For example, as of Linux 3.11, the ext4 file system limits the number of hard links on a file to 65,000. [6] Windows enforces a limit of 1024 hard links to a file on NTFS volumes. [7]
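POSIX systems let a program ask for the limit that applies to a given path through `pathconf()` with the `_PC_LINK_MAX` name, which Python exposes as `os.pathconf(path, "PC_LINK_MAX")`. The sketch below wraps that call. The function name `reported_link_max` is hypothetical, and the value returned is whatever the C library reports, which may be a conservative default rather than the file system's true on-disk limit.

```python
import os


def reported_link_max(path):
    """Return the link limit the system reports for path, or None if unknown."""
    if not hasattr(os, "pathconf") or "PC_LINK_MAX" not in os.pathconf_names:
        return None                       # platform without pathconf support
    try:
        limit = os.pathconf(path, "PC_LINK_MAX")
    except OSError:
        return None                       # the query is not supported for this path
    return limit if limit >= 0 else None  # a negative value means no fixed limit


if __name__ == "__main__":
    print(reported_link_max("."))
```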

On Linux Weekly News, Neil Brown criticized hard links as high-maintenance, since they complicate the design of programs that handle directory trees, including archivers and disk usage tools. These apps must take care to de-duplicate files that are linked multiple times in a hierarchy. Brown notes that Plan 9 from Bell Labs, the intended successor to Unix, does not include the concept of a hard link. [8]
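The de-duplication burden can be seen in even a toy disk usage tool. The sketch below sums file sizes under a directory and uses the (device, inode) pair to count each multiply-linked file once. It is an illustration rather than how any particular tool is implemented, and the function name `tree_size` is hypothetical.

```python
import os
import tempfile


def tree_size(root):
    """Sum file sizes under root, counting each hard-linked file only once."""
    seen = set()                   # (device, inode) pairs already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue           # another name for a file already counted
            seen.add(key)
            total += st.st_size
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.bin")
        with open(a, "wb") as f:
            f.write(b"x" * 100)
        os.link(a, os.path.join(d, "b.bin"))
        print(tree_size(d))        # 100, not 200
```

Without the `seen` set, the same 100 bytes would be reported twice.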

Platform support

Windows NT 3.1 and later support hard links on the NTFS file system. [9] Windows 2000 introduces a CreateHardLink() function to create hard links, but only for files, not directories. [10] The DeleteFile() function can remove them.

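To create a hard link on Windows, end-users can use the fsutil utility, [11] the mklink internal command of Windows Command Prompt, [12] or the New-Item cmdlet of PowerShell. [13]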

To interrogate a file for its hard links, end-users can use the fsutil utility [11] or the Get-Item and Get-ChildItem cmdlets of PowerShell. [14]

The Windows Component Store uses hard links to keep track of different versions of components stored on the hard disk drive.

On Unix-like systems, the link() system call can create additional hard links to existing files. To create hard links, end-users can use the ln utility, the link utility, or the New-Item cmdlet of PowerShell. [13]

To interrogate a file for its hard links, end-users can use the stat command, the ls -l command, or the Get-Item and Get-ChildItem cmdlets of PowerShell. [14]
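Programmatically, the other names of a file can be located by walking a directory tree and comparing device and inode numbers. A minimal sketch, assuming a Unix-like system where these numbers identify a file; the function name `find_links` is hypothetical.

```python
import os


def find_links(root, target):
    """Yield every path under root that is a hard link to target."""
    wanted = os.stat(target)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            st = os.lstat(candidate)   # do not follow symbolic links
            if (st.st_dev, st.st_ino) == (wanted.st_dev, wanted.st_ino):
                yield candidate


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:             # usage: script.py ROOT TARGET
        for path in find_links(sys.argv[1], sys.argv[2]):
            print(path)
```

Because a hard link cannot cross volumes, only the file system that holds the target needs to be searched.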

Unix-like emulation or compatibility software running on Microsoft Windows, such as Cygwin and Subsystem for UNIX-based Applications, allows the use of POSIX interfaces.

OpenVMS supports hard links on the ODS-5 file system. [15] Unlike Unix, VMS can create hard links to directories.

References

  1. Pitcher, Lew. "Q & A: The difference between hard and soft links".
  2. "Link Shell Extension".
  3. "Resilient File System (ReFS) overview". Microsoft Learn. 26 October 2022 via Microsoft Docs.
  4. Bach, Maurice J. (1986). The Design of the UNIX Operating System. Prentice Hall. p. 128. ISBN 9780132017992.
  5. Pond, James (August 31, 2013). "How Time Machine Works its Magic". File System Event Store, Hard Links. Archived from the original on June 21, 2019. Retrieved May 19, 2019.
  6. "Linux kernel source tree, fs/ext4/ext4.h, line 229".
  7. "CreateHardLinkA function (winbase.h)". Windows App Development. 13 October 2021 via Microsoft Docs.
  8. Brown, Neil (23 November 2010). "Ghosts of Unix past, part 4: High-maintenance designs". Linux Weekly News. Retrieved 20 April 2014.
  9. "How hard links work". Microsoft Docs. 6 January 2011.
  10. "CreateHardLink Function". Windows Development. Microsoft. 10 March 2011. Archived from the original on 2 July 2011 via MSDN. Establishes a hard link between an existing file and a new file. This function is only supported on the NTFS file system, and only for files, not directories.
  11. "Fsutil hardlink". Windows App Development. Microsoft. 18 April 2012 via Microsoft Docs.
  12. "Mklink". Microsoft Docs. Microsoft. 18 April 2012.
  13. "New-Item (PowerShell 3.0)". Microsoft Docs. Microsoft. 22 June 2020. If your location is in a FileSystem drive, the following values are allowed: File[,] Directory[,] Junction[,] HardLink
  14. "FileSystemProvider.cs". PowerShell / PowerShell repo. Microsoft. 20 November 2021. Lines 8139–8234 via GitHub.
  15. "OpenVMS System Manager's Manual, Vol. I" (PDF). VSI. August 2019. Retrieved 2021-01-23.