Hard link

Last updated

In computing, a hard link is a directory entry (in a directory-based file system) that associates a name with a file. Thus, each file must have at least one hard link. Creating additional hard links for a file makes the contents of that file accessible via additional paths (i.e., via different names or in different directories). [1] This causes an alias effect: a process can open the file by any one of its paths and change its content. By contrast, a soft link or “shortcut” to a file is not a direct link to the data itself, but rather a reference to a hard link or another soft link.

Contents

Every directory is itself a special file on many systems, containing a list of file names instead of other data. Hence, multiple hard links to directories are possible, which could create a circular directory structure, rather than a branching structure like a tree. For that reason, some file systems forbid the creation of hard links to directories.

POSIX-compliant operating systems, such as Linux, Android, macOS, and the Windows NT family, [2] support multiple hard links to the same file, depending on the file system. For instance, NTFS and ReFS support hard links, [3] while FAT does not.

Operation

An illustration of the concept of hard linking Hard Link Illustration.svg
An illustration of the concept of hard linking

Let two hard links, named "LINK A.TXT" and "LINK B.TXT", point to the same physical data. A text editor opens "LINK A.TXT", modifies it and saves it. When the editor (or any other app) opens "LINK B.TXT", it can see those changes made to "LINK A.TXT", since both file names point to the same data. So from a user's point of view this is one file with several filenames. Editing any filename modifies "all" files, however deleting "any" filename except the last one keeps the file around.

However, some editors, such as GNU Emacs, break the hard link concept. When opening a file for editing, e.g., "LINK B.TXT", emacs renames "LINK B.TXT" to "LINK B.TXT~", loads "LINK B.TXT~" into the editor, and saves the modified contents to a newly created "LINK B.TXT". Now, "LINK A.TXT" and "LINK B.TXT" no longer shares the same data. (This behavior can be changed using the emacs variable backup-by-copying.)

Any number of hard links to the physical data may be created. To access the data, a user only needs to specify the name of any existing link; the operating system will resolve the location of the actual data. Even if the user deletes one of the hard links, the data is still accessible through any other link that remains. Once the user deletes all of the links, if no process has the file open, the operating system frees the disk space that the file once occupied.

Reference counting

Simplified illustration of hard links on typical Unix filesystem. Note that files "A" and "D" both point to same index entry in filesystem's inode table, making its reference count 2. Simplified illustration of hard links on typical UN*X filesystem.png
Simplified illustration of hard links on typical Unix filesystem. Note that files "A" and "D" both point to same index entry in filesystem's inode table, making its reference count 2.

Most file systems that support hard links use reference counting. The system stores an integer value with each logical data section that represents the total number of hard links that have been created to point to the data. When a new link is created, this value is increased by one. When a link is removed, the value is decreased by one. When the counter becomes zero, the operating system frees the logical data section. (The OS may not to do so immediately, e.g., when there are outstanding file handles open, for performance reasons, or to enable the undelete command.)

This is a simple method for the file system to track the use of a given area of storage, as zero values indicate free space and nonzero values indicate used space. The maintenance of this value guarantees that there will be no dangling hard links pointing nowhere. The data section and the associated inode are preserved as long as a single hard link (directory reference) points to it or any process keeps the associated file open.

On POSIX-compliant operating systems, the reference count for a file or directory is returned by the stat() or fstat() system calls in the st_nlink field of struct stat.

Limitations

To prevent loops in the filesystem, and to keep the interpretation of the ".." file (parent directory) consistent, operating systems do not generally allow hard links to directories. UNIX System V allowed them, but only the superuser had permission to make such links. [4] Mac OS X v10.5 (Leopard) and newer use hard links on directories for the Time Machine backup mechanism only. [5]

Hard links can be created to files only on the same volume, i.e., within the same file system. (Different volumes may have different file systems. There is no guarantee that the target volume's file system is compatible with hard linking.)

The maximum number of hard links to a single file is limited by the size of the reference counter. On Unix-like systems the counter is 4,294,967,295 (on 32-bit machines) or 18,446,744,073,709,551,615 (on 64-bit machines). In some file systems, the number of hard links is limited more strictly by their on-disk format. For example, as of Linux 3.11, the ext4 file system limits the number of hard links on a file to 65,000. [6] Windows limits enforces a limit of 1024 hard links to a file on NTFS volumes. [7]

On Linux Weekly News, Neil Brown criticized hard links as high-maintenance, since they complicate the design of programs that handle directory trees, including archivers and disk usage tools. These apps must take care to de-duplicate files that are linked multiple times in a hierarchy. Brown notes that Plan 9 from Bell Labs, the intended successor to Unix, does not include the concept of a hard link. [8]

Platform support

Windows NT 3.1 and later support hard links on the NTFS file system. [9] Windows 2000 introduces a CreateHardLink() function to create hard links, but only for files, not directories. [10] The DeleteFile() function can remove them.

To create a hard link on Windows, end-users can use:

To interrogate a file for its hard links, end-users can use:

The Windows Component Store uses hard links to keep track of different versions of components stored on the hard disk drive.

On Unix-like systems, the link() system call can create additional hard links to existing files. To create hard links, end-users can use:

To interrogate a file for its hard links, end-users can use:

Unix-like emulation or compatibility software running on Microsoft Windows, such as Cygwin and Subsystem for UNIX-based Applications, allow the use of POSIX interfaces.

OpenVMS supports hard links on the ODS-5 file system. [15] Unlike Unix, VMS can create hard links to directories.

See also

Related Research Articles

New Technology File System (NTFS) is a proprietary journaling file system developed by Microsoft. Starting with Windows NT 3.1, it is the default file system of the Windows NT family. It superseded File Allocation Table (FAT) as the preferred filesystem on Windows and is supported in Linux and BSD as well. NTFS reading and writing support is provided using a free and open-source kernel implementation known as NTFS3 in Linux and the NTFS-3G driver in BSD. By using the convert command, Windows can convert FAT32/16/12 into NTFS without the need to rewrite all files. NTFS uses several files typically hidden from the user to store metadata about other files stored on the drive which can help improve speed and performance when reading data. Unlike FAT and High Performance File System (HPFS), NTFS supports access control lists (ACLs), filesystem encryption, transparent compression, sparse files and file system journaling. NTFS also supports shadow copy to allow backups of a system while it is running, but the functionality of the shadow copies varies between different versions of Windows.

<span class="mw-page-title-main">Disk partitioning</span> Creation of separate accessible storage areas on a secondary computer storage device

Disk partitioning or disk slicing is the creation of one or more regions on secondary storage, so that each region can be managed separately. These regions are called partitions. It is typically the first step of preparing a newly installed disk, before any file system is created. The disk stores the information about the partitions' locations and sizes in an area known as the partition table that the operating system reads before any other part of the disk. Each partition then appears to the operating system as a distinct "logical" disk that uses part of the actual disk. System administrators use a program called a partition editor to create, resize, delete, and manipulate the partitions. Partitioning allows the use of different filesystems to be installed for different kinds of files. Separating user data from system data can prevent the system partition from becoming full and rendering the system unusable. Partitioning can also make backing up easier. A disadvantage is that it can be difficult to properly size partitions, resulting in having one partition with too much free space and another nearly totally allocated.

In computing, a symbolic link is a file whose purpose is to point to a file or directory by specifying a path thereto.

ln (Unix) Unix file management utility

The ln command is a standard Unix command utility used to create a hard link or a symbolic link (symlink) to an existing file or directory. The use of a hard link allows multiple filenames to be associated with the same file since a hard link points to the inode of a given file, the data of which is stored on disk. On the other hand, symbolic links are special files that refer to other files by name.

<span class="mw-page-title-main">Filename</span> Text string used to uniquely identify a computer file

A filename or file name is a name used to uniquely identify a computer file in a file system. Different file systems impose different restrictions on filename lengths.

The inode is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data. File-system object attributes may include metadata, as well as owner and permission data.

<span class="mw-page-title-main">File system</span> Format or program for storing files and directories

In computing, a file system or filesystem is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stopped and the next began, or where any piece of data was located when it was time to retrieve it. By separating the data into pieces and giving each piece a name, the data are easily isolated and identified. Taking its name from the way a paper-based data management system is named, each group of data is called a "file". The structure and logic rules used to manage the groups of data and their names is called a "file system."

<span class="mw-page-title-main">Comparison of command shells</span>

A command shell is a command-line interface to interact with and manipulate a computer's operating system.

File attributes are a type of meta-data that describe and may modify how files and/or directories in a filesystem behave. Typical file attributes may, for example, indicate or specify whether a file is visible, modifiable, compressed, or encrypted. The availability of most file attributes depends on support by the underlying filesystem where attribute data must be stored along with other control structures. Each attribute can have one of two states: set and cleared. Attributes are considered distinct from other metadata, such as dates and times, filename extensions or file system permissions. In addition to files, folders, volumes and other file system objects may have attributes.

In computing, a file shortcut is a handle in a user interface that allows the user to find a file or resource located in a different directory or folder from the place where the shortcut is located. Similarly, an Internet shortcut allows the user to open a page, file or resource located at a remote Internet location or Web site.

In computer data storage, a volume or logical drive is a single accessible storage area with a single file system, typically resident on a single partition of a hard disk. Although a volume might be different from a physical disk drive, it can still be accessed with an operating system's logical interface. However, a volume differs from a partition.

In computing, tee is a command in command-line interpreters (shells) using standard streams which reads standard input and writes it to both standard output and one or more files, effectively duplicating its input. It is primarily used in conjunction with pipes and filters. The command is named after the T-splitter used in plumbing.

alias (command) Command in various command line interpreters

In computing, alias is a command in various command-line interpreters (shells), which enables a replacement of a word by another string. It is mainly used for abbreviating a system command, or for adding default arguments to a regularly used command. alias is available in Unix shells, AmigaDOS, 4DOS/4NT, KolibriOS, Windows PowerShell, ReactOS, and the EFI shell. Aliasing functionality in the MS-DOS and Microsoft Windows operating systems is provided by the DOSKey command-line utility.

<span class="mw-page-title-main">PhotoRec</span> Open source data recovery software

PhotoRec is a free and open-source utility software for data recovery with text-based user interface using data carving techniques, designed to recover lost files from various digital camera memory, hard disk and CD-ROM. It can recover the files with more than 480 file extensions . It is also possible to add custom file signature to detect less known files.

The following tables compare general and technical information for a number of file systems.

<span class="mw-page-title-main">The Sleuth Kit</span>

The Sleuth Kit (TSK) is a library and collection of Unix- and Windows-based utilities for extracting data from disk drives and other storage so as to facilitate the forensic analysis of computer systems. It forms the foundation for Autopsy, a better known tool that is essentially a graphical user interface to the command line utilities bundled with The Sleuth Kit.

An NTFS reparse point is a type of NTFS file system object. It is available with the NTFS v3.0 found in Windows 2000 or later versions. Reparse points provide a way to extend the NTFS filesystem. A reparse point contains a reparse tag and data that are interpreted by a filesystem filter driver identified by the tag. Microsoft includes several default tags including NTFS symbolic links, directory junction points, volume mount points and Unix domain sockets. Also, reparse points are used as placeholders for files moved by Windows 2000's Remote Storage Hierarchical Storage System. They also can act as hard links, but are not limited to pointing to files on the same volume: they can point to directories on any local volume. The feature is inherited to ReFS.

The NTFS file system defines various ways to redirect files and folders, e.g., to make a file point to another file or its contents without making a copy of it. The object being pointed to is called the target. Such file is called a hard or symbolic link depending on a way it's stored on the filesystem.

<span class="mw-page-title-main">PowerShell</span> Cross-platform command-line interface and scripting language for system and network administration

PowerShell is a task automation and configuration management program from Microsoft, consisting of a command-line shell and the associated scripting language. Initially a Windows component only, known as Windows PowerShell, it was made open-source and cross-platform on August 18, 2016, with the introduction of PowerShell Core. The former is built on the .NET Framework, the latter on .NET.

In computing, pushd and popd are a pair of commands which allow users to quickly switch between the current and previous directory when using the command line. When called, they use a directory stack to sequentially save and retrieve directories visited by the user.

References

  1. Pitcher, Lew. "Q & A: The difference between hard and soft links".
  2. "Link Shell Extension".
  3. "Resilient File System (ReFS) overview". Microsoft Learn. 26 October 2022 via Microsoft Docs.
  4. Bach, Maurice J. (1986). The Design of the UNIX Operating System . Prentice Hall. p. 128. ISBN   9780132017992.
  5. Pond, James (August 31, 2013). "How Time Machine Works its Magic". File System Event Store, Hard Links. Retrieved May 19, 2019.
  6. "Linux kernel source tree, fs/ext4/ext4.h, line 229".
  7. "CreateHardLinkA function (winbase.h)". Windows App Development. 13 October 2021 via Microsoft Docs.
  8. Brown, Neil (23 November 2010). "Ghosts of Unix past, part 4: High-maintenance designs". Linux Weekly News. Retrieved 20 April 2014.
  9. "How hard links work". Microsoft Docs.
  10. "CreateHardLink Function". Windows Development. Microsoft. 10 March 2011. Archived from the original on 2 July 2011 via MSDN. Establishes a hard link between an existing file and a new file. This function is only supported on the NTFS file system, and only for files, not directories.{{cite web}}: CS1 maint: unfit URL (link)
  11. 1 2 "Fsutil hardlink". Windows App Development. Microsoft. 18 April 2012 via Microsoft Docs.
  12. "Mklink". Microsoft Docs. Microsoft. 18 April 2012.
  13. 1 2 "New-Item (PowerShell 3.0)". Microsoft Docs. Microsoft. 22 June 2020. If your location is in a FileSystem drive, the following values are allowed: If your location is in a FileSystem drive, the following values are allowed: File[,] Directory[,] Junction[,] HardLink
  14. 1 2 "FileSystemProvider.cs". PowerShell / PowerShell repo. Microsoft. 20 November 2021. Lines 8139–8234 via GitHub.
  15. "OpenVMS System Manager's Manual, Vol. I" (PDF). VSI. August 2019. Retrieved 2021-01-23.