Case preservation


In file systems, case preservation is the preservation of the letter case (uppercase or lowercase) of letters in file names. If an attempt is made to create a file named "ThisIsAFile" on a file system that preserves letter case, the file's name will be "ThisIsAFile", rather than, for example, "thisisafile" or "THISISAFILE".

In contrast, a file system that does not preserve letter case will typically store letters in file names either as all lowercase or as all uppercase, and the letter case information will thus be lost. If an attempt is made to create a file named "ThisIsAFile" on a file system that does not preserve letter case, the file's name will be "thisisafile" if letters are stored as all lowercase or "THISISAFILE" if letters are stored as all uppercase.
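The two storage policies can be sketched in a few lines of Python. The function name `stored_name` and the policy labels are hypothetical, chosen only for illustration. Real file systems apply their own folding rules, which need not match Python's `str.lower` and `str.upper`.

```python
def stored_name(requested: str, policy: str = "preserve") -> str:
    """Return the name a hypothetical file system would record.

    policy is an illustrative label, not a real file system option:
      "preserve"   - keep the casing exactly as requested
      "fold-lower" - non-case-preserving, stores all lowercase
      "fold-upper" - non-case-preserving, stores all uppercase
    """
    if policy == "preserve":
        return requested
    if policy == "fold-lower":
        return requested.lower()
    if policy == "fold-upper":
        return requested.upper()
    raise ValueError(f"unknown policy: {policy!r}")


print(stored_name("ThisIsAFile"))                # ThisIsAFile
print(stored_name("ThisIsAFile", "fold-lower"))  # thisisafile
print(stored_name("ThisIsAFile", "fold-upper"))  # THISISAFILE
```

Under either folding policy the original casing cannot be recovered from the stored name, which is the information loss described above.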

Combinations of preservation and sensitivity

Case-preserving, case-insensitive

It is possible, and common, for a system to be case-insensitive yet case-preserving. This combination is often considered the most natural for people, because most people prefer to use the correct capitalization but still recognize other capitalizations. For example, if someone refers to the "uNiTeD states oF AMERICA," it is understood to mean the United States of America, even though the capitalization is incorrect.

Most of the file systems in macOS, current versions of Microsoft Windows, and all versions of AmigaOS are case-preserving and case-insensitive. Since they are case-insensitive, any combination of lowercase or uppercase letters can be used when referring to a file, so that a file named "ThisIsAFile" can be referred to as "thisisafile", "THISISAFILE", "thisISaFILE", and so on. However, since they are case-preserving, when a file is created, the file name is stored with the combination of lowercase and uppercase letters specified, so that if a file is created as "ThisIsAFile", the name of the file will be "ThisIsAFile" rather than, for example, "thisisafile" or "THISISAFILE".

This means that one cannot save two files with the same name in the same place if the only difference in their file names is capitalization (lowercase or uppercase letters). For example, one cannot have files named readme.txt and Readme.tXT in the same folder.
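A minimal sketch of this behavior, assuming a toy in-memory directory: names are compared through a case-folded key, but the casing given at creation time is what gets stored. The class `CaseInsensitiveDirectory` and its methods are hypothetical, and Python's `str.casefold` stands in for whatever comparison rule a real file system uses.

```python
class CaseInsensitiveDirectory:
    """Toy directory: lookups ignore case, stored names keep their case."""

    def __init__(self) -> None:
        # Maps the case-folded key to the name exactly as it was created.
        self._entries: dict[str, str] = {}

    def create(self, name: str) -> None:
        key = name.casefold()
        if key in self._entries:
            # Only the capitalization differs, so the names collide.
            raise FileExistsError(
                f"{name!r} collides with existing {self._entries[key]!r}"
            )
        self._entries[key] = name

    def lookup(self, name: str) -> str:
        """Return the stored (case-preserved) name for any casing of it."""
        try:
            return self._entries[name.casefold()]
        except KeyError:
            raise FileNotFoundError(name) from None

    def listing(self) -> list[str]:
        return sorted(self._entries.values())


d = CaseInsensitiveDirectory()
d.create("ThisIsAFile")
print(d.lookup("thisISaFILE"))  # ThisIsAFile

d.create("readme.txt")
try:
    d.create("Readme.tXT")
except FileExistsError as exc:
    print("refused:", exc)
```

Keeping the folded key and the stored name separate is what lets the directory be case-insensitive for lookup while remaining case-preserving for display.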

Non-case-preserving, case-insensitive

A system that is non-case-preserving is necessarily also case-insensitive.

This applies, for example, to identifiers (column and table names) in some relational databases (for example DB2, Interbase/Firebird, Oracle and Snowflake[1]), unless the identifier is specified within double quotation marks (in which case the identifier becomes case-sensitive).[2]

In a non-case-preserving system, the system may use arbitrary capitalization for storage and display, for example storing all letters in lowercase (or alternatively all in uppercase). For example, in Oracle Database, a table created with the name CustomersRegion1 will be stored as CUSTOMERSREGION1 (unless it is created under the name "CustomersRegion1", which means that the identifier will be treated as case-sensitive, and therefore must be referenced with that exact casing).
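The folding rule in the Oracle example can be modeled roughly as follows. This is a simplified sketch, not the database's actual parser: the function `normalize_identifier` is hypothetical, it assumes uppercase folding for unquoted names, and it does not validate which characters an identifier may contain.

```python
def normalize_identifier(identifier: str) -> str:
    """Return the form under which a hypothetical catalog stores a name.

    Assumption: unquoted identifiers are folded to uppercase, while
    identifiers wrapped in double quotation marks keep their exact casing.
    """
    quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if quoted:
        # Strip the surrounding quotes and preserve the casing as written.
        return identifier[1:-1]
    return identifier.upper()


print(normalize_identifier("CustomersRegion1"))    # CUSTOMERSREGION1
print(normalize_identifier('"CustomersRegion1"'))  # CustomersRegion1

# Unquoted references match regardless of casing...
print(normalize_identifier("customersregion1")
      == normalize_identifier("CustomersRegion1"))    # True
# ...but they do not match a name that was created with quotes.
print(normalize_identifier("customersregion1")
      == normalize_identifier('"CustomersRegion1"'))  # False
```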

Case-sensitive (results in case-preservation)

Most of the file systems in Unix-like systems other than macOS, such as file systems in Linux, are case-sensitive. This means that two files in the same folder can have names that differ only in capitalization. For example, readme.txt and Readme.tXT can coexist in the same folder.
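Whether a particular directory behaves this way can be probed empirically. The sketch below is one possible approach, using only the Python standard library: it creates a file with a lowercase name in a scratch subdirectory and checks whether the uppercase spelling resolves to it. The function name `is_case_sensitive` is hypothetical, and the result is only indicative, since it assumes the scratch subdirectory behaves like its parent.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Probe whether names differing only in case are distinct here."""
    # Work inside a throwaway subdirectory so nothing existing is touched.
    with tempfile.TemporaryDirectory(dir=directory) as probe_dir:
        lower = os.path.join(probe_dir, "probe.tmp")
        upper = os.path.join(probe_dir, "PROBE.TMP")
        with open(lower, "w"):
            pass
        # On a case-insensitive file system the uppercase spelling
        # resolves to the file that was just created.
        return not os.path.exists(upper)


# The answer depends on the file system that holds the temp directory.
print(is_case_sensitive(tempfile.gettempdir()))
```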

Examples of file systems

Some examples of file systems with various combinations of case sensitivity and case preservation are:

| | Case-sensitive | Case-insensitive |
|---|---|---|
| Case-preserving | UFS, ext3, ext4, HFS Plus (optional), NTFS (in Unix-like systems), APFS (optional) | VFAT, FAT32 (which is basically always used with long filename support), NTFS, HFS Plus (default), APFS (default) |
| Non-case-preserving | Impossible | FAT12, FAT16 (only without long filename support) |

References

  1. "Identifier requirements | Snowflake Documentation". docs.snowflake.com. Retrieved 2024-02-08.
  2. "Database identifiers, quoting and case sensitivity". Lorenzo Alberton. Retrieved 2024-02-08.