Uniform Type Identifier

A Uniform Type Identifier (UTI) is a text string used in software provided by Apple Inc. to uniquely identify a given class or type of item. Apple provides built-in UTIs to identify common system objects (document or image file types, folders and application bundles, streaming data, clipping data, and movie data) and allows third-party developers to add their own UTIs for application-specific or proprietary uses. Support for UTIs was added in Mac OS X 10.4 and is integrated into the Spotlight desktop search technology, which uses UTIs to categorize documents. One of the primary design goals of UTIs was to eliminate the ambiguities and problems associated with inferring a file's content from its MIME type, filename extension, or type or creator code. [1]

UTIs use a reverse-DNS naming structure. Names may include the ASCII characters A–Z, a–z, 0–9, hyphen ("-"), and period ("."), and all Unicode characters above U+007F. [1] Colons and slashes are prohibited for compatibility with Macintosh and POSIX file path conventions. UTIs support multiple inheritance, allowing files to be identified with any number of relevant types, as appropriate to the contained data.
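The character rules above can be checked mechanically. The following is a minimal Python sketch that assumes only the rules stated in this section; the name is_valid_uti_name is hypothetical and not part of any Apple API, and the check does not verify the reverse-DNS structure or whether a type is actually declared anywhere.

```python
import re

# Allowed characters, per the naming rules above: ASCII letters and digits,
# hyphen, period, and any Unicode character above U+007F. Colons, slashes,
# spaces, and other ASCII punctuation fall outside the class and are rejected.
_UTI_CHARS = re.compile(r"[A-Za-z0-9.\-\u0080-\U0010FFFF]+")


def is_valid_uti_name(name: str) -> bool:
    """Return True if `name` is non-empty and uses only UTI-legal characters.

    Hypothetical helper: it checks the character rules only, not whether the
    reverse-DNS prefix belongs to anyone or whether the type is declared.
    """
    return _UTI_CHARS.fullmatch(name) is not None


for candidate in ["com.company.proprietary-image", "public/jpeg", "com.company:image"]:
    print(candidate, is_valid_uti_name(candidate))
```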

Background

One of the difficulties in maintaining a user-accessible operating system is establishing connections between data types and the applications or processes that can use such data. For example, a file that contains picture data in a particular compression format can only be opened and processed by applications that handle picture data, and those applications must be able to identify which compression type was used in order to extract and work with that data. In early computer systems, particularly DOS, its variants, and some versions of Windows, file associations were maintained by filename extensions: the three- or four-character code following a file name told the system which application should open the file.

Beginning with System 1, [2] Macintosh operating systems have attached type codes and creator codes to files as part of their metadata. These four-character codes were designed to specify both the application that created the file (the creator code) and the specific type of the file (the type code), so that other applications could easily open and process the file data. However, while type and creator codes made the system more flexible (a particular type of file was not restricted to opening in a particular application), they suffered from many of the same problems as file extensions. Type and creator codes could be lost when files were transferred through non-Macintosh systems (such as Unix-based servers), and the sheer number of type codes made identification difficult.

In addition, the classic Mac OS did not recognize file extensions at all, which led to unrecognized-file errors when files were transferred from DOS or Windows systems. OPENSTEP, which formed the basis of Mac OS X, used extensions, and early versions of Mac OS X followed suit. This caused some controversy: users and developers who came to Mac OS X from NeXT or Windows advocated the continued use of file extensions, while those coming from the classic Mac OS urged Apple to replace or supplement file extensions with type and creator codes. [3]

Other file identification schemes exist; for example, MIME types are used to identify data transferred over the web. However, Apple's UTI system was designed to create a flexible file association system that would describe data hierarchically to allow better categorization and searching, standardize data descriptions across contexts, and provide a uniform method of adding new data types. For instance, the public.jpeg and public.png UTIs inherit from the public.image UTI, so users can search narrowly for JPEG or PNG images, or broadly for any kind of image, merely by changing the specificity of the UTI used in the search. Application developers who design new data types can also extend the set of available UTIs. For example, a new image format developed by a company might have the UTI com.company.proprietary-image and be declared to inherit from public.image.
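The difference between a narrow and a broad search comes down to following conformance links upward from each file's UTI. The Python sketch below models this with a small, hypothetical set of declarations; the file names and the DECLARED_PARENTS, conforms_to, and search names are illustrative only, and on macOS such lookups are performed by the system rather than by code like this.

```python
# Hypothetical declarations: each UTI maps to the UTIs it directly conforms to.
# public.jpeg and public.png -> public.image come from the text above;
# com.company.proprietary-image is the text's hypothetical example.
DECLARED_PARENTS = {
    "public.jpeg": ["public.image"],
    "public.png": ["public.image"],
    "com.company.proprietary-image": ["public.image"],
    "public.image": ["public.data", "public.content"],
    "public.data": ["public.item"],
}


def conforms_to(uti: str, target: str) -> bool:
    """True if `uti` is `target` or reaches it through declared parents.

    Assumes the declarations contain no cycles.
    """
    if uti == target:
        return True
    return any(conforms_to(parent, target) for parent in DECLARED_PARENTS.get(uti, []))


def search(files: dict[str, str], query: str) -> list[str]:
    """Return the names of files whose UTI conforms to `query`."""
    return [name for name, uti in files.items() if conforms_to(uti, query)]


# Hypothetical file names and the UTIs assigned to them.
files = {
    "beach.jpg": "public.jpeg",
    "logo.png": "public.png",
    "scan.pimg": "com.company.proprietary-image",
    "notes.txt": "public.plain-text",
}

print(search(files, "public.jpeg"))   # → ['beach.jpg']
print(search(files, "public.image"))  # → ['beach.jpg', 'logo.png', 'scan.pimg']
```

Only the query changes between the two searches; the files themselves carry a single, specific UTI each.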

macOS continues to support other forms of file association and includes utilities for translating between them, but uses UTIs in preference where they are available.

UTI structure

Apple maintains the public.* domain as a set of base data types for all UTIs. Other UTIs are associated with these base UTIs by conformance, a system similar to class inheritance. A UTI that conforms to another UTI shares its basic type, and in general any application that works with data of a more general UTI should be able to work with data of any UTI that conforms to it.

Apple public UTIs

The most basic public UTIs in the Apple hierarchy are as follows:

| Identifier | Conforms to | Comment |
|---|---|---|
| public.item | | base class in the physical hierarchy |
| public.content | | base class for all document content |
| public.data | public.item | base class for all files, byte streams, pasteboard, etc. |
| public.image | public.data, public.content | base class for all images |
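Because conformance is transitive, a UTI conforms not only to its direct parents but also to their parents. The Python sketch below, built only from the four rows of the table above, lists every UTI a given identifier conforms to; conformance_tree is a hypothetical helper, and the breadth-first order it produces is a choice made here, not necessarily the order macOS reports.

```python
from collections import deque

# Direct conformances, copied from the table above.
CONFORMS_TO = {
    "public.item": [],
    "public.content": [],
    "public.data": ["public.item"],
    "public.image": ["public.data", "public.content"],
}


def conformance_tree(uti: str) -> list[str]:
    """Return `uti` followed by every UTI it conforms to, directly or
    indirectly, in breadth-first order with no duplicates."""
    tree = [uti]
    queue = deque([uti])
    while queue:
        current = queue.popleft()
        for parent in CONFORMS_TO.get(current, []):
            if parent not in tree:  # multiple inheritance can reach a UTI twice
                tree.append(parent)
                queue.append(parent)
    return tree


print(conformance_tree("public.image"))
# → ['public.image', 'public.data', 'public.content', 'public.item']
```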

UTIs are also used to identify other kinds of file type identifier:

| Identifier | Conforms to | Comment |
|---|---|---|
| public.filename-extension | public.case-insensitive-text | Filename extension |
| public.mime-type | public.case-insensitive-text | MIME type |
| com.apple.ostype | public.text | Four-character code (type OSType) |
| com.apple.nspboard-type | public.text | NSPasteboard type |
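A four-character code such as a type code is stored as a single 32-bit OSType value, conventionally with the first character in the most significant byte. The Python sketch below converts between the two forms; it assumes the characters are encoded in Mac OS Roman, and the helper names ostype_to_int and int_to_ostype are hypothetical.

```python
import struct


def ostype_to_int(code: str) -> int:
    """Pack a four-character code such as "TEXT" into a 32-bit integer,
    first character in the most significant byte (big-endian).

    Assumes the characters are encoded in Mac OS Roman.
    """
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError(f"an OSType needs exactly four bytes, got {len(raw)}")
    return struct.unpack(">I", raw)[0]


def int_to_ostype(value: int) -> str:
    """Unpack a 32-bit integer back into its four-character code."""
    return struct.pack(">I", value).decode("mac_roman")


print(hex(ostype_to_int("TEXT")))  # → 0x54455854
print(int_to_ostype(0x54455854))   # → TEXT
```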

Applications can create dynamic UTIs as needed; these have the prefix dyn. and serve as "a UTI-compatible wrapper around an otherwise unknown filename extension, MIME type, OSType, and so on." [1]

Third-party UTIs

Apple provides a large collection of system-declared Uniform Type Identifiers. Third-party applications can add UTIs to the database maintained by macOS by "exporting" UTI declarations contained in the application package. Because new UTIs can be declared to conform to existing system UTIs, and declarations can associate the new UTIs with file extensions, an exported declaration alone can give the operating system enough information to enable new features, such as Quick Look support for new file types.
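An exported declaration is written into the application's Info.plist. The Python sketch below uses the standard plistlib module to build such a fragment for the hypothetical com.company.proprietary-image type. The key names (UTExportedTypeDeclarations, UTTypeIdentifier, UTTypeDescription, UTTypeConformsTo, UTTypeTagSpecification) follow Apple's property-list conventions and should be checked against Apple's current documentation; the description, the "pimg" extension, the "image/x-proprietary" MIME type, and the helper exported_type_declaration are all hypothetical.

```python
import plistlib


def exported_type_declaration(identifier, description, conforms_to, extensions, mime_types):
    """Build one entry for UTExportedTypeDeclarations (hypothetical helper)."""
    return {
        "UTTypeIdentifier": identifier,
        "UTTypeDescription": description,
        "UTTypeConformsTo": list(conforms_to),
        # Tag classes are themselves UTIs; extensions are listed without a period.
        "UTTypeTagSpecification": {
            "public.filename-extension": list(extensions),
            "public.mime-type": list(mime_types),
        },
    }


# Hypothetical fragment; a real Info.plist contains many other keys as well.
info_fragment = {
    "UTExportedTypeDeclarations": [
        exported_type_declaration(
            "com.company.proprietary-image",
            "Proprietary Image",
            ["public.image"],
            ["pimg"],
            ["image/x-proprietary"],
        )
    ]
}

xml_bytes = plistlib.dumps(info_fragment)  # XML property-list format by default
print(xml_bytes.decode("utf-8"))
```

Declaring conformance to public.image is what lets the system treat the new type as an image without any further code.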

List of common third-party UTIs

| Description | UTI | Extensions | Conforms to | MIME types | Reference URL |
|---|---|---|---|---|---|
| OPML document | org.opml.opml | .opml | public.xml | text/xml, text/x-opml, application/xml | http://dev.opml.org/spec2.html |
| Markdown document | net.daringfireball.markdown [4] | .md, .markdown | public.plain-text | text/markdown | http://daringfireball.net/projects/markdown/ |
| SQLite database | vnd.sqlite3 [5] | .sqlite3, .sqlite, .db | public.database, public.data | application/vnd.sqlite3 | https://www.sqlite.org/fileformat2.html |
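Because the table associates each UTI with filename extensions and MIME types, it can be read in reverse to find candidate UTIs for a tag. The Python sketch below does this for the three rows above only; utis_for_tag and utis_for_filename are hypothetical helpers, and on macOS the system's own type database, not a hard-coded table, performs this lookup. Tags are compared case-insensitively, in keeping with the public.case-insensitive-text conformance of public.filename-extension and public.mime-type.

```python
from pathlib import PurePath

# Rows from the table above, keyed by UTI. The tag-class keys reuse the UTIs
# public.filename-extension and public.mime-type; extensions omit the period.
THIRD_PARTY_TYPES = {
    "org.opml.opml": {
        "public.filename-extension": ["opml"],
        "public.mime-type": ["text/xml", "text/x-opml", "application/xml"],
    },
    "net.daringfireball.markdown": {
        "public.filename-extension": ["md", "markdown"],
        "public.mime-type": ["text/markdown"],
    },
    "vnd.sqlite3": {
        "public.filename-extension": ["sqlite3", "sqlite", "db"],
        "public.mime-type": ["application/vnd.sqlite3"],
    },
}


def utis_for_tag(tag_class: str, tag: str) -> list[str]:
    """Return every UTI in the table whose tags of `tag_class` include `tag`.

    Comparison is case-insensitive; a leading period on an extension is ignored.
    """
    wanted = tag.lower()
    if tag_class == "public.filename-extension":
        wanted = wanted.lstrip(".")
    return [
        uti
        for uti, tags in THIRD_PARTY_TYPES.items()
        if wanted in (t.lower() for t in tags.get(tag_class, []))
    ]


def utis_for_filename(filename: str) -> list[str]:
    """Look up candidate UTIs from a file name's final extension."""
    return utis_for_tag("public.filename-extension", PurePath(filename).suffix)


print(utis_for_filename("notes.MD"))                # → ['net.daringfireball.markdown']
print(utis_for_tag("public.mime-type", "text/xml"))  # → ['org.opml.opml']
```

A tag can map to more than one UTI (text/xml, for instance, describes far more than OPML), so the sketch returns a list rather than a single identifier.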


Looking up a UTI

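To get the UTI of a given file, run the mdls ("metadata list", part of Spotlight) command in Terminal, replacing FILE with the path to the file. kMDItemContentType reports the file's UTI, kMDItemContentTypeTree lists the UTIs that type conforms to, and kMDItemKind gives a human-readable description of the file type: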

mdls -name kMDItemContentType -name kMDItemContentTypeTree -name kMDItemKind FILE

References

  1. "Uniform Type Identifiers Overview". Guides and Sample Code. Apple Inc. October 29, 2007. Retrieved September 12, 2016.
  2. "Folklore.org: The Grand Unified Model (2) - The Finder". www.folklore.org. Retrieved April 12, 2018.
  3. "Mac OS X 10.1 File Name Extension Guidelines - Cocoabuilder". www.cocoabuilder.com. Retrieved April 12, 2018.
  4. "Uniform Type Identifier For Markdown". Daring Fireball. Retrieved August 21, 2019.
  5. "SQLite database file format media type at IANA". Internet Assigned Numbers Authority. Retrieved August 21, 2019.