EIDR

Entertainment ID Registry Association
Formation: 2010
Type: 501(c)(6) not-for-profit membership corporation
Headquarters: Redwood City, CA
Executive Director: Hollie Choi
Website: eidr.org

The Entertainment Identifier Registry, or EIDR, is a global unique identifier system for a broad array of audiovisual objects, including motion pictures, television, and radio programs. The identification system resolves an identifier to a metadata record that is associated with top-level titles, edits, DVDs, encodings, clips, and mashups. EIDR also provides identifiers for video service providers, such as broadcast and cable networks.

As of June 2020, EIDR contains over two million records, including almost 400 thousand movies and almost one million episodes from over 40,000 TV series. [1]

EIDR is an implementation of a digital object identifier (DOI).

History

Media asset identification systems have existed for decades. The common motivation for their creation is to enable the management of media assets through the assignment of a unique id to a set of metadata representing salient characteristics of each asset. Over time such systems tend to proliferate, with each arising to deal with a specific set of issues. As a result, there is considerable variation between systems in terms of which assets are categorized, which metadata is associated with each asset, and the very definition of an asset. To name a few examples, should a "director's cut" of a film be distinct from the original theatrical release? How should regional variations (e.g. translation of the title or dialog into foreign languages) be accounted for? Further complications include the procedures (and required credentials) for adding new assets, editing existing assets, and creating derivative assets.

EIDR was created to address these issues, as well as others encountered in video asset workflows, both in a business-to-business context and the intramural post-production activities of content producers. EIDR has the following characteristics:

EIDR is intended to supplement, not replace, existing asset identification systems. Indeed, a key feature is that an EIDR record can include references to that asset's ID under other systems. This feature is particularly useful for film and television archives, making it easy for them to cross-reference their holdings with other sources for the work and metadata about it. By design, EIDR does not replicate features of other asset ID systems, such as commercial systems that seek to add value through enhanced metadata (e.g. plot summaries, production details). Tracking ownership and rights information is also a non-goal, although such tracking can be implemented by applications that use the EIDR ID.

Content model

EIDR is built on a collection of records (which are further sub-divided into fields) that are stored in a central registry. These records are referenced externally by DOIs, which are assigned when a record is created, and each identifier is immutable thereafter. The identifier resolution system underlying DOIs is the Handle System and so each native EIDR Content ID is a handle formatted, in increasing specificity, to handle, DOI and EIDR standards.

Content ID format

The canonical form of an EIDR Content ID is an instance of a handle and has the format:

10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C

where

There is also a 96-bit compact binary form that is intended for embedding in small payloads such as watermarks. This form is generated from the canonical format as follows:

The Uniform Resource Name form for an EIDR ID is specified in RFC 7302.
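The canonical format can be checked mechanically. The following is a minimal sketch, not an official validator. It assumes that each X is an uppercase hexadecimal digit and that the final character C is an ISO 7064 Mod 37,36 check character computed over the 20 hexadecimal digits; the function names are illustrative. Under that assumption the sketch reproduces the check characters of the Blade Runner ID and the tombstone ID that appear elsewhere in this article.

```python
import re

# Assumption: the suffix is five groups of four uppercase hex digits,
# followed by one check character drawn from 0-9 and A-Z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CONTENT_ID = re.compile(r"10\.5240/((?:[0-9A-F]{4}-){5})([0-9A-Z])")


def check_char(digits: str) -> str:
    """Compute an ISO 7064 Mod 37,36 check character (assumed scheme)."""
    p = 36
    for ch in digits:
        s = (p + ALPHABET.index(ch)) % 36 or 36  # a result of 0 is replaced by 36
        p = (s * 2) % 37
    return ALPHABET[(37 - p) % 36]


def is_valid_content_id(eidr_id: str) -> bool:
    """Return True if the string has the canonical shape and a matching check character."""
    match = CONTENT_ID.fullmatch(eidr_id)
    if match is None:
        return False
    digits = match.group(1).replace("-", "")
    return check_char(digits) == match.group(2)


if __name__ == "__main__":
    print(is_valid_content_id("10.5240/EA73-79D7-1B2B-B378-3A73-M"))
```

A syntactically valid ID is not necessarily a registered one; only resolution against the registry can establish that.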

For use on the web an EIDR content ID can be represented as a URI in one of these forms:

Record types

There are four types of content records, each associated with a reserved prefix:

The sub-prefixes 5237, 5238, 5239, and 5240 are all assigned to the EIDR Association.

Content Records

Content records are objects categorized by their types and relationships. Each has three different (orthogonal) kinds of type:

Basic metadata

The following fields (taken from a larger set) comprise the base object data of a content record:

Deleted content records

An EIDR ID must be always resolvable, thus under normal circumstances the corresponding Content Record will be permanent. There are two mechanisms available to deal with errors or other unusual circumstances. The preferred one is aliasing, whereby an EIDR ID is transparently redirected to another content record. Aliasing is commonly employed to deal with an asset being registered twice.

The other mechanism is the use of tombstone records. This is employed when the Content Record is corrupted, or an otherwise invalid asset was accidentally registered. In this case the ID will be aliased to a special tombstone record. The tombstone can be recognized by applications because its EIDR ID field will be set to the distinguished value "10.5240/0000-0000-0000-0000-0000-X". Note that "X" means the 24th letter of the Latin alphabet (ASCII 0x58 or Unicode U+0058).
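An application can tell the three outcomes of resolution apart by comparing the ID it asked for with the ID found in the record it received. The sketch below assumes the resolved record is available as a dictionary whose "id" key holds the EIDR ID field; that key name and the status labels are hypothetical, not part of any EIDR API.

```python
# Distinguished tombstone value quoted in the text above.
TOMBSTONE_ID = "10.5240/0000-0000-0000-0000-0000-X"


def classify_resolution(requested_id: str, resolved_record: dict) -> str:
    """Classify a lookup as 'active', 'aliased', or 'tombstoned'.

    'id' is a hypothetical key standing in for the record's EIDR ID field.
    """
    requested = requested_id.strip().upper()
    resolved = resolved_record["id"].strip().upper()
    if resolved == TOMBSTONE_ID:
        return "tombstoned"  # the original registration was invalid or corrupted
    if resolved != requested:
        return "aliased"     # e.g. the asset had been registered twice
    return "active"


if __name__ == "__main__":
    record = {"id": "10.5240/0000-0000-0000-0000-0000-X"}
    print(classify_resolution("10.5240/EA73-79D7-1B2B-B378-3A73-M", record))
```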

Alternate ID

Having a rich set of alternate IDs for content is one of the primary goals of EIDR. This allows EIDR IDs to be used everywhere in content workflows; if an alternate ID is needed it can be found in the metadata for the EIDR ID. EIDR supports the inclusion of both proprietary and other standard (e.g. ISAN) ID references. Additional Alternate IDs can be added when needed (e.g. by parties wanting to support new workflows). Below is an example of alternate IDs for the EIDR asset 10.5240/EA73-79D7-1B2B-B378-3A73-M (the movie Blade Runner). If an alternate ID is resolvable algorithmically, for example by placing it appropriately in a template URL, EIDR makes that link available.

Alternate IDs for 10.5240/EA73-79D7-1B2B-B378-3A73-M
Alternate ID #1: 0000-0000-14A9-0000-K-0000-0000-E
Type: ISAN
Alternate ID #2: 89
Type: IVA
Alternate ID #3: B000SW4DLM
Type: Proprietary Domain: amazon.com
Alternate ID #4: 12886
Type: Proprietary Domain: flixster.com
Alternate ID #5: 15042
Type: Proprietary Domain: thecinemasource.com
Alternate ID #6: tt0083658
Type: IMDB Relation: IsSameAs
Alternate ID #7: E0087486000
Type: Proprietary Domain: spe.sony.com/MPM
Alternate ID #8: 3929
Type: Proprietary Domain: spe.sony.com/ProductID
Alternate ID #9: 2002029
Type: Proprietary Domain: warnerbros.com/MPM
Alternate ID #10: 389785
Type: Proprietary Domain: veronicamagazine.nl
Alternate ID #11: B001EC2J1G
Type: Proprietary Domain: amazon.com
Alternate ID #12: 150002645
Type: Proprietary Domain: bfi.org.uk

Alternate IDs are partitioned into non-proprietary and proprietary. The former have distinguished, predefined types (e.g. those issued by ISAN, IMDb, and IVA), whereas proprietary IDs are all of type "Proprietary", and are further distinguished by an associated DNS domain. As of July 2017, there are over 2 million alternate IDs directly available through EIDR.
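The partition into typed and proprietary IDs maps naturally onto a small data structure. The sketch below models a few of the Blade Runner alternate IDs listed above; the class, the field names, and the helper functions are illustrative rather than EIDR's schema, and the IMDb URL template is an assumption standing in for whatever templates the registry actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlternateID:
    value: str
    type: str                     # e.g. "ISAN", "IMDB", "IVA", or "Proprietary"
    domain: Optional[str] = None  # only meaningful for proprietary IDs

    @property
    def is_proprietary(self) -> bool:
        return self.type == "Proprietary"


# Assumed template for illustration; not taken from EIDR documentation.
URL_TEMPLATES = {"IMDB": "https://www.imdb.com/title/{id}/"}

BLADE_RUNNER_ALT_IDS = [
    AlternateID("0000-0000-14A9-0000-K-0000-0000-E", "ISAN"),
    AlternateID("B000SW4DLM", "Proprietary", "amazon.com"),
    AlternateID("tt0083658", "IMDB"),
    AlternateID("B001EC2J1G", "Proprietary", "amazon.com"),
    AlternateID("150002645", "Proprietary", "bfi.org.uk"),
]


def ids_for_domain(alt_ids: List[AlternateID], domain: str) -> List[str]:
    """Return the proprietary ID values registered under one DNS domain."""
    return [a.value for a in alt_ids if a.is_proprietary and a.domain == domain]


def resolve_url(alt_id: AlternateID) -> Optional[str]:
    """Fill the ID into a URL template when one is known for its type."""
    template = URL_TEMPLATES.get(alt_id.type)
    return template.format(id=alt_id.value) if template else None


if __name__ == "__main__":
    print(ids_for_domain(BLADE_RUNNER_ALT_IDS, "amazon.com"))
```

Keying proprietary IDs by DNS domain means two organizations can reuse the same numeric value without collision.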

Relationships between objects

Content objects can be related to each other according to the following table. These relations are expressed as additional fields in the content record and are thus relative to that object. Note that the subject object is the child and the target is the parent (e.g. subject is<relation-type>Of parent). Additional constraints are noted in the table.

Inheritance relationships: The object on which the relationship exists can inherit basic metadata fields from the object to which the relationship refers. Only one inheritance relationship may exist on an object. These relationships produce a tree structure rooted in the EIDR ID for an abstraction.
isSeasonOf: A group of series episodes released over a contiguous span of time (e.g. broadcast year). For example, 10.5240/AB95-8734-5D98-A282-2DF0-C ("Season 9") is a season of 10.5240/C272-DA64-E2B5-0A78-2AC3-Z ("The X-Files").
isEpisodeOf: For example, 10.5240/E008-224D-0397-0560-6300-8 ("Sunshine Days") is an episode of 10.5240/AB95-8734-5D98-A282-2DF0-C ("Season 9").
isEditOf: An instance of a title with unique characteristics that differentiate it from any other version. For example, 10.5240/7290-C8AD-12BA-4F93-3B07-7 ("Blade Runner: The Director's Cut") is an edit of 10.5240/EA73-79D7-1B2B-B378-3A73-M.
isManifestationOf: A manifestation is a more specific instance of a work that can be sold, transmitted, transferred or played. The parent of a manifestation should be an edit. For example, 10.5240/9CE1-DE39-5F3E-073D-4307-7 is the Ultraviolet Standard CFF (standard definition, English audio and subtitles) for "Blade Runner: The Director's Cut". It is a manifestation of the abstract work 10.5240/EA73-79D7-1B2B-B378-3A73-M.
isClipOf: One (and only one) contiguous fragment of an asset.

Dependence relationships: The objects to which the relationship refers have a strong bearing on the basic nature of the object on which the relationship exists. This means that the objects referred to in the relationship must be taken into account when checking for duplicates when an object is created or modified. These relationships produce directed graphs within and across trees.
isCompositeOf: A single work composed of parts of multiple other records.
isCompilationOf: A collection of multiple whole works that is not more precisely describable.

Lightweight relationships: There is no inheritance; the objects to which they refer do not influence the underlying nature of the object on which the relationship exists. These relationships are used primarily when moving around within the object tree and connecting object trees to each other, producing a directed graph across elements of those trees.
isPackagingOf: For creating a collection of assets that are released together. For example, 10.5240/F219-975E-5990-4570-BA75-2 ("Hannah Montana and Miley...") is a packaging of 10.5240/9ABE-2BF1-ACE7-EBA2-8E57-N.
isPromotionOf: Promotional objects such as a trailer.
isSupplementTo: Ancillary material that might be found on a DVD, such as an outtake or behind-the-scenes feature.
isAlternateContentFor: Content that is synchronized to the main asset, such as audio or an alternate camera angle.
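Because an object carries at most one inheritance relationship, following those relationships from any record leads up a single path to the root of its tree. The sketch below walks that path using the X-Files IDs quoted above. The in-memory dictionary is a stand-in for registry lookups, and the function name is illustrative.

```python
from typing import Dict, List, Tuple

# Relationship names classed as inheritance relationships in the text above.
INHERITANCE = {"isSeasonOf", "isEpisodeOf", "isEditOf", "isManifestationOf", "isClipOf"}

EPISODE = "10.5240/E008-224D-0397-0560-6300-8"  # "Sunshine Days"
SEASON = "10.5240/AB95-8734-5D98-A282-2DF0-C"   # "Season 9"
SERIES = "10.5240/C272-DA64-E2B5-0A78-2AC3-Z"   # "The X-Files"

# child ID -> (relationship, parent ID); stands in for fields of each content record.
PARENT_OF: Dict[str, Tuple[str, str]] = {
    EPISODE: ("isEpisodeOf", SEASON),
    SEASON: ("isSeasonOf", SERIES),
}


def ancestry(eidr_id: str, parent_of: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the chain from a record up to the root of its inheritance tree."""
    chain = [eidr_id]
    while chain[-1] in parent_of:
        relation, parent = parent_of[chain[-1]]
        if relation not in INHERITANCE:
            break  # dependence and lightweight relationships do not define the tree
        if parent in chain:
            raise ValueError("cycle in inheritance relationships")
        chain.append(parent)
    return chain


if __name__ == "__main__":
    print(ancestry(EPISODE, PARENT_OF))
```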

Use in standards and applications

EIDR has been incorporated into many standards. A few of the more significant ones are listed here:

EIDR identifiers have found their way into an increasing number of commercial applications. The following are illustrative of some of the advantages of using EIDR:

Operations & Administrative

EIDR is administered by the non-profit EIDR Association, which was founded in October 2010 by MovieLabs, CableLabs, Comcast and Rovi. Membership has grown steadily since then: as of late 2014 it had 79 members divided between the Industry Promoter and Industry Contributor levels. The fastest growing category was non-US companies, which accounted for about 20% of membership. The EIDR Association operates two EIDR registries: Production and Sandbox. The former is the official site, and the latter is reserved for test and development. Both systems are available publicly online, but the contents of the sandbox are not guaranteed to be correct, complete, or even to refer to assets that exist. Only members of the EIDR Association may modify the registry.

Registration

Registration of new assets can be done individually or in bulk (up to 100,000 assets at a time). In either case, the workflow comprises a combination of automated (to perform well-defined but tedious tasks) and manual (where human judgment is called for) processes. It is also iterative, as the initial matching process may identify a variety of gaps and errors that need to be dealt with.

Registering new assets is a complex process that requires some preparation, particularly in the case of bulk submission. The automated processes check syntax and make sure that the basic metadata is supplied and that any dependencies (e.g. series records created before constituent episodes) are honored. Manual steps include making sure the correct Parties are associated with the asset. One of the most important steps is ensuring that a new asset does not already exist in the registry: this is covered in the next section.
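The dependency requirement (a series before its seasons, a season before its episodes) amounts to a topological ordering of the batch. The sketch below shows one way a submitter might order a bulk submission before sending it; the asset labels and the function name are hypothetical, and the registry's own checks are not modeled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Optional


def registration_order(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Order assets so that every parent is registered before its children."""
    sorter = TopologicalSorter()
    for asset, parent in parent_of.items():
        if parent is None:
            sorter.add(asset)          # a root record with no dependency
        else:
            sorter.add(asset, parent)  # the parent must come first
    return list(sorter.static_order())


if __name__ == "__main__":
    batch = {
        "episode-2": "season-1",
        "episode-1": "season-1",
        "season-1": "series",
        "series": None,
    }
    print(registration_order(batch))
```

`TopologicalSorter` raises `graphlib.CycleError` if the batch contains a circular dependency, which gives an early signal that the input is malformed.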

In order to register a new asset a user must be associated with a party that has been granted the "Registrant" role by the EIDR operator. A registrant may be a principal agent, such as a studio or an encoding house, but it may also be a Party doing bulk registration of back-catalogue items, or a Party acting on behalf of someone else. It is also a requirement that a registrant be an EIDR member. In general, content ownership, metadata authority, and registration capability are separate and unrelated concepts.

Deduplication

This refers to flagging assets being submitted to the registry as falling into one of the following three categories:

This assessment is based on applying a (large) set of rules to the candidate asset, which results in a numerical score. Bucketing occurs as the result of comparing the score to two thresholds:

Assets falling between the low and high threshold are deemed to have a high possibility of being a duplicate: the proposed record addition/modification will not proceed until manually reviewed by EIDR operations staff.
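The two-threshold bucketing can be sketched in a few lines. The sketch assumes that a higher score means a stronger match with an existing record, that scores exactly on a threshold go to manual review, and that the bucket names and threshold values shown are placeholders; none of these details are stated by the source.

```python
def bucket(score: float, low: float, high: float) -> str:
    """Place a deduplication score into one of three buckets.

    Assumption: higher scores indicate a closer match to an existing record.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return "unique"        # hypothetical label: registration proceeds
    if score > high:
        return "duplicate"     # hypothetical label: treated as already registered
    return "needs_review"      # held for manual review by operations staff


if __name__ == "__main__":
    # Placeholder thresholds chosen only for the demonstration.
    for s in (10, 55, 95):
        print(s, bucket(s, low=40, high=80))
```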

Architecture

The components of the EIDR system are shown below.

[Figure: EIDR Registry Architecture]

The principal functional blocks are as follows:

Relation to DOI and Handle System

An EIDR ID is a specialized example of a Digital Object Identifier (DOI), which in turn is built on top of the Handle System developed by the Corporation for National Research Initiatives (CNRI). The EIDR-specific aspects of the lower layers are described in more detail below.

Digital Object Identifier (EIDR Aspects)

A Digital Object Identifier, standardized as ISO 26324, [16] seeks to uniquely identify a wide range of digital artifacts including books, recordings, research data, and other digital content. The goal is not just for the IDs to be unique, but persistent and immutable. As opposed to URLs, DOI identifiers stay the same even if the objects move to another location, or become owned by another organization. Here are some of the characteristics of DOI:

The DOI data model provides the means to associate metadata with each object, as well as policies governing its use. In the words of the DOI Handbook, metadata may include "names, identifiers, descriptions, types, classifications, locations, times, measurements, relationships and any other kind of information related to [an object]." Metadata flows between the following entities:

To foster interoperability between RAs, DOI has the concept of a metadata Kernel. This is a core set of metadata that all objects stored within the DOI framework should have. The full set may be found in the DOI handbook. Interoperability is a large topic extending beyond the scope of EIDR, but the following subset is particularly relevant to EIDR assets:

EIDR metadata is available in standard DOI kernel metadata format as well as EIDR-specific formats. The DOI for the DOI metadata schema is doi:10.1000/276.

Handle System (EIDR Aspects)

DOI is in turn implemented on top of the Handle System, a distributed, highly scalable, name resolution service. A handle is defined as:

<Handle> ::= <Handle Naming Authority> "/" <Handle Local Name>

The Naming Authority is globally unique and defines both an administrative space and the syntax of the Handle Local Name. For EIDR, "10.5240" is the Naming Authority in the definition above, and it is responsible for resolving the suffix (including checking that it conforms to the expected syntax for an EIDR asset). The range of allowable Naming Authorities is more general than is employed by DOI (or EIDR).
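The grammar above translates directly into a split on the first "/" character. The following sketch separates a handle into its two parts and checks whether the Naming Authority is the EIDR content prefix; the type and function names are illustrative, and no resolution against the Handle System is attempted.

```python
from typing import NamedTuple

EIDR_CONTENT_AUTHORITY = "10.5240"


class Handle(NamedTuple):
    naming_authority: str
    local_name: str


def parse_handle(handle: str) -> Handle:
    """Split '<Naming Authority>/<Local Name>' at the first slash."""
    authority, sep, local_name = handle.partition("/")
    if not sep or not authority or not local_name:
        raise ValueError(f"not a handle: {handle!r}")
    return Handle(authority, local_name)


def is_eidr_content_handle(handle: str) -> bool:
    """True when the handle's Naming Authority is the EIDR content prefix."""
    return parse_handle(handle).naming_authority == EIDR_CONTENT_AUTHORITY


if __name__ == "__main__":
    print(parse_handle("10.5240/EA73-79D7-1B2B-B378-3A73-M"))
```

Splitting only at the first slash keeps any later slashes inside the local name, whose syntax is left to the Naming Authority.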

The distributed nature of the Handle System allows each local namespace to be hosted on multiple geographically distributed service sites. This is a federated model where each local name space has complete control over the placement and operation of its service sites. Furthermore, each service site may contain multiple resolution servers: requests directed to a particular service site will be dispatched evenly across its constituent servers.

The data model of the Handle System is simple but flexible. An arbitrary number of values may be associated with each handle. Over time, these values may be created, modified, and destroyed. Each such datum has the following attributes:

Accessing the Handle System is done via a wire protocol defined in RFC 3652; EIDR applications don't have to be concerned with this because of the layering of protocols.

Further reading

  1. R. Kroon, R. Drewry, A. Leigh, S. McConnachie. "Content Identification for Audiovisual Archives". International Association of Sound and Audiovisual Archives Journal, Summer 2015 (No. 45).
  2. R. Kroon. "Bringing Order to Digital Identifiers". Media and Entertainment Journal Winter 2014-2015: 148–150.
  3. R. Drewry, D. Dulchinos. "Transforming Entertainment Through Technology". Media and Entertainment Journal Winter 2013-2014: 81–88.
  4. D. Agranoff, W. Michel, T. Wakai. "Streamlined Content Metadata Integration and Management Using Entertainment ID Registry (EIDR)". SCTE Cable-Tec Expo 2012.

References

  1. "Batman comic fetches $1.075 million, rewrites record". Reuters. 2010-02-26. Retrieved 2023-08-18.
  2. ISO/IEC 7064:2003: Information technology -- Security techniques -- Check character systems. 2002
  3. W3C XML Schema Part 2: Datatypes Second Edition
  4. SMPTE RP 2079. DOI Name and EIDR Identifier Representation.
  5. Advanced Media Workflow Association AS-03 MXF Program Delivery Specification.
  6. Advanced Media Workflow Association AS-11 MXF for Contribution Specification.
  7. SMPTE RP 2021-5:2013. Using Ad-ID and EIDR as Alternate Identifiers in SMPTE BXF and ATSC PMCP.
  8. EBU TECH 3293. EBU CORE METADATA SET Version 1.5.
  9. DVB Document A167-2. Digital Video Broadcasting (DVB); Companion Screens and Streams; Part 2: Content Identification and Media Synchronisation, July 2014. p. 52.
  10. ISO/IEC CD 23000-15. Information technology - Multimedia application format (MPEG-A) -- Part 15: Multimedia preservation application format.
  11. MD-SP-AMIv3.0-I02-121210. CableLabs Asset Management Interface 3.0 Specification.
  12. ANSI/SCTE 35 2013. Digital Program Insertion Cueing Message for Cable.
  13. SCTE 130-10 2013. Digital Program Insertion – Advertising Systems Interfaces, Part 10.
  14. TC 372 Workshop Compendium. How EN 15744 and EN 15907 came into being.
  15. Press Release. Swisscom completes the first European deployment of the Entertainment ID Registry with media-press.tv.
  16. ISO 26324:2012: Information and documentation -- Digital object identifier system, 2012.