OLAC

OLAC, the Open Language Archives Community, is an initiative to create a unified means of searching online databases of language resources for linguistic research. Descriptions of these resources are stored as XML metadata records, which makes them easy to search. OLAC was founded in 2000 and is hosted on the Linguistic Data Consortium's web server at the University of Pennsylvania.
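
As a rough illustration of why XML storage helps searching, the sketch below filters a few metadata records by the text of a Dublin Core element using Python's standard `xml.etree.ElementTree`. The record layout (a bare `record` wrapper holding `dc:title` and `dc:subject`) and the sample data are hypothetical simplifications, not OLAC's actual record format or search service; only the Dublin Core namespace URI is a real identifier.

```python
import xml.etree.ElementTree as ET

# Real Dublin Core element namespace; the records below are hypothetical examples.
DC = "http://purl.org/dc/elements/1.1/"

RECORDS = [
    """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:title>Field recordings, village X</dc:title>
         <dc:subject>audio</dc:subject>
       </record>""",
    """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:title>Wordlist, village X</dc:title>
         <dc:subject>lexicon</dc:subject>
       </record>""",
    """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:title>Grammar sketch</dc:title>
         <dc:subject>grammar</dc:subject>
         <dc:subject>lexicon</dc:subject>
       </record>""",
]


def search(records, element, term):
    """Return titles of records whose Dublin Core `element` contains `term` (case-insensitive)."""
    hits = []
    for xml_text in records:
        root = ET.fromstring(xml_text)
        # Tags are namespace-qualified, e.g. "{http://purl.org/dc/elements/1.1/}subject".
        for el in root.iter(f"{{{DC}}}{element}"):
            if el.text and term.lower() in el.text.lower():
                hits.append(root.findtext(f"{{{DC}}}title", default="(untitled)"))
                break  # one matching element per record is enough
    return hits


print(search(RECORDS, "subject", "lexicon"))  # ['Wordlist, village X', 'Grammar sketch']
```

Because each value sits in a named, namespaced element, a query can target one field (here `subject`) instead of scanning free text.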

OLAC advises on best practices in language archiving and promotes interoperation between language archives.

Metadata

The OLAC metadata set is based on the complete set of Dublin Core metadata terms (DCMT), but the format allows extensions that express community-specific qualifiers. It is often contrasted with IMDI (the ISLE Metadata Initiative).

Attributes

OLAC metadata uses five primary attributes: refine, code, scheme, lang, and langs. The last of these is used only for complete metadata records. [1] Each attribute has a different function and applies to a different part of the metadata.

Table 1: Attributes and Their Functions
Attribute | Function
refine | qualifies the meaning of certain elements, restricting the element to a "particular controlled vocabulary or notation" [2]
code | used for "holding metadata values from a specific encoding scheme" [1]
scheme | specifies how "the text in the content of the element will be encoded" [1]
lang | gives the language in which the element's text is written [1]
langs | gives the language(s) in which "the metadata record is designed to be read" [1]
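
The sketch below shows how the attributes in Table 1 might sit on the elements of a metadata record, built with Python's standard `xml.etree.ElementTree`. It is an assumption-laden illustration, not the official OLAC schema: the attribute names are copied unqualified from the table (the real schema may place them in namespaces), the bare `olac` wrapper element is hypothetical, and the `refine` value `hypothetical-vocabulary` is a placeholder. Only the Dublin Core namespace URI and the ISO 639-3 code `eng` (English) are real identifiers.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC)  # serialize DC elements with the "dc:" prefix


def make_element(name, text=None, *, refine=None, code=None, scheme=None, lang=None):
    """Build a Dublin Core element carrying any of the element-level attributes from Table 1."""
    el = ET.Element(f"{{{DC}}}{name}")
    for attr, value in (("refine", refine), ("code", code), ("scheme", scheme), ("lang", lang)):
        if value is not None:  # attributes are optional; omit those not supplied
            el.set(attr, value)
    el.text = text
    return el


def make_record(elements, langs=None):
    """Wrap elements in a hypothetical container; `langs` applies to the whole record."""
    root = ET.Element("olac")
    if langs:
        root.set("langs", " ".join(langs))
    root.extend(elements)
    return root


record = make_record(
    [
        make_element("title", "Wordlist, village X", lang="en"),
        make_element("language", code="eng"),  # ISO 639-3 code for English
        make_element("type", refine="hypothetical-vocabulary", code="lexicon"),
    ],
    langs=["en"],
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping `langs` on the wrapper and the other four attributes on individual elements mirrors the split in Table 1 between record-level and element-level information.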

Elements

OLAC's metadata page currently lists 23 elements. Any element may be repeated, and not every element is required in a metadata submission. Each element's entry on the official OLAC page gives the element's name, its function, usage notes, and coding examples. [1]
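
Because elements may repeat and may be omitted, software that reads such records cannot assume exactly one value per element. A minimal sketch, again using `xml.etree.ElementTree` with a hypothetical `olac` wrapper and sample data, groups each Dublin Core element's values into a list, so a repeated element yields several values and an absent one simply has no entry.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def collect_elements(xml_text):
    """Group Dublin Core child elements by local name; repeated elements give multi-item lists."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for child in root:
        if child.tag.startswith(f"{{{DC}}}"):
            local = child.tag[len(DC) + 2:]  # strip the "{namespace}" prefix
            grouped[local].append((child.text or "").strip())
    return dict(grouped)


SAMPLE = """<olac xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Grammar sketch</dc:title>
  <dc:subject>morphology</dc:subject>
  <dc:subject>syntax</dc:subject>
</olac>"""

fields = collect_elements(SAMPLE)
print(fields)  # {'title': ['Grammar sketch'], 'subject': ['morphology', 'syntax']}
print(fields.get("creator", []))  # [] because the optional creator element is absent
```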

In addition, OLAC provides a list of metadata extensions to augment descriptions. [3]

References

  1. "OLAC Metadata Set". www.language-archives.org. Retrieved 2020-09-17.
  2. Bird, Steven; Simons, Gary (2001). "The OLAC Metadata Set and Controlled Vocabularies". ACL Anthology. Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources.
  3. "Recommended metadata extensions". www.language-archives.org. Retrieved 2020-09-17.