Data Catalog Vocabulary

Data Catalog Vocabulary (DCAT) is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. By using DCAT to describe datasets in catalogs, publishers increase discoverability and enable applications to consume metadata from multiple catalogs. It enables decentralized publishing of catalogs and facilitates federated dataset search across catalogs. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation. [1]
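
As a rough illustration of the paragraph above, the following Python sketch builds two small catalog descriptions as JSON-LD and runs a keyword search across both. The class and property names (dcat:Catalog, dcat:Dataset, dcat:dataset, dcat:keyword, dct:title) come from the DCAT and Dublin Core Terms namespaces. The example.org IRIs, titles, keywords, and the helper functions make_catalog and federated_search are assumptions made up for this example. A real aggregator would harvest and parse published RDF rather than build dictionaries in memory.

```python
import json

# Namespace IRIs for DCAT and Dublin Core Terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def make_catalog(catalog_iri, title, datasets):
    """Build a JSON-LD document for one dcat:Catalog and its datasets."""
    return {
        "@context": CONTEXT,
        "@id": catalog_iri,
        "@type": "dcat:Catalog",
        "dct:title": title,
        "dcat:dataset": [
            {
                "@id": d["iri"],
                "@type": "dcat:Dataset",
                "dct:title": d["title"],
                "dcat:keyword": d["keywords"],
            }
            for d in datasets
        ],
    }


def federated_search(catalogs, keyword):
    """Return (catalog title, dataset title) pairs with a matching keyword."""
    wanted = keyword.lower()
    hits = []
    for catalog in catalogs:
        for dataset in catalog["dcat:dataset"]:
            if wanted in (k.lower() for k in dataset["dcat:keyword"]):
                hits.append((catalog["dct:title"], dataset["dct:title"]))
    return hits


# Hypothetical catalogs: every IRI, title, and keyword below is made up.
city = make_catalog(
    "https://city.example.org/catalog",
    "City open data",
    [{"iri": "https://city.example.org/dataset/bus-stops",
      "title": "Bus stops", "keywords": ["transport", "bus"]}],
)
region = make_catalog(
    "https://region.example.org/catalog",
    "Regional open data",
    [{"iri": "https://region.example.org/dataset/rail-lines",
      "title": "Rail lines", "keywords": ["Transport", "rail"]},
     {"iri": "https://region.example.org/dataset/air-quality",
      "title": "Air quality", "keywords": ["environment"]}],
)

results = federated_search([city, region], "transport")
print(json.dumps(city, indent=2))
print(results)
```

Because both catalogs use the same vocabulary, the search function needs no catalog-specific logic, which is the interoperability benefit the paragraph describes.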

The original DCAT vocabulary was developed at DERI, based on an idea from Vassilios Peristeras and his master's student Fadi Maali, together with Richard Cyganiak. The vocabulary was further developed by W3C's eGov Interest Group, then brought onto the Recommendation Track by W3C's "Government Linked Data" Working Group. DCAT is the foundation for open dataset descriptions in the European Union public sector and was adapted by the ISA programme of the European Commission into the DCAT application profile for data portals in Europe (DCAT-AP). [2] A 2022 report reviews DCAT-AP compliance on national data portals. [3] :77–79

DCAT v2 was published as a W3C Recommendation on 4 February 2020. [4] Version 2 adds support for cataloguing data services or APIs, and has stronger support for expressing relationships between datasets. It also includes an alignment to Schema.org.
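
The two version 2 additions mentioned above can be sketched in Python as JSON-LD: a dcat:DataService that serves a dataset, and a qualified relationship between two datasets. The terms dcat:DataService, dcat:endpointURL, dcat:servesDataset, dcat:qualifiedRelation, dcat:Relationship, dcat:hadRole, and dct:relation are taken from DCAT version 2. All example.org IRIs, the role IRI, and the helper services_for are hypothetical and exist only for this illustration.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Hypothetical identifiers, invented for this example.
DATASET_IRI = "https://data.example.org/dataset/bus-stops"
SOURCE_IRI = "https://data.example.org/dataset/bus-stops-raw"

dataset = {
    "@id": DATASET_IRI,
    "@type": "dcat:Dataset",
    "dct:title": "Bus stops",
    # Qualified relationship: links to another dataset and names the role
    # that dataset plays (the role IRI here is an assumption).
    "dcat:qualifiedRelation": {
        "@type": "dcat:Relationship",
        "dct:relation": {"@id": SOURCE_IRI},
        "dcat:hadRole": {"@id": "https://data.example.org/role/source"},
    },
}

service = {
    "@id": "https://data.example.org/service/bus-api",
    "@type": "dcat:DataService",
    "dct:title": "Bus stops API",
    "dcat:endpointURL": {"@id": "https://api.example.org/bus/v1"},
    "dcat:servesDataset": {"@id": DATASET_IRI},
}

document = {"@context": CONTEXT, "@graph": [dataset, service]}


def services_for(graph, dataset_iri):
    """List titles of data services that declare they serve the dataset."""
    return [
        node["dct:title"]
        for node in graph
        if node.get("@type") == "dcat:DataService"
        and node.get("dcat:servesDataset", {}).get("@id") == dataset_iri
    ]


print(json.dumps(document, indent=2))
print(services_for(document["@graph"], DATASET_IRI))
```

Describing the API as its own resource, rather than as one more download link on the dataset, lets a catalog list services independently and point several datasets at the same endpoint.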

As DCAT is extensible, more specific extensions have been created in the statistical and geodata domains. [5] [6]

An open-source port of DCAT-AP 2.0.1 that is compatible with the NGSI-LD API standard is available under the DCAT-AP subject of the Smart Data Models program.

References

  1. Fadi Maali; John Erickson; Phil Archer (16 January 2014). "Data Catalog Vocabulary (DCAT)". The World Wide Web Consortium (W3C). Retrieved 23 August 2017.
  2. "DCAT application profile for data portals in Europe". Retrieved 23 August 2017.
  3. Carsaniga, Giulia; Lincklaen Arriëns, Eline N; Dogger, Jochem; van Assen, Mariska; Cecconi, Gianfranco (December 2022). Open data maturity report 2022 (PDF). Luxembourg: Publications Office of the European Union. doi:10.2830/70973. ISBN 978-92-78-43386-4. Retrieved 2023-07-03.
  4. Riccardo Albertoni; David Browning; Simon Cox; Alejandra Gonzalez Beltran; Andrea Perego; Peter Winstanley (4 February 2020). "Data Catalog Vocabulary (DCAT) - Version 2". The World Wide Web Consortium (W3C). Retrieved 5 February 2020.
  5. "StatDCAT application profile for data portals in Europe". ISA Programme, European Commission. Retrieved 23 August 2017.
  6. "GeoDCAT-AP v1.0". ISA Programme, European Commission. Archived from the original on 23 August 2017. Retrieved 23 August 2017.