TimeML

TimeML is a set of rules for electronically encoding events and temporal expressions in natural-language documents. It is defined in the TimeML Specification version 1.2.1 [1], which was developed through several efforts led in large part by the Laboratory for Linguistics and Computation at Brandeis University.

The TimeML project's goal is to create a standard markup language for temporal events in a document. TimeML addresses four problems regarding event markup: time stamping (anchoring an event to a time), ordering events with respect to one another, reasoning with contextually underspecified temporal expressions, and reasoning about the length of events and their outcomes. [2]

History

TimeML was conceptualized in 2002 during the TERQAS (Time and Event Recognition for Question Answering Systems) workshops, organized by Professor James Pustejovsky of Brandeis University. The TERQAS workshops set out to address the problem of how to enhance natural language question answering systems so that they can answer temporally based questions about the events and entities in news articles. During these workshops, TimeML version 1.0 was defined, and the TimeBank corpus was created as an illustration.

In 2003, the TANGO (TimeML Annotation Graphical Organizer) workshops produced a graphical annotation tool for TimeML.

The TARSQI (Temporal Awareness and Reasoning Systems for Question Interpretation) project currently develops algorithms that tag events and time expressions in natural language texts, anchor them temporally, and order them.

Versions

According to the official TimeML website, there are three versions of the TimeML specification language. [3]

Version 1.1

TimeML version 1.1 was produced in 2004.

Version 1.2

TimeML version 1.2 was produced in 2004, shortly after the release of version 1.1.

Version 1.2.1

In 2005, version 1.2.1 was defined. The version 1.2.1 TimeML guidelines describe the changes made to the language as follows:

  • The nf_morph attribute of MAKEINSTANCE has been replaced by pos (part of speech), and the PRESPART, PASTPART, and INFINITIVE values of nf_morph have been redistributed to tense.
  • The optional syntax attribute was added to SLINK, ALINK, and TLINK. It can hold CDATA, but is generally used only by annotation programs to record the data that led to the creation of the tag.
  • The optional comment attribute was added to all TimeML elements, for the purpose of giving (human) annotators a place to put observations about annotated text.
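To make the 1.2.1 changes concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The fragment, its attribute values, and the helper names instance_features and link_notes are hypothetical; the attribute names (eiid, eventID, pos, tense, lid, syntax, comment) are assumed to follow the 1.2.1 specification and should be checked against it. The non-finite verb "expand" records INFINITIVE under tense, a value that earlier versions kept in nf_morph.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML 1.2.1 fragment; attribute values are illustrative only.
# "expand" is non-finite: its INFINITIVE form goes in tense, its category in pos.
FRAGMENT = """<TIMEML>
The company <EVENT eid="e1" class="I_STATE">wants</EVENT> to
<EVENT eid="e2" class="OCCURRENCE">expand</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1" pos="VERB" tense="PRESENT" aspect="NONE" polarity="POS"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" pos="VERB" tense="INFINITIVE" aspect="NONE" polarity="POS"/>
<SLINK lid="l1" eventInstanceID="ei1" subordinatedEventInstance="ei2" relType="MODAL"
       syntax="I_STATE + to-infinitive" comment="expansion not yet carried out"/>
</TIMEML>"""


def instance_features(xml_text):
    """Map each event instance ID to its (pos, tense) pair."""
    root = ET.fromstring(xml_text)
    return {mi.get("eiid"): (mi.get("pos"), mi.get("tense"))
            for mi in root.iter("MAKEINSTANCE")}


def link_notes(xml_text):
    """Collect the optional syntax and comment attributes from every link tag."""
    root = ET.fromstring(xml_text)
    notes = []
    for tag in ("TLINK", "SLINK", "ALINK"):
        for link in root.iter(tag):
            notes.append((link.get("lid"), link.get("syntax"), link.get("comment")))
    return notes


print(instance_features(FRAGMENT))
print(link_notes(FRAGMENT))
```

Missing optional attributes simply come back as None from Element.get, which matches their optional status.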

ISO-TimeML

ISO-TimeML was presented to the ISO for consideration as a standard in August 2007. It was then revised, voted on, and approved as an international standard by March 2009.

TimeML Tags

The following tags are defined by the TimeML specification version 1.2.1. [4]

TIMEML

The TIMEML tag plays the role of the root element in an XML document: it declares that the content it encloses is encoded with TimeML tags.
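A small sketch, assuming Python's xml.etree.ElementTree and a document whose outermost element is TIMEML, that checks the root and counts the TimeML tags it encloses. The sample sentence, its attribute values (which presume a document date in April 2004), and the function name annotation_counts are illustrative assumptions; the sample annotation is deliberately partial.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, partially annotated document wrapped in a TIMEML root element.
DOC = ('<TIMEML>Prices <EVENT eid="e1" class="OCCURRENCE">rose</EVENT> '
       '<TIMEX3 tid="t1" type="DATE" value="2004-03">last March</TIMEX3> and '
       '<EVENT eid="e2" class="OCCURRENCE">fell</EVENT> '
       '<TIMEX3 tid="t2" type="DATE" value="2004-04-14">yesterday</TIMEX3>.</TIMEML>')


def annotation_counts(xml_text):
    """Count the tags inside a document whose root element is TIMEML."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "TIMEML":
        raise ValueError(f"expected a TIMEML root element, found {root.tag!r}")
    # iter() yields the root itself first, so count only its descendants.
    return Counter(el.tag for el in root.iter() if el is not root)


print(annotation_counts(DOC))
```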

EVENT

The EVENT tag is used to annotate those elements in a text that mark the semantic events described by it. Syntactically, EVENTs are typically verbs, although event nominals, such as "crash" in "...killed by the crash", will also be annotated as EVENTs. The EVENT tag is also used to annotate a subset of the states in a document. This subset of states includes those that are either transient or explicitly marked as participating in a temporal relation. See the TimeML annotation guidelines for more details.
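The sketch below, assuming Python's xml.etree.ElementTree and the eid and class attribute names, annotates both a verbal event ("killed") and an event nominal ("crash"), then recovers the events and the unannotated text. The sentence and the helpers extract_events and plain_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: a verbal event and an event nominal.
SENTENCE = ('<TIMEML>Three people were <EVENT eid="e1" class="OCCURRENCE">killed</EVENT> '
            'by the <EVENT eid="e2" class="OCCURRENCE">crash</EVENT>.</TIMEML>')


def extract_events(xml_text):
    """Return (eid, text, class) for every EVENT, in document order."""
    root = ET.fromstring(xml_text)
    return [(ev.get("eid"), ev.text, ev.get("class")) for ev in root.iter("EVENT")]


def plain_text(xml_text):
    """Strip the markup; TimeML tags wrap spans of text without altering them."""
    return "".join(ET.fromstring(xml_text).itertext())


print(extract_events(SENTENCE))
print(plain_text(SENTENCE))
```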

TIMEX3

The TIMEX3 tag is primarily used to mark up explicit temporal expressions, such as times, dates, durations, etc. It is modeled on Setzer's (2001) TIMEX tag, as well as the TIDES (Ferro, et al. (2002)) TIMEX2 tag. Since it differs both in attribute structure and in use, it seemed best to give it a separate name, which reveals its heritage while at the same time indicating that it is different from its forebears.
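A hypothetical TIMEX3 example in Python, assuming the tid, type, and value attributes with ISO 8601-style normalized values (PT2H for a two-hour duration, 2004-05-06 for a calendar date). The sentence and helper names are illustrative, and calendar_date is intended only for DATE expressions with a full YYYY-MM-DD value.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sentence with a DURATION and a DATE expression.
TEXT = ('<TIMEML>The talks lasted <TIMEX3 tid="t1" type="DURATION" value="PT2H">two hours</TIMEX3> '
        'on <TIMEX3 tid="t2" type="DATE" value="2004-05-06">May 6, 2004</TIMEX3>.</TIMEML>')


def timexes(xml_text):
    """Map each TIMEX3 id to its (type, normalized value, surface text)."""
    root = ET.fromstring(xml_text)
    return {tx.get("tid"): (tx.get("type"), tx.get("value"), tx.text)
            for tx in root.iter("TIMEX3")}


def calendar_date(timex):
    """Convert a DATE timex with a full YYYY-MM-DD value into a datetime.date."""
    kind, value, _surface = timex
    if kind != "DATE":
        raise ValueError("not a DATE expression")
    return date.fromisoformat(value)


found = timexes(TEXT)
print(found["t1"])
print(calendar_date(found["t2"]))
```

The normalized value, not the surface text, is what makes "May 6, 2004" machine-comparable with other dates.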

SIGNAL

The SIGNAL tag represents a temporal signal: any function word that indicates a particular temporal relationship. Examples of SIGNALs are "when", "in", and "after".
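A minimal sketch, assuming the sid attribute on SIGNAL and a signalID attribute on TLINK that points back to it: the helper signalled_links (a hypothetical name) reports which function word triggered each temporal link. The example sentence and its attribute values are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in which "after" signals the TLINK between two events.
DOC = """<TIMEML>The game was <EVENT eid="e1" class="OCCURRENCE">cancelled</EVENT>
<SIGNAL sid="s1">after</SIGNAL> the <EVENT eid="e2" class="OCCURRENCE">storm</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1" pos="VERB" tense="PAST" aspect="NONE" polarity="POS"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" pos="NOUN" tense="NONE" aspect="NONE" polarity="POS"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToEventInstance="ei2" signalID="s1" relType="AFTER"/>
</TIMEML>"""


def signalled_links(xml_text):
    """Pair each TLINK's relType with the signal word that triggered it (or None)."""
    root = ET.fromstring(xml_text)
    signals = {sig.get("sid"): sig.text for sig in root.iter("SIGNAL")}
    return [(link.get("lid"), link.get("relType"), signals.get(link.get("signalID")))
            for link in root.iter("TLINK")]


print(signalled_links(DOC))
```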

TLINK

TLINKs, or Temporal Links, establish a temporal relationship between two TimeML elements (events or temporal expressions) so that they can be ordered in time. Temporal links are the most prevalent link type, as they show how events and temporal expressions are temporally related to each other.
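Because TLINKs relate elements pairwise, a set of them can be read as a graph. The Python sketch below, assuming the eventInstanceID, relatedToEventInstance, and relType attributes, handles only BEFORE and AFTER links between event instances and derives one consistent order with the standard library's graphlib.TopologicalSorter (Python 3.9+). This is a simplification chosen for illustration, not TimeML's own reasoning procedure; a full temporal reasoner would also handle the other relation types and links to temporal expressions, which are skipped here. The helper name order_event_instances is hypothetical.

```python
import xml.etree.ElementTree as ET
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical links: ei1 BEFORE ei2, ei3 AFTER ei2, plus a link to a time (ignored).
LINKS = """<TIMEML>
<TLINK lid="l1" eventInstanceID="ei1" relatedToEventInstance="ei2" relType="BEFORE"/>
<TLINK lid="l2" eventInstanceID="ei3" relatedToEventInstance="ei2" relType="AFTER"/>
<TLINK lid="l3" eventInstanceID="ei3" relatedToTime="t1" relType="IS_INCLUDED"/>
</TIMEML>"""


def order_event_instances(xml_text):
    """Return event instance IDs in an order consistent with BEFORE/AFTER TLINKs."""
    sorter = TopologicalSorter()
    for link in ET.fromstring(xml_text).iter("TLINK"):
        src = link.get("eventInstanceID")
        dst = link.get("relatedToEventInstance")
        if src is None or dst is None:
            continue  # skip links involving a temporal expression
        rel = link.get("relType")
        if rel == "BEFORE":
            sorter.add(dst, src)  # src must precede dst
        elif rel == "AFTER":
            sorter.add(src, dst)  # dst must precede src
    try:
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError("TLINKs contain a cycle") from err


print(order_event_instances(LINKS))
```

A cycle means the annotations contradict each other, so the sketch reports it instead of guessing an order.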

ALINK

Events that are marked as ASPECTUAL introduce an ALINK, or Aspectual Link. These links are straightforward because they occur only when an aspectual event takes another event as an argument. For example, in the sentence "Mary completed the marathon", "completed" is an aspectual event, while "marathon" is an occurrence.
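For "Mary completed the marathon", here is a hedged sketch of the corresponding ALINK, assuming MAKEINSTANCE elements (eiid, eventID) as the targets of link tags and CULMINATES as the relType for "completed". The helper describe_alinks is hypothetical; it also enforces the requirement that an ALINK be introduced by an ASPECTUAL event.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation of "Mary completed the marathon".
DOC = """<TIMEML>Mary <EVENT eid="e1" class="ASPECTUAL">completed</EVENT> the
<EVENT eid="e2" class="OCCURRENCE">marathon</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1" pos="VERB" tense="PAST" aspect="NONE" polarity="POS"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" pos="NOUN" tense="NONE" aspect="NONE" polarity="POS"/>
<ALINK lid="l1" eventInstanceID="ei1" relatedToEventInstance="ei2" relType="CULMINATES"/>
</TIMEML>"""


def describe_alinks(xml_text):
    """Return (aspectual word, relType, argument word) for every ALINK."""
    root = ET.fromstring(xml_text)
    events = {ev.get("eid"): ev for ev in root.iter("EVENT")}
    # Resolve each event instance (eiid) to the EVENT element it instantiates.
    instance_to_event = {mi.get("eiid"): events[mi.get("eventID")]
                         for mi in root.iter("MAKEINSTANCE")}
    result = []
    for link in root.iter("ALINK"):
        head = instance_to_event[link.get("eventInstanceID")]
        arg = instance_to_event[link.get("relatedToEventInstance")]
        if head.get("class") != "ASPECTUAL":
            raise ValueError(f"ALINK {link.get('lid')} is not introduced by an ASPECTUAL event")
        result.append((head.text, link.get("relType"), arg.text))
    return result


print(describe_alinks(DOC))
```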

SLINK

Like ALINKs, SLINKs (Subordinate Links) are introduced only by certain event classes, namely reporting events, intensional events (I_ACTION and I_STATE), and perception events. Additionally, these events must subordinate another event by taking it as an argument. SLINKs essentially allow temporal relationships to be expressed even for events that may or may not have happened. For example, reporting events such as "said" introduce an EVIDENTIAL SLINK. Consider the sentence: “He didn’t even stop,” one witness said. Here the "stop" event is subordinated by the "said" event.
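A sketch of the witness example, assuming the subordinatedEventInstance attribute on SLINK and a polarity attribute on MAKEINSTANCE to capture the negation in "didn't". The attribute values and the helper describe_slinks are illustrative assumptions; the point is that "stop" is linked to the reporting event rather than asserted as having happened.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation of: "He didn't even stop," one witness said.
DOC = """<TIMEML>"He didn't even <EVENT eid="e1" class="OCCURRENCE">stop</EVENT>," one witness
<EVENT eid="e2" class="REPORTING">said</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1" pos="VERB" tense="PAST" aspect="NONE" polarity="NEG"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" pos="VERB" tense="PAST" aspect="NONE" polarity="POS"/>
<SLINK lid="l1" eventInstanceID="ei2" subordinatedEventInstance="ei1" relType="EVIDENTIAL"/>
</TIMEML>"""


def describe_slinks(xml_text):
    """Return (subordinating word, relType, subordinated word, its polarity) per SLINK."""
    root = ET.fromstring(xml_text)
    events = {ev.get("eid"): ev for ev in root.iter("EVENT")}
    instances = {mi.get("eiid"): mi for mi in root.iter("MAKEINSTANCE")}
    out = []
    for link in root.iter("SLINK"):
        head = instances[link.get("eventInstanceID")]
        sub = instances[link.get("subordinatedEventInstance")]
        out.append((events[head.get("eventID")].text, link.get("relType"),
                    events[sub.get("eventID")].text, sub.get("polarity")))
    return out


print(describe_slinks(DOC))
```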

References

  1. "TimeML Specification 1.2.1". catalog.ldc.upenn.edu. Retrieved 2021-01-29.
  2. "TimeML Specification Language". cs.brandeis.edu. Retrieved 2021-01-29.
  3. "TimeML Documents". www.timeml.org. Archived from the original on 21 July 2007. Retrieved 17 January 2022.
  4. "TimeML Specification 1.2.1". www.timeml.org. Archived from the original on 8 August 2007. Retrieved 17 January 2022.