Augmented Reality Markup Language

Last updated November 01, 2023

The Augmented Reality Markup Language (ARML)^[1] is a data standard to describe and interact with augmented reality (AR) scenes. It was developed within the Open Geospatial Consortium (OGC) by a dedicated ARML 2.0 Standards Working Group.^[2] ARML consists of an XML grammar that describes the location and appearance of virtual objects in the scene, and ECMAScript bindings that allow dynamic access to the properties of the virtual objects as well as event handling. It is currently published in version 2.0. ARML focuses on visual augmented reality (i.e. the camera of an AR-capable device serves as the main output for augmented reality scenarios).

Data model

ARML is built on a generic object model that allows serialization in several languages. Currently, ARML defines an XML serialization, as well as a JSON serialization for the ECMAScript bindings. The ARML object model consists of three main concepts:

Features represent the physical object that should be augmented.
VisualAssets describe the appearance of the virtual object in the augmented scene.
Anchors describe the spatial relation between the physical and the virtual object.
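
The relationship between these three concepts can be sketched as a small object model. The Python sketch below is illustrative only: the class and field names are assumptions chosen for this example, not identifiers taken from the ARML specification. It assumes that VisualAssets are attached to an Anchor, as in the XML example later in this article, and that a Feature holds one or more Anchors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisualAsset:
    """Appearance of a virtual object (for example text, an image or a 3D model)."""
    kind: str       # hypothetical label such as "Model" or "Image"
    href: str = ""  # location of the asset's resource, if any


@dataclass
class Anchor:
    """Spatial relation between the physical and the virtual object."""
    kind: str  # "Geometry", "Trackable", "RelativeTo" or "ScreenAnchor"
    assets: List[VisualAsset] = field(default_factory=list)


@dataclass
class Feature:
    """The physical object that should be augmented."""
    id: str
    name: str
    description: str = ""
    anchors: List[Anchor] = field(default_factory=list)


# Hypothetical instance: one Feature with one Anchor carrying one 3D model.
ferris_wheel = Feature(
    id="ferrisWheel",
    name="Wiener Riesenrad",
    anchors=[
        Anchor(
            kind="Geometry",
            assets=[VisualAsset(kind="Model",
                                href="http://www.example.com/myModel.dae")],
        )
    ],
)
```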

Feature

The definition of a Feature is reused from the Geography Markup Language (GML) and describes the physical object that should be augmented. The physical object is described by a set of metadata, including an ID, a name and a description. A Feature has one or more Anchors.

Anchor

An Anchor describes the location of the physical object in the real world. Four different Anchor types are defined in ARML:

Geometries
Trackables
RelativeTo
ScreenAnchor

Geometries

Geometries describe the location of an object through a set of fixed coordinates. WGS84 (latitude, longitude, altitude) is the default coordinate reference system; other coordinate reference systems can be supplied if required. ARML allows 0-dimensional (Point), 1-dimensional (LineString) and 2-dimensional (Polygon) geometries. Geometry Anchors reuse the syntax defined in GML3. As an example, the following snippet defines the location of the Wiener Riesenrad.

<gml:Point gml:id="ferrisWheelViennaPoint">
  <gml:pos>48.216622 16.395901</gml:pos>
</gml:Point>
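
As a rough illustration of how a client might read such a Geometry Anchor, the following Python sketch parses the point with the standard library's ElementTree. The namespace URI bound to the gml: prefix is an assumption (the GML 3.2 namespace), since the snippet above does not declare one, and the function name parse_point is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the gml: prefix (GML 3.2); not declared in the snippet.
GML_NS = "http://www.opengis.net/gml/3.2"

SNIPPET = f"""
<gml:Point xmlns:gml="{GML_NS}" gml:id="ferrisWheelViennaPoint">
  <gml:pos>48.216622 16.395901</gml:pos>
</gml:Point>
"""


def parse_point(xml_text):
    """Return (id, latitude, longitude) of a gml:Point element."""
    point = ET.fromstring(xml_text)
    pos = point.find(f"{{{GML_NS}}}pos")
    # gml:pos holds whitespace-separated coordinates; the first two are
    # read as latitude and longitude (WGS84 default order described above).
    latitude, longitude = (float(value) for value in pos.text.split()[:2])
    return point.get(f"{{{GML_NS}}}id"), latitude, longitude


point_id, lat, lon = parse_point(SNIPPET)
```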

Trackables

Trackables are patterns that are searched for, recognized and tracked in the video stream coming from the camera of the device. A wide variety of tracking technologies exist, including QR codes, natural features, 3D and face tracking. As all these tracking types use different algorithms and technologies, the definition of a Trackable is abstracted and split into two parts: a Tracker and its associated Trackables. A Tracker describes the technology (or algorithm) with which its associated Trackables should be tracked, using a URI that identifies the algorithm. The Trackable itself describes the pattern the algorithm should look for in the video stream.

Example: A natural feature tracker and an associated Trackable

<Tracker id="defaultImageTracker">
  <uri xlink:href="http://opengeospatial.org/arml/tracker/genericImageTracker" />
</Tracker>

<Trackable>
  <config>
    <tracker xlink:href="#defaultImageTracker" />
    <src>http://www.example.com/myMarker.jpg</src>
  </config>
  <size>0.20</size>
</Trackable>
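
The split between a Tracker and its Trackables means a client has to follow the xlink:href reference from each Trackable back to its Tracker. The Python sketch below shows one way this lookup could work. The wrapping root element and the function name resolve_trackables are assumptions added so that the fragment parses on its own; they are not part of ARML.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"

# The <root> wrapper is an assumption so the two elements form one document.
FRAGMENT = f"""
<root xmlns:xlink="{XLINK_NS}">
  <Tracker id="defaultImageTracker">
    <uri xlink:href="http://opengeospatial.org/arml/tracker/genericImageTracker" />
  </Tracker>
  <Trackable>
    <config>
      <tracker xlink:href="#defaultImageTracker" />
      <src>http://www.example.com/myMarker.jpg</src>
    </config>
    <size>0.20</size>
  </Trackable>
</root>
"""


def resolve_trackables(xml_text):
    """Pair each Trackable's pattern with the algorithm URI of its Tracker."""
    root = ET.fromstring(xml_text)
    # Index Trackers by id so that "#id" references can be looked up.
    trackers = {t.get("id"): t.find("uri").get(HREF) for t in root.iter("Tracker")}
    resolved = []
    for trackable in root.iter("Trackable"):
        config = trackable.find("config")
        tracker_id = config.find("tracker").get(HREF).lstrip("#")
        resolved.append({
            "src": config.findtext("src"),
            "size": float(trackable.findtext("size")),
            "algorithm": trackers[tracker_id],
        })
    return resolved


trackables = resolve_trackables(FRAGMENT)
```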

RelativeTo

RelativeTo Anchors allow the definition of a location relative to other Anchors or to the user's position. The former allows the setup of a scene and the location of all included virtual objects based on a single Anchor, like a Trackable placed on a table. The latter allows for scenarios where the actual location of the user is irrelevant: the virtual objects are simply placed around the user, regardless of the user's physical location.
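
The idea of building a scene from a single reference Anchor can be sketched as a chain of offsets that is resolved by summation. The Python sketch below is a simplification under stated assumptions: it uses plain Cartesian offsets, ignores rotation, and the class RelativeAnchor and function resolve are hypothetical names, not part of ARML. The axis conventions and units of real RelativeTo Anchors are defined by the specification and are not modelled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vector = Tuple[float, float, float]


@dataclass
class RelativeAnchor:
    """Hypothetical stand-in for an anchor placed at an offset from another."""
    offset: Vector
    ref: Optional[str] = None  # id of the referenced anchor; None marks the root


def resolve(anchor_id: str, anchors: Dict[str, RelativeAnchor]) -> Vector:
    """Follow the chain of references to the root and sum all offsets."""
    x = y = z = 0.0
    seen = set()
    current: Optional[str] = anchor_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic reference at {current!r}")
        seen.add(current)
        anchor = anchors[current]
        x += anchor.offset[0]
        y += anchor.offset[1]
        z += anchor.offset[2]
        current = anchor.ref
    return (x, y, z)


# A marker on a table acts as the root; two objects are placed relative to it.
scene = {
    "marker": RelativeAnchor(offset=(2.0, 0.0, 1.0)),
    "lamp": RelativeAnchor(offset=(0.0, 0.5, 0.0), ref="marker"),
    "shade": RelativeAnchor(offset=(0.0, 0.25, 0.0), ref="lamp"),
}
shade_position = resolve("shade", scene)
```

In the user-relative case, the root entry would simply be the user's own position instead of a marker.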

ScreenAnchor

Unlike the previous three Anchor types, ScreenAnchors do not describe a location in the 3-dimensional virtual scene. Instead, they define an area on the device screen, allowing for status bars and similar interface elements.

VisualAsset

VisualAssets describe the appearance of the virtual objects in the augmented scene. ARML allows various kinds of VisualAssets to be described, including plain text, images, HTML content and 3D models. VisualAssets can be oriented (either to always automatically face the user, or to maintain a specific static orientation) and scaled. Additionally, visibility conditions can be applied (for example, the Asset is only visible on the screen if the distance to the user is within certain boundaries).
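
A distance-based visibility condition of this kind amounts to a range check. The Python sketch below shows the logic only; the function name is_visible, its parameter names and the choice to treat both boundaries as inclusive are assumptions for illustration, not behaviour taken from the ARML specification.

```python
from typing import Optional


def is_visible(distance_m: float,
               min_m: Optional[float] = None,
               max_m: Optional[float] = None) -> bool:
    """Return True if the distance to the user lies within the given bounds.

    A bound of None means "no limit" on that side. Both bounds are treated
    as inclusive, which is an assumption of this sketch.
    """
    if distance_m < 0:
        raise ValueError("distance must not be negative")
    if min_m is not None and distance_m < min_m:
        return False
    if max_m is not None and distance_m > max_m:
        return False
    return True


# An asset that should only appear between 10 m and 500 m from the user.
near = is_visible(5.0, min_m=10.0, max_m=500.0)
in_range = is_visible(120.0, min_m=10.0, max_m=500.0)
```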

History

In late 2009, Wikitude (formerly Mobilizy), the creators of the Wikitude World Browser, started an early initiative to create a format that all AR browsers of the time could adhere to, called the Augmented Reality Markup Language (ARML).^[3] This format is now called ARML 1.0 and serves as an input format for the Wikitude World Browser.

In late 2011, Martin Lechner, Wikitude's CTO and the main driver of the ARML initiative, established the Augmented Reality Markup Language 2.0 Standards Working Group (ARML 2.0 SWG) within the OGC.^[4] Its goal was to create an internationally accepted standard for Augmented Reality, based on the ideas of ARML 1.0 and similar formats. During ISMAR in Atlanta in November 2012, the first ARML 2.0 specification was officially published,^[5] making ARML 2.0 an official OGC Candidate Standard.

Related standards

ARML 2.0 reuses ideas, structure, syntax and semantics of several existing and widely used standards.^[6]

In addition, the following ARML-independent initiatives also deal with creating standards for Augmented Reality environments:

Augmented Reality Application Format (ARAF),^[7] developed within ISO/MPEG
KARML,^[8] developed by the Georgia Institute of Technology
MobAR,^[9] developed within the Open Mobile Alliance (OMA)

Examples

The following example places a 3D Model (assuming one is available at http://www.example.com/myModel.dae) on a Trackable, such as a fiducial marker, whose image is located at http://www.example.com/myMarker.jpg:

<arml>
  <ARElements>
    <!-- register the Tracker to track a generic image -->
    <Tracker id="defaultImageTracker">
      <uri xlink:href="http://opengeospatial.org/arml/tracker/genericImageTracker" />
    </Tracker>

    <!-- define the artificial marker the Model will be placed on top of -->
    <Trackable>
      <assets>
        <!-- define the 3D Model that should be visible on top of the marker -->
        <Model>
          <href xlink:href="http://www.example.com/myModel.dae" />
        </Model>
      </assets>
      <config>
        <tracker xlink:href="#defaultImageTracker" />
        <src>http://www.example.com/myMarker.jpg</src>
      </config>
      <size>0.20</size>
    </Trackable>
  </ARElements>
</arml>
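
To show how a viewer might walk such a document, the following Python sketch collects, for every Trackable, the marker image and the models placed on it. It is a sketch under stated assumptions: the xmlns:xlink declaration on the root element is added so that the document parses, no default ARML namespace is declared (a real ARML file would likely declare one, in which case the tag names would need to be namespace-qualified), and the function name models_by_marker is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"

# Same content as the example above, with an xlink namespace declaration added.
ARML_DOC = f"""
<arml xmlns:xlink="{XLINK_NS}">
  <ARElements>
    <Tracker id="defaultImageTracker">
      <uri xlink:href="http://opengeospatial.org/arml/tracker/genericImageTracker" />
    </Tracker>
    <Trackable>
      <assets>
        <Model>
          <href xlink:href="http://www.example.com/myModel.dae" />
        </Model>
      </assets>
      <config>
        <tracker xlink:href="#defaultImageTracker" />
        <src>http://www.example.com/myMarker.jpg</src>
      </config>
      <size>0.20</size>
    </Trackable>
  </ARElements>
</arml>
"""


def models_by_marker(xml_text):
    """Map each marker image URL to the list of model URLs placed on it."""
    root = ET.fromstring(xml_text)
    result = {}
    for trackable in root.iter("Trackable"):
        marker = trackable.findtext("config/src")
        models = [m.find("href").get(HREF)
                  for m in trackable.findall("assets/Model")]
        result.setdefault(marker, []).extend(models)
    return result


scene_models = models_by_marker(ARML_DOC)
```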

References

[1] "OGC® Augmented Reality Markup Language 2.0 (ARML 2.0) - OGC". Opengeospatial.org.
[2] "ARML 2.0 SWG | OGC". 2017-08-24. Archived from the original on 2017-08-24. Retrieved 2022-10-09.
[3] "ARML - An Augmented Reality Standard" (PDF). Perey.com. Retrieved 27 December 2018.
[4] "The OGC seeks comments on candidate Augmented Reality Markup Language (ARML 2.0) standard - OGC". Opengeospatial.org.
[5] "The OGC Forms International Augmented Reality Standards Working Group - OGC". Opengeospatial.org.
[6] Martin Lechner, The Augmented Reality Markup Language 2.0, Dissertation.
[7] "Augmented Reality Application Format - MPEG". Mpeg.chiariglione.org.
[8] "Home - KHARMA". Kharma.gatech.edu.
[9] "Releases - Mobile Augmented Reality Enabler v1.0". Archived from the original on 2014-01-06. Retrieved 2013-07-22.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
