Semantic desktop

In computer science, the semantic desktop is a collective term for ideas related to changing a computer's user interface and data handling capabilities so that data can be shared more easily between different applications or tasks, and so that data that previously could not be processed automatically by a computer can be. It also encompasses some ideas about sharing information automatically between different people. The concept is closely related to the Semantic Web, but is distinct insofar as its main concern is the personal use of information.

Problems to solve

The vision of the semantic desktop can be considered as a response to the perceived problems of existing user interfaces.

Metadata

Without good metadata, computers cannot easily learn many commonly needed attributes of files. For example, suppose one downloads a document by a particular author on a particular subject. Although the document will likely state its subject, author, source and possibly copyright information clearly, there may be no easy way for the computer to extract this information and use it across applications such as file managers, desktop search engines and other services. As a result, the computer cannot search, filter or otherwise act on the information as effectively as it could. This is essentially the problem that the Semantic Web addresses.
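As a minimal illustration, assuming metadata is recorded as subject-predicate-object statements about a file, the sketch below attaches an author, subject and source to a downloaded document so that any application can filter on them. The file URI, the values and the helper functions are hypothetical; only the property URIs are borrowed from Dublin Core terms, and a real system would use an RDF store rather than a Python list.

```python
# A minimal sketch: file metadata as (subject, predicate, object) statements.
# The file URI and values are hypothetical; predicate URIs follow Dublin Core terms.
DCT = "http://purl.org/dc/terms/"

metadata: list[tuple[str, str, str]] = []

def annotate(file_uri: str, **props: str) -> None:
    """Record one statement per keyword argument, e.g. creator="Jane Doe"."""
    for name, value in props.items():
        metadata.append((file_uri, DCT + name, value))

def files_where(prop: str, value: str) -> set[str]:
    """Return every file URI whose Dublin Core property has the given value."""
    return {s for (s, p, o) in metadata if p == DCT + prop and o == value}

annotate("file:///home/user/Downloads/report.pdf",
         creator="Jane Doe", subject="Semantic Web",
         source="https://example.org/report.pdf")

print(files_where("creator", "Jane Doe"))
# {'file:///home/user/Downloads/report.pdf'}
```

Because the statements live outside any single application, a file manager and a desktop search tool could both answer the same question from them.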

File structure

Researchers in the iMemex project provide the following query examples [1]:

  1. “Show me all LaTeX ‘Introduction’ sections pertaining to project PIM that contain the phrase ‘Mike Franklin’.”
  2. “Show me all documents pertaining to project ‘OLAP’ that have a figure containing the phrase ‘Indexing Time’ in its label.”

Both of these queries need to parse the internal structure of files: the first to find a section in a LaTeX document, the second to find figures and their labels in documents of any format. Current operating systems cannot do either.
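To make query 1 more concrete, the sketch below splits LaTeX source into its sections and keeps the "Introduction" sections that mention a phrase. It assumes sections are delimited only by plain \section{...} commands (no \section*, \input or macros inside titles), which is a deliberate simplification; the function names are hypothetical.

```python
import re

# Matches \section{Title}; deliberately ignores \section*, \subsection, \input, etc.
SECTION = re.compile(r"\\section\{([^}]*)\}")

def sections(latex: str) -> list[tuple[str, str]]:
    """Split LaTeX source into (title, body) pairs, one per \\section command."""
    matches = list(SECTION.finditer(latex))
    result = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(latex)
        result.append((m.group(1), latex[m.end():end]))
    return result

def intro_sections_with(latex: str, phrase: str) -> list[str]:
    """Bodies of 'Introduction' sections that contain the given phrase."""
    return [body for title, body in sections(latex)
            if title.strip() == "Introduction" and phrase in body]

doc = r"""
\section{Introduction}
This work builds on ideas by Mike Franklin.
\section{Related Work}
Mike Franklin also appears here.
"""
print(len(intro_sections_with(doc, "Mike Franklin")))  # 1
```

The second section also contains the phrase but is excluded, which is exactly the distinction a plain full-text search cannot make.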

Inside-outside file boundary

A user might want to relate, in a single query, information maintained by the file system, such as a file's location in a folder, with information inside the file. With current technology, such a query cannot be issued as one request.

In query example 1 above, the project information exists only in the folder hierarchy, while the remaining filters concern the contents of the file, and some of them require parsing the file structure (see above). The user must therefore first query the file system and then search inside each matching file.
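The sketch below shows what a single request spanning the boundary might look like from the user's side: one condition checks the folder path (outside the file) and another checks the text (inside it). The directory layout, the convention that a folder named after the project marks project membership, and the function name are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

def query(root: Path, project: str, phrase: str) -> list[Path]:
    """Files below a folder named `project` whose text contains `phrase`."""
    hits = []
    for path in root.rglob("*.tex"):
        folders = path.relative_to(root).parts[:-1]  # outside the file: its location
        if project in folders and phrase in path.read_text(encoding="utf-8"):  # inside
            hits.append(path)
    return sorted(hits)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for project in ("PIM", "OLAP"):
        (root / project).mkdir()
        (root / project / "intro.tex").write_text(
            "\\section{Introduction} ... Mike Franklin ...", encoding="utf-8")
    found = [p.relative_to(root).as_posix() for p in query(root, "PIM", "Mike Franklin")]

print(found)  # ['PIM/intro.tex']
```

Both files contain the phrase, but only the one whose location matches the project is returned.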

Data-application coupling

There is also the problem of relating different files to each other. For example, on operating systems such as Unix, e-mails are stored separately from other files. Neither is connected to the tasks, notes or planned activities that may be stored in a calendar program, and contacts might be stored in yet another program. However, all these forms of information might be relevant and necessary for a particular task at the same time.
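Under a semantic desktop approach, such items would be described in one shared graph so that a task can pull them together. The sketch below assumes every e-mail, calendar event and contact already has a URI, and uses invented ex: predicate names; it collects everything linked to one task, directly or through one intermediate item, regardless of which application created it.

```python
# Hypothetical items from three applications, linked to tasks by URI.
triples = {
    ("mail:msg-42", "ex:relatedTo", "task:budget-review"),
    ("cal:event-7", "ex:relatedTo", "task:budget-review"),
    ("contact:alice", "ex:attendee", "cal:event-7"),
    ("mail:msg-43", "ex:relatedTo", "task:holiday"),
}

def related_to(task: str) -> set[str]:
    """Items directly linked to the task, plus items linked to those items."""
    direct = {s for (s, p, o) in triples if o == task}
    indirect = {s for (s, p, o) in triples if o in direct}
    return direct | indirect

print(sorted(related_to("task:budget-review")))
# ['cal:event-7', 'contact:alice', 'mail:msg-42']
```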

Data locality and sharing

Related to this, a user often accesses a lot of data from the Internet through a browser or other program, and this data is kept separate from the data stored locally on the computer. Researchers in the iMemex project give the example of searching both the local folder hierarchy and e-mail attachments located on an IMAP server [1] (see above, query example 2). In addition, the folder hierarchies often differ between the two systems.
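One way to picture a unified search is to wrap every source, whether a local folder or a remote mailbox, behind a common interface that yields (location, text) pairs. In the sketch below the IMAP side is stubbed with in-memory data rather than a real server connection, and all class names, locations and contents are hypothetical.

```python
from typing import Iterable, Iterator, Protocol

class Source(Protocol):
    """Anything that can enumerate (location, text) pairs."""
    def items(self) -> Iterator[tuple[str, str]]: ...

class InMemorySource:
    """Stand-in for a local folder or an IMAP mailbox (hypothetical)."""
    def __init__(self, prefix: str, docs: dict[str, str]) -> None:
        self.prefix, self.docs = prefix, docs

    def items(self) -> Iterator[tuple[str, str]]:
        for name, text in self.docs.items():
            yield f"{self.prefix}{name}", text

def search(sources: Iterable[Source], phrase: str) -> list[str]:
    """One query over all sources; each hit keeps its original location."""
    return [loc for src in sources for loc, text in src.items() if phrase in text]

local = InMemorySource("file:///projects/OLAP/", {"draft.tex": "Indexing Time figure"})
mail = InMemorySource("imap://mail.example.org/INBOX/", {"7/slides.pdf": "Indexing Time"})
print(search([local, mail], "Indexing Time"))
```

The design choice here is that the query code never needs to know where an item lives; differing folder hierarchies only show up in the returned locations.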

As well as accessing data, a user has to share data, often through e-mail or separate file transfer programs.

Definition

The semantic desktop is an attempt to solve some or all of these problems by extending the operating system's capabilities to handle all data using Semantic Web technologies. Based on this data integration, improved user interfaces (or plugins to existing applications) can give the user an integrated view of stored knowledge.

Sauermann et al. proposed a definition of the Semantic Desktop in 2005:

A Semantic Desktop is a device in which an individual stores all her digital information like documents, multimedia and messages. These are interpreted as Semantic Web resources, each is identified by a Uniform Resource Identifier (URI) and all data is accessible and queryable as Resource Description Framework (RDF) graph. Resources from the web can be stored and authored content can be shared with others. Ontologies allow the user to express personal mental models and form the semantic glue interconnecting information and systems. Applications respect this and store, read and communicate via ontologies and Semantic Web protocols. The Semantic Desktop is an enlarged supplement to the user’s memory [2].
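The core of this definition, every item identified by a URI and all data queryable as a graph, can be sketched as a set of triples queried by pattern matching, with None acting as a wildcard. This is a toy stand-in for an RDF store and a query language such as SPARQL; the URIs and the ex: vocabulary are invented for illustration.

```python
from typing import Optional

Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("file:///docs/thesis.pdf", "rdf:type", "ex:Document"),
    ("file:///docs/thesis.pdf", "ex:author", "ex:leo"),
    ("mail:msg-1", "rdf:type", "ex:Email"),
    ("mail:msg-1", "ex:mentions", "file:///docs/thesis.pdf"),
}

def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return matching triples in sorted order; None matches any value."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Which resources, of any kind, point at the thesis?
print(match(o="file:///docs/thesis.pdf"))
# [('mail:msg-1', 'ex:mentions', 'file:///docs/thesis.pdf')]
```

Because documents and messages are both just URIs in the same graph, one query crosses what would otherwise be separate applications.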

Different interpretations of the semantic desktop

There are various interpretations of the semantic desktop. In its most limited form, it might be interpreted as adding mechanisms for attaching machine-readable metadata to files. At the other extreme, it could be viewed as a complete replacement for existing user interfaces that unifies all forms of data and provides a single consistent interface. There are many degrees between these two, depending on which of the above problems are being addressed.

Standardization effort

To foster interoperability between different implementations and to publish standards, the community around the Nepomuk project founded the OSCA Foundation (OSCAF) [3] in 2008. Since June 2009, developers from the Nepomuk-KDE and Xesam communities have collaborated with OSCAF to help standardize the data formats for KDE, GNOME and freedesktop.org. The Nepomuk/OSCAF standards have been taken up by these projects and by Nokia's Maemo platform [4].

Relationship with other concepts

Semantic Web

The Semantic Web is mainly concerned with creating machine-readable metadata that enables computers to process shared information, and with the formats and standards for doing so. As such, the semantic desktop's aims of allowing more of a user's data to be processed by a computer and allowing data to be shared more easily could be considered a subset of those of the Semantic Web, extended to a user's local computer rather than just files stored on the Internet.

However, the aims of creating a unified interface and allowing data to be accessed in a format-independent way are not really concerns of the Semantic Web.

In practice, most projects related to the semantic desktop use Semantic Web technologies to store their data. In particular, they use both the concepts of RDF and its data formats.
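Concretely, storing data in an RDF format can be as simple as writing each statement as one line of N-Triples, a W3C serialization in which IRIs are written in angle brackets, literals in double quotes, and each line ends with " .". The sketch below is not a complete serializer: it guesses whether a value is an IRI from a few URI prefixes (a simplification), handles only plain string literals, and uses hypothetical example IRIs.

```python
def term(value: str) -> str:
    """Render an IRI as <...> and anything else as a quoted string literal."""
    if value.startswith(("http://", "https://", "file://", "urn:")):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)

print(to_ntriples([
    ("file:///home/user/report.pdf", "http://purl.org/dc/terms/creator", "Jane Doe"),
]), end="")
# <file:///home/user/report.pdf> <http://purl.org/dc/terms/creator> "Jane Doe" .
```

A line-oriented format like this is easy for any application to append to or read back, which is one reason it suits data shared between desktop programs.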

Semantic file systems

Semantic file systems allow the user to query files by semantic metadata. As such, they can be considered a part of the semantic desktop.
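As a minimal sketch of querying by metadata rather than by location, the code below indexes files by tags and answers a query with the files carrying all requested tags, so one file can appear in several "virtual folders" at once. The tag names and paths are hypothetical, and the design is not modeled on any particular semantic file system.

```python
from collections import defaultdict

class TagIndex:
    """Maps each tag to the set of file paths carrying it."""
    def __init__(self) -> None:
        self._by_tag: defaultdict[str, set[str]] = defaultdict(set)

    def tag(self, path: str, *tags: str) -> None:
        for t in tags:
            self._by_tag[t].add(path)

    def query(self, *tags: str) -> set[str]:
        """Files carrying every requested tag (intersection of the tag sets)."""
        if not tags:
            return set()
        return set.intersection(*(self._by_tag.get(t, set()) for t in tags))

index = TagIndex()
index.tag("/home/user/a.pdf", "project:PIM", "paper")
index.tag("/home/user/b.pdf", "project:PIM", "slides")
index.tag("/home/user/c.pdf", "project:OLAP", "paper")
print(index.query("project:PIM", "paper"))  # {'/home/user/a.pdf'}
```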

Some operating systems, such as BeOS, include a semantic file system, which is a move toward a more semantic desktop.

See also


References

  1. Dittrich, Jens-Peter; Vaz Salles, Marcos Antonio (2006). "iDM: A Unified and Versatile Data Model for Personal Dataspace Management" (PDF). International Conference on Very Large Databases: 367–378.
  2. Sauermann, Leo; Bernardi, Ansgar; Dengel, Andreas (2005). "Overview and Outlook on the Semantic Desktop" (PDF). ISWC. 175: 1–19.
  3. "OSCA Foundation". OSCA Foundation. Archived from the original on 2014-01-02.
  4. "OSCAF ontologies suited for Nokia's Maemo platform". OSCA Foundation. Archived from the original on 2013-11-27.

Open Source Implementations