Haystack (MIT project)

Last updated November 10, 2023

Haystack is a project at the Massachusetts Institute of Technology to research and develop several applications around personal information management and the Semantic Web. The most notable of those applications is the Haystack client, a research personal information manager (PIM) and one of the first to be based on semantic desktop technologies.^[1] The Haystack client is published as open source software under the BSD license.
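To make the idea of semantic-desktop storage more concrete, the following Python sketch keeps unrelated personal items as RDF-style subject-predicate-object triples and retrieves them by pattern matching. The `TripleStore` class, the `pim:` vocabulary and the item identifiers are hypothetical. This is not Haystack's API or data model. It only illustrates how items that share no schema can still be linked and queried through a common resource.

```python
from typing import Optional

# A triple is (subject, predicate, object); plain strings stand in for RDF URIs and literals.
Triple = tuple[str, str, str]


class TripleStore:
    """Tiny in-memory store of RDF-style statements (hypothetical, not Haystack's API)."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return every triple matching the pattern, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
# An email and a calendar event share no schema, yet both point at the same person.
store.add("msg:42", "rdf:type", "pim:Message")
store.add("msg:42", "pim:from", "person:alice")
store.add("msg:42", "pim:subject", "Draft review")
store.add("event:7", "rdf:type", "pim:Meeting")
store.add("event:7", "pim:attendee", "person:alice")

# Everything connected to Alice, regardless of item type.
for s, p, o in store.match(o="person:alice"):
    print(s, p)
```

Running it prints `event:7 pim:attendee` followed by `msg:42 pim:from`. Because every item is just a set of statements, a new kind of item needs no schema change, only new predicates.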

Adenine

Haystack was developed in Adenine, an RDF-aware dynamic language created for the project.^[2] Named after the nucleobase adenine, it is a cross-platform scripting language. It is perhaps the earliest example of a homoiconic general-graph (rather than list or tree) programming language.^[3] A notable characteristic of Adenine is its native support for the Resource Description Framework (RDF). Its language constructs are derived from Python and Lisp. Adenine code is itself expressed in RDF, so it can also be represented and written in RDF-based syntaxes such as Notation3 (N3).
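The idea of a homoiconic graph language can be sketched in Python: the program below is stored as a set of RDF-style triples and evaluated by walking that graph. This does not reproduce Adenine's actual syntax, vocabulary or interpreter. The `ex:` predicates, node names and the `evaluate` function are invented for illustration. The sketch shows two properties the text attributes to Adenine: a subexpression node can be shared, so the program is a general graph rather than a tree, and because the code is ordinary data it can be rewritten with the same operations used on any other triples.

```python
from operator import add, mul

# Hypothetical vocabulary: ex:op names an operator, ex:left/ex:right point to operand nodes,
# ex:value holds a literal. None of these names come from Adenine itself.
OPS = {"ex:add": add, "ex:mul": mul}

program = {
    ("n:root", "ex:op", "ex:mul"),
    ("n:root", "ex:left", "n:sum"),
    ("n:root", "ex:right", "n:sum"),   # the same node is reused: a graph, not a tree
    ("n:sum", "ex:op", "ex:add"),
    ("n:sum", "ex:left", "n:x"),
    ("n:sum", "ex:right", "n:one"),
    ("n:x", "ex:value", 4),
    ("n:one", "ex:value", 1),
}


def obj(graph, subject, predicate):
    """Return the single object of (subject, predicate, ?), or raise if it is not unique."""
    matches = [o for s, p, o in graph if s == subject and p == predicate]
    if len(matches) != 1:
        raise ValueError(f"expected one {predicate} on {subject}, found {len(matches)}")
    return matches[0]


def evaluate(graph, node):
    """Evaluate the expression rooted at node by following its triples."""
    ops = [o for s, p, o in graph if s == node and p == "ex:op"]
    if not ops:
        return obj(graph, node, "ex:value")  # leaf node: return its literal
    fn = OPS[ops[0]]
    return fn(evaluate(graph, obj(graph, node, "ex:left")),
              evaluate(graph, obj(graph, node, "ex:right")))


print(evaluate(program, "n:root"))  # (4 + 1) * (4 + 1) = 25

# The program is plain data, so ordinary set operations can rewrite it.
patched = (program - {("n:sum", "ex:op", "ex:add")}) | {("n:sum", "ex:op", "ex:mul")}
print(evaluate(patched, "n:root"))  # (4 * 1) * (4 * 1) = 16
```

A real RDF-based language would use URIs and a proper triple store (and could serialize the same graph in a syntax such as N3); plain Python sets keep the sketch self-contained.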

Active projects and recent research papers


References

[1] Karger, David R.; Quan, Dennis (2004). "Haystack: a user interface for creating, browsing, and organizing arbitrary semistructured information". CHI '04 Extended Abstracts on Human Factors in Computing Systems. Vienna, Austria: ACM. pp. 777–778. ISBN 1-58113-703-6.
[2] Quan, Dennis; Huynh, David; Sinha, Vineet; Karger, David (2002). Adenine: a metadata programming language (PDF). Student Oxygen Workshop.
[3] Rodriguez, Marko A. (August 2011). "The RDF virtual machine". Knowledge-Based Systems. 24 (6): 890–903. arXiv:0802.3492. doi:10.1016/j.knosys.2011.04.004. ISSN 0950-7051. S2CID 1962171.

Haystack: per-user information environments. Eytan Adar, David Karger, Lynn Andrea Stein. Proceedings of the Eighth International Conference on Information and Knowledge Management, pp. 413–422, November 2–6, 1999, Kansas City, Missouri, United States
Haystack: A Platform for Creating, Organizing and Visualizing Information Using RDF. Huynh, Karger, et al. 2002
Haystack Project summary
Belief layer for Haystack

External links

Active Haystack projects
Haystack at the SIMILE project webpage
Adenine tutorial, converted to IFCX Wings from the original, which is no longer available.
Adenine implementation in Java, extracted from MIT Haystack and hosted on SourceForge.


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
