Swoogle

Last updated
Swoogle
Original author(s) Li Ding
Tim Finin
Anupam Joshi
Rong Pan
R. Scott Cost
Yun Peng
Pavan Reddivari
Vishal Doshi
Joel Sachs [1]
Website swoogle.umbc.edu

Swoogle was a search engine for Semantic Web ontologies, documents, terms and data published on the Web. Swoogle employed a system of crawlers to discover RDF documents and HTML documents with embedded RDF content. Swoogle reasoned about these documents and their constituent parts (e.g., terms and triples) and recorded and indexed meaningful metadata about them in its database. [1] [2]

The Semantic Web is an extension of the World Wide Web through standards by the World Wide Web Consortium (W3C). The standards promote common data formats and exchange protocols on the Web, most fundamentally the Resource Description Framework (RDF). According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries". The Semantic Web is therefore regarded as an integrator across different content, information applications and systems.

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications.

HTML Hypertext Markup Language

Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a triad of cornerstone technologies for the World Wide Web.

Swoogle provided services to human users through a browser interface and to software agents via RESTful web services. Several techniques were used to rank query results inspired by the PageRank algorithm developed at Google but adapted to the semantics and use patterns found in semantic web documents.

PageRank Algorithm

PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of website pages. According to Google:

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

Google American multinational Internet and technology corporation

Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, search engine, cloud computing, software, and hardware. It is considered one of the Big Four technology companies, alongside Amazon, Apple and Facebook.

Swoogle was developed at and was hosted by the University of Maryland, Baltimore County (UMBC) with funding from the US DARPA and National Science Foundation agencies. It was PhD thesis work of Li Ding advised by Professor Tim Finin.

University of Maryland, Baltimore County university

The University of Maryland, Baltimore County is a public research university in Baltimore County, Maryland. It has a fall 2017 enrollment of 13,662 students, 48 undergraduate majors, over 60 graduate programs and the first university research park in Maryland.

DARPA agency of the U.S. Department of Defense responsible for the development of new technologies

The Defense Advanced Research Projects Agency (DARPA) is an agency of the United States Department of Defense responsible for the development of emerging technologies for use by the military.

National Science Foundation United States government agency

The National Science Foundation (NSF) is a United States government agency that supports fundamental research and education in all the non-medical fields of science and engineering. Its medical counterpart is the National Institutes of Health. With an annual budget of about US$7.8 billion, the NSF funds approximately 24% of all federally supported basic research conducted by the United States' colleges and universities. In some fields, such as mathematics, computer science, economics, and the social sciences, the NSF is the major source of federal backing.

See also

The DARPA Agent Markup Language (DAML) was the name of a US funding program at the US Defense Advanced Research Projects Agency (DARPA) started in 1999 by then-Program Manager James Hendler, and later run by Murray Burke, Mark Greaves and Michael Pagels. The program focused on the creation of machine-readable representations for the Web. One of the Investigators working on the program was Tim Berners-Lee and to a great degree through his influence, working with the program managers, the effort worked to create technologies and demonstrations for what is now called the Semantic Web and this in turn led to the growth of Knowledge Graph technology.

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects. Ontologies resemble class hierarchies in object-oriented programming but there are several critical differences. Class hierarchies are meant to represent structures used in source code that evolve fairly slowly whereas ontologies are meant to represent information on the Internet and are expected to be evolving almost constantly. Similarly, ontologies are typically far more flexible as they are meant to represent information on the Internet coming from all sorts of heterogeneous data sources. Class hierarchies on the other hand are meant to be fairly static and rely on far less diverse and more structured sources of data such as corporate databases.

Related Research Articles

In computer science and information science, an ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many or all domains.

OIL can be regarded as an ontology infrastructure for the Semantic Web. OIL is based on concepts developed in Description Logic (DL) and frame-based systems and is compatible with RDFS.

Description logics (DL) are a family of formal knowledge representation languages. Many DLs are more expressive than propositional logic but less expressive than first-order logic. In contrast to the latter, the core reasoning problems for DLs are (usually) decidable, and efficient decision procedures have been designed and implemented for these problems. There are general, spatial, temporal, spatiotemporal, and fuzzy descriptions logics, and each description logic features a different balance between DL expressivity and reasoning complexity by supporting different sets of mathematical constructors.

James Hendler artificial intelligence researcher

James Alexander Hendler is an artificial intelligence researcher at Rensselaer Polytechnic Institute, United States, and one of the originators of the Semantic Web.

In computer science and artificial intelligence, ontology languages are formal languages used to construct ontologies. They allow the encoding of knowledge about specific domains and often include reasoning rules that support the processing of that knowledge. Ontology languages are usually declarative languages, are almost always generalizations of frame languages, and are commonly based on either first-order logic or on description logic.

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.

Tim Finin American computer scientist

Timothy Wilking Finin is the Willard and Lillian Hackerman Chair in Engineering and is a Professor of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County (UMBC). His research has focused on the applications of artificial intelligence to problems in information systems and has included contributions to natural language processing, expert systems, the theory and applications of multiagent systems, the semantic web, and mobile computing.

Semantic publishing on the Web, or semantic web publishing, refers to publishing information on the web as documents accompanied by semantic markup. Semantic publication provides a way for computers to understand the structure and even the meaning of the published information, making information search and data integration more efficient.

Semantically-Interlinked Online Communities schema for describing posts and interactions on forums, message boards, blogs etc.

Semantically-Interlinked Online Communities Project is a Semantic Web technology. SIOC provides methods for interconnecting discussion methods such as blogs, forums and mailing lists to each other. It consists of the SIOC ontology, an open-standard machine readable format for expressing the information contained both explicitly and implicitly in Internet discussion methods, of SIOC metadata producers for a number of popular blogging platforms and content management systems, and of storage and browsing/searching systems for leveraging this SIOC data.

Ontotext is a Bulgarian software company headquartered in Sofia. It is the semantic technology branch of Sirma Group. Its main domain of activity is the development of software based on the Semantic Web languages and standards, in particular RDF, OWL and SPARQL. Ontotext is best known for the Ontotext GraphDB semantic graph database engine. Another major business line is the development of enterprise knowledge management and analytics systems that involve big knowledge graphs. Those systems are developed on top of the Ontotext Platform that builds on top of GraphDB capabilities for text mining using big knowledge graphs.

Robert David Stevens Professor of Bio-Health Informatics

Robert David Stevens is a Professor of Bio-Health Informatics in the School of Computer Science at the University of Manchester. He has served as head of the School of Computer Science since 2016.

Ian Horrocks Professor of Computer Science at the University of Oxford

Ian Robert Horrocks is a Professor of Computer Science at the University of Oxford in the UK and a Fellow of Oriel College, Oxford. His research focuses on knowledge representation and reasoning, particularly ontology languages, description logic and optimised tableaux decision procedures.

Hyperdata are data objects linked to other data objects in other places, as hypertext indicates text linked to other text in other places. Hyperdata enables formation of a web of data, evolving from the "data on the Web" that is not inter-related.

Ontology engineering

Ontology engineering in computer science, information science and systems engineering is a field which studies the methods and methodologies for building ontologies: formal representations of a set of concepts within a domain and the relationships between those concepts. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering. Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling.

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criteria is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components developed specifically to provide a complete Web application framework. OSF is made available under the Apache 2 license.

References

  1. 1 2 Ding, L.; Finin, T.; Joshi, A.; Pan, R.; Cost, R. S.; Peng, Y.; Reddivari, P.; Doshi, V.; Sachs, J. (2004). "Swoogle". Proceedings of the Thirteenth ACM conference on Information and knowledge management - CIKM '04. p. 652. doi:10.1145/1031171.1031289. ISBN   1581138741.
  2. Ding, Li (2006). Enhancing Semantic Web Data Access (PhD thesis). University of Maryland, Baltimore County.