Amit Sheth | |
---|---|
Alma mater | Ohio State University, Birla Institute of Technology and Science
Occupation | Professor at University of South Carolina
Title | Founding Director of the Artificial Intelligence Institute
Website | Amit Sheth
Amit Sheth is a computer scientist at the University of South Carolina in Columbia, South Carolina. He is the founding director of the Artificial Intelligence Institute and a professor of Computer Science and Engineering. [1] From 2007 to June 2019, he was the LexisNexis Ohio Eminent Scholar, director of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), [2] and a professor of Computer Science at Wright State University. Sheth's work has been cited by over 48,800 publications, [3] and he has an h-index of 117, [3] which places him among the top 100 computer scientists with the highest h-index. [4] [5] Before founding the Kno.e.sis Center, he served as director of the Large Scale Distributed Information Systems Lab at the University of Georgia in Athens, Georgia.
Sheth received his bachelor's degree in computer science from the Birla Institute of Technology and Science in 1981, and his M.S. and Ph.D. in computer science from Ohio State University in 1983 and 1985, respectively. [6] [7]
Sheth has investigated, demonstrated, and advocated for the comprehensive use of metadata. He explored syntactical, structural, and semantic metadata; more recently, he has pioneered ontology-driven approaches to metadata extraction and semantic analytics. He was among the first researchers to use description logic-based ontologies for schema and information integration (a decade before the W3C adopted a DL-based ontology representation standard), and he was the first to deliver a keynote about Semantic Web applications in search. [8] [9] His work on multi-ontology query processing includes the most cited paper on the topic (over 930 citations [10] ). In 1996, he introduced the concept of the Metadata Reference Link (MREF) for associating metadata with the hypertext that links documents on the Web, and he described an RDF-based realization in 1998, before RDF was adopted as a W3C recommendation. Part of his recent work has focused on information extraction from text to generate semantic metadata in the form of RDF. In this work, semantic metadata extracted from biological text takes the form of complex knowledge structures (complex entities and relationships) that reflect complex interactions in biomedical knowledge. [11] Sheth proposed a realization of Vannevar Bush's MEMEX vision as the Relationship Web, [12] based on semantic metadata extracted from text. Sheth and his co-inventors were awarded the first known patent for commercial Semantic Web applications in browsing, searching, profiling, personalization, and advertising, [13] which led to his founding of the first semantic search company, Taalee.
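As a minimal sketch of the general idea (not Sheth's extraction system), entities and relationships extracted from text can be recorded as RDF triples with a library such as rdflib; the namespace, entity names, and the example relation below are illustrative assumptions.

```python
# Minimal sketch: representing extracted semantic metadata as RDF triples.
# Namespace, names, and the example relation are hypothetical; this is not
# the original MREF / Relationship Web code.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/biomed#")

g = Graph()
g.bind("ex", EX)

# A complex relationship extracted from text, e.g.
# "MDM2 inhibits the tumor-suppressor activity of TP53"
g.add((EX.MDM2, RDF.type, EX.Protein))
g.add((EX.TP53, RDF.type, EX.Protein))
g.add((EX.MDM2, EX.inhibits, EX.TP53))
g.add((EX.inhibits, RDFS.label, Literal("inhibits (negative regulation)")))

# Serialize the extracted metadata so other tools can query or link to it
print(g.serialize(format="turtle"))
```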
In 1992, he gave an influential keynote titled "So far (schematically) yet so near (semantically)", which argued for the need for domain-specific semantics, the use of ontological representations for richer semantic modeling and knowledge representation, and the use of context when assessing similarity between objects. His work on using ontologies for information processing encompassed searching for an appropriate ontology, automated [14] reasoning for schema integration, semantic search, and other applications, and semantic query processing. The latter involved query transformations when user queries and resources are described using different ontologies, as well as federated queries, together with measures and techniques for computing the information loss incurred when traversing taxonomic relationships. [15] [16]
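The notion of information loss can be illustrated with a toy precision/recall-style estimate: replacing a query term with a broader ancestor keeps all relevant answers but adds irrelevant ones, while replacing it with narrower descendants returns only relevant answers but may miss some. The sketch below uses made-up extension sizes and is not the published formulation.

```python
# Illustrative sketch (not the published formulas): estimating information loss
# when a query term is replaced by a broader or narrower term in a taxonomy,
# using precision/recall over assumed extension sizes.

# Hypothetical extension sizes (number of instances) for taxonomy terms
extension_size = {
    "Publication": 10000,
    "Article": 6000,          # child of Publication
    "JournalArticle": 4000,   # child of Article
    "ConferencePaper": 2000,  # child of Article
}

def loss_replace_by_parent(term: str, parent: str) -> float:
    """All of term's instances are kept (recall = 1), but extra instances of the
    broader parent are returned, so precision drops."""
    precision = extension_size[term] / extension_size[parent]
    recall = 1.0
    f = 2 * precision * recall / (precision + recall)
    return 1 - f  # loss expressed as 1 minus the F-measure of the substitution

def loss_replace_by_children(term: str, children: list[str]) -> float:
    """Only instances covered by the children are returned (precision = 1),
    so recall drops if the children do not cover the whole term."""
    covered = sum(extension_size[c] for c in children)
    recall = min(covered / extension_size[term], 1.0)
    precision = 1.0
    f = 2 * precision * recall / (precision + recall)
    return 1 - f

print(loss_replace_by_parent("Article", "Publication"))         # broaden the query
print(loss_replace_by_children("Article", ["JournalArticle"]))  # narrow the query
```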
In the early 1990s, he initiated research in the formal modeling, scheduling, and correctness of workflows. His METEOR project demonstrated the value of research driven by real-world applications; its tools were used in graduate courses in several countries, its technology was licensed to create a commercial product, and the project was followed by METEOR-S. He led the research (later joined by IBM) that resulted in the W3C member submission of WSDL-S (Web Service Semantics), the basis for SAWSDL, a W3C recommendation for adding semantics to WSDL and XML Schema.
For both SAWSDL and SA-REST, he provided leadership in the community-based process followed by the W3C. He coauthored a 1995 paper in the journal Distributed and Parallel Databases that is one of the most cited papers in the workflow management literature, with more than 2,330 citations, and the most cited among the over 430 papers published in that journal. [17] His key technical contributions in workflow management include adaptive workflow management, [18] exception handling, [19] authorization and access control, [20] security, optimization, and quality of service. [21]
In the 1980s, large organizations wanted to couple multiple autonomous databases to accomplish certain tasks, but how to do so was not well understood from a technical perspective. Starting in 1987, Sheth gave a number of tutorials at ICDE, VLDB, SIGMOD, and other major conferences on distributed (federated) data management and developed scientific foundations and architectural principles to address these issues of database interoperability. He developed a reference architecture, covered in his most cited paper on federated databases. [22] The architecture spans a range of alternatives, from tightly coupled (i.e., global-as-view) to loosely coupled (i.e., local-as-view) systems, along three dimensions: distribution, heterogeneity, and autonomy. Later, he led the development of a schema integration tool in the USA. [23]
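A toy sketch of the tightly coupled (global-as-view) end of that spectrum: a mediator defines each relation of the global schema as a view over the autonomous sources and answers global queries by unfolding that view. All schema, source, and data names below are hypothetical.

```python
# Toy global-as-view (GAV) mediator: each relation in the global schema is
# defined as a view over autonomous sources. Schemas and data are hypothetical.

SOURCES = {
    "hr_db":      [{"emp_id": 1, "name": "Ada", "dept": "R&D"}],
    "payroll_db": [{"id": 1, "salary": 90000}],
}

def global_employee():
    """Global relation Employee(name, dept, salary), defined as a join
    over the two autonomous source databases."""
    by_id = {row["id"]: row for row in SOURCES["payroll_db"]}
    for emp in SOURCES["hr_db"]:
        pay = by_id.get(emp["emp_id"], {})
        yield {"name": emp["name"], "dept": emp["dept"], "salary": pay.get("salary")}

# A query against the global schema is answered by unfolding the view definition
print([row for row in global_employee() if row["dept"] == "R&D"])
```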
Sheth analyzed the limitations resulting from the autonomy of individual databases and worked towards deeper integration by developing specification models for interdatabase dependencies, allowing a limited degree of coupling to ensure global consistency for critical applications. [24] Together with Dimitrios Georgakopoulos and Marek Rusinkiewicz, he developed the ticketing method for concurrency control of global transactions that need to see and preserve a consistent state across multiple databases. [25] This work, which was recognized with a best paper award at the 1991 International Conference on Data Engineering (ICDE), was awarded a patent and spurred further progress on multidatabase transactions by other researchers.
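The ticket idea can be sketched schematically: each site stores a ticket counter that every global subtransaction reads and increments inside its local transaction, and the global scheduler checks that any two global transactions obtained their tickets in the same relative order at every site they both visited, aborting one of them otherwise. The code below illustrates that validation step under these assumptions; it is not the patented algorithm.

```python
# Schematic illustration of ticket-based validation for global transactions
# running over multiple autonomous databases (not the original implementation).
from itertools import combinations

class Site:
    """Each autonomous database stores a ticket counter as an ordinary data item."""
    def __init__(self):
        self.ticket = 0
    def take_ticket(self) -> int:
        # Executed inside the local subtransaction, so local concurrency control
        # serializes ticket values consistently with the subtransaction order.
        self.ticket += 1
        return self.ticket

def validate(global_tickets: dict[str, dict[str, int]]) -> set[str]:
    """global_tickets[txn][site] = ticket the txn obtained at that site.
    Two global transactions conflict if their ticket order differs across sites."""
    conflicts = set()
    for t1, t2 in combinations(global_tickets, 2):
        shared = set(global_tickets[t1]) & set(global_tickets[t2])
        orders = {global_tickets[t1][s] < global_tickets[t2][s] for s in shared}
        if len(orders) > 1:          # inconsistent relative order -> abort one txn
            conflicts.add(t2)
    return conflicts

# Example: G2 follows G1 at site A but precedes it at site B -> abort G2
tickets = {"G1": {"A": 1, "B": 2}, "G2": {"A": 2, "B": 1}}
print(validate(tickets))  # {'G2'}
```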
His work continued in the integration and interoperability of networked databases, moving from enterprise databases to Web-based database access. [26] [27] He has also helped to characterize metadata and to develop techniques that extract and use metadata for integrated access to a variety of content, ranging from databases to multimedia and multimodal data. [28] [29] [30]
Sheth has been a strong proponent of identifying a richer and broader set of relationships, such as meronomy and causality, on the Semantic Web. His idea of a "relationship web" [31] is inspired by Vannevar Bush's vision of the memex. Since the inception of linked data, he has emphasized the use of schema knowledge and of the information present on the Web and in linked data for this purpose. These ideas led to BLOOMS, [32] a system for identifying schema-level relationships between linked data datasets. A related system, PLATO, identifies partonomic (part-whole) relationships between entities in linked data.
In 1993, he initiated InfoHarness, a system that extracted metadata from diverse content (news, software code, and requirements documents) and supported faceted search through the Mosaic Web browser. [33] The system transitioned into a Bellcore product in 1995 and was followed by a metadata-based search engine for a personal electronic program guide and Web-based videos for a cable set-top box. [34] He licensed the technology he had developed at the University of Georgia to his company Taalee in the same year that Tim Berners-Lee coined the term "Semantic Web". In the first keynote on the Semantic Web given anywhere, [35] Sheth presented Taalee's commercial implementation of a semantic search engine, which is covered by the patent "System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising".
This 1999–2001 incarnation of semantic search (as described in the patent) started with extensive tooling to create an ontology or WorldModel (what today would be called a knowledge graph): designing a schema and then automatically extracting information (through knowledge extraction agents) and incorporating knowledge from multiple high-quality sources to populate the ontology and keep it fresh. This involved machinery for disambiguation to identify what was new and what had changed.
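The disambiguation step can be pictured, in spirit, as checking each newly extracted entity against what the knowledge base already contains before deciding whether to insert it or treat it as an update. The Python sketch below uses hypothetical attributes and a made-up similarity threshold; it is an illustration, not the system described in the patent.

```python
# Sketch of disambiguation when populating a knowledge base from multiple
# sources: decide whether an extracted entity is new or an update to an
# existing one. Attributes and threshold are hypothetical.

knowledge_base = {
    "e1": {"name": "Jennifer Aniston", "type": "Actor", "born": "1969"},
}

def similarity(a: dict, b: dict) -> float:
    """Fraction of matching attribute values over the attributes both records carry."""
    keys = set(a) & set(b)
    return sum(a[k] == b[k] for k in keys) / len(keys) if keys else 0.0

def upsert(extracted: dict, threshold: float = 0.7) -> str:
    for eid, known in knowledge_base.items():
        if similarity(extracted, known) >= threshold:
            changed = {k: v for k, v in extracted.items() if known.get(k) != v}
            known.update(extracted)          # existing entity: record what changed
            return f"updated {eid}, changed fields: {sorted(changed)}"
    new_id = f"e{len(knowledge_base) + 1}"   # otherwise it is genuinely new
    knowledge_base[new_id] = extracted
    return f"inserted {new_id}"

print(upsert({"name": "Jennifer Aniston", "type": "Actor", "awards": "Emmy"}))
print(upsert({"name": "Jennifer Lawrence", "type": "Actor", "born": "1990"}))
```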
The data extraction agents, which supported diverse content that was either pulled (crawled) or pushed (e.g., syndicated news in NewsML), called upon a committee of nine classifiers (Bayesian, hidden Markov model, and knowledge-based) to determine the domain of the content, identify the relevant subset of the ontology to use, and perform semantic annotation. "Semantic Enhancement Engine: A Modular Document Enhancement Platform for Semantic Applications over Heterogeneous Content" is one of the earliest publications demonstrating the effectiveness of knowledge-based classifiers compared with more traditional machine learning techniques. [36] The third component of the system used the ontology and the metadata (annotations) to support semantic search, browsing, profiling, personalization, and advertising.
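The committee idea can be illustrated with a toy majority vote over heterogeneous classifiers. The three classifiers below are trivial stand-ins for the Bayesian, HMM, and knowledge-based classifiers mentioned above; the domains, keywords, and thresholds are invented.

```python
# Sketch of a classifier committee that assigns a domain to incoming content and
# thereby selects which part of the ontology to use for annotation.
from collections import Counter

def keyword_classifier(text: str) -> str:          # stand-in for a Bayesian classifier
    return "finance" if "stock" in text.lower() else "sports"

def length_classifier(text: str) -> str:           # stand-in for an HMM classifier
    return "finance" if len(text.split()) > 8 else "sports"

def knowledge_based_classifier(text: str) -> str:  # stand-in for a KB lookup
    finance_entities = {"nasdaq", "dividend", "ipo"}
    return "finance" if finance_entities & set(text.lower().split()) else "sports"

COMMITTEE = [keyword_classifier, length_classifier, knowledge_based_classifier]

def classify(text: str) -> str:
    votes = Counter(clf(text) for clf in COMMITTEE)
    return votes.most_common(1)[0][0]   # majority vote decides the domain

doc = "Nasdaq rallies as tech stock prices climb ahead of the IPO"
print(classify(doc))   # the chosen domain selects the relevant ontology subset
```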
The system also supported a dynamically generated "Rich Media Reference" (comparable to Google's infobox), which not only displayed metadata about the searched entity, pulled from the ontology and the metabase, but also provided what was termed "blended semantic browsing and querying". [37] He also led efforts on other forms and modalities of data, including social and sensor data. He coined the term "Semantic Sensor Web" and initiated and chaired [38] [39] the W3C effort on semantic sensor networking that resulted in a de facto standard. [40] He introduced the concept of semantic perception to describe the process of converting massive amounts of IoT data into higher-level abstractions that support human cognition and perception in decision making; this involves IntellegO, an ontology-enabled abductive and deductive reasoning framework for iterative hypothesis refinement and validation. [41]
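The flavor of semantic perception can be conveyed with a toy abstraction step: annotated sensor observations are lifted into a higher-level concept that a decision maker can act on. The observations, thresholds, and derived concept below are invented for illustration.

```python
# Toy illustration of lifting raw sensor observations into a higher-level
# abstraction (the spirit of "semantic perception"); thresholds are invented.
from statistics import mean

observations = [
    {"sensor": "thermo-42", "lat": 33.99, "lon": -81.03,
     "time": "2019-07-01T14:00Z", "theme": "air_temperature", "value_c": 39.5},
    {"sensor": "thermo-42", "lat": 33.99, "lon": -81.03,
     "time": "2019-07-01T15:00Z", "theme": "air_temperature", "value_c": 40.1},
]

def abstract(obs: list[dict]) -> str:
    """Map a window of annotated observations to a domain-level concept."""
    temps = [o["value_c"] for o in obs if o["theme"] == "air_temperature"]
    if temps and mean(temps) >= 38.0:
        return "HeatWaveAlert"          # abstraction used for decision making
    return "Normal"

print(abstract(observations))  # HeatWaveAlert
```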
In early 2009, he initiated and framed the issue of social media analysis along a broad set of semantic dimensions he called "Spatio-Temporal-Thematic" (STT). He emphasized the analysis of social data from the perspectives of people, content, sentiment, and emotion. This idea led to a system called Twitris, [42] which employs dynamically evolving semantic models [43] produced by the Semantic Web project Doozer. [44] The Twitris system can identify people's emotions (such as joy, sadness, anger, and fear) from their social media posts [45] by applying machine learning techniques to millions of self-labeled emotion tweets. [46]
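The self-labeling idea, roughly: an emotion hashtag such as #joy or #anger supplies the training label and is stripped from the tweet text before training. The scikit-learn snippet below is a toy sketch with invented data, not the Twitris pipeline.

```python
# Sketch of emotion classification trained on "self-labeled" tweets, where an
# emotion hashtag (e.g. #joy) provides the label and is removed from the text.
# Toy data and model; not the actual Twitris pipeline.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

EMOTIONS = {"joy", "sadness", "anger", "fear"}

raw_tweets = [
    "Got the job offer today!!! #joy",
    "Missing my best friend so much #sadness",
    "They cancelled my flight again #anger",
    "Home alone and heard a noise downstairs #fear",
]

texts, labels = [], []
for tweet in raw_tweets:
    tags = {t.lower() for t in re.findall(r"#(\w+)", tweet)}
    label = next(iter(tags & EMOTIONS), None)
    if label:                                   # keep only self-labeled tweets
        texts.append(re.sub(r"#\w+", "", tweet).strip())
        labels.append(label)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["so happy about the offer"]))
```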
Sheth founded Infocosm, Inc. in 1997, which licensed and commercialized the METEOR technology from the research he led at the University of Georgia, resulting in the distributed workflow management products WebWork [47] and ORBWork. [48] He founded Taalee, Inc. in 1999 by licensing the VideoAnywhere technology [49] that grew out of the research he led at the University of Georgia. The first product from Taalee was a semantic search engine. [50] [51] [52] Taalee became Voquette [53] after a merger in 2002, and then Semagix in 2004. [54] In 2016, Cognovi Labs was founded based on the Twitris technology [55] resulting from the research he led at the Kno.e.sis Center of Wright State University. [56] He has served as its chief innovator and serves on its board. The technology was used successfully to predict Brexit [57] and the outcome of the 2016 US presidential election. [58]
The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects.
The semantic spectrum, sometimes referred to as the ontology spectrum, the smart data continuum, or semantic precision, is a series of increasingly precise or rather semantically expressive definitions for data elements in knowledge representations, especially for machine use.
The ultimate goal of semantic technology is to help machines understand data. To enable the encoding of semantics with data, well-known technologies such as RDF and OWL are used. These technologies formally represent the meaning of information; for example, an ontology can describe concepts, relationships between things, and categories of things. Embedding such semantics with the data offers significant advantages, such as reasoning over data and dealing with heterogeneous data sources.
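A tiny illustration of the kind of reasoning such encoded semantics enable: if a cat is declared a kind of mammal and Tom is declared a cat, a reasoner can conclude that Tom is a mammal. The sketch below hard-codes RDFS-style subclass inference in plain Python rather than invoking an actual OWL reasoner; the facts are invented.

```python
# Tiny illustration of reasoning over explicitly represented semantics:
# subclass statements let new facts be inferred from asserted ones.
# Plain-Python stand-in for an RDFS/OWL reasoner.

subclass_of = {             # (subclass, superclass) pairs, rdfs:subClassOf style
    ("Cat", "Mammal"),
    ("Mammal", "Animal"),
}
type_of = {("Tom", "Cat")}  # asserted rdf:type facts

def infer_types(types, subclasses):
    """Apply the RDFS subclass rule until no new rdf:type facts appear."""
    inferred = set(types)
    changed = True
    while changed:
        changed = False
        new = {(i, sup) for (i, sub) in inferred
                        for (s, sup) in subclasses if s == sub}
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

print(infer_types(type_of, subclass_of))
# {('Tom', 'Cat'), ('Tom', 'Mammal'), ('Tom', 'Animal')}
```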
A semantic web service, like conventional web services, is the server end of a client–server system for machine-to-machine interaction via the World Wide Web. Semantic services are a component of the semantic web because they use markup which makes data machine-readable in a detailed and sophisticated way.
Rudi Studer is a German computer scientist and professor emeritus at KIT, Germany. He served as head of the knowledge management research group at the Institute AIFB and as one of the directors of the Karlsruhe Service Research Institute (KSRI). He is a former president of the Semantic Web Science Association, an STI International Fellow, and a member of numerous programme committees and editorial boards. He was one of the inaugural editors-in-chief of the Journal of Web Semantics, a position he held until 2007. He is a co-author of the "Semantic Wikipedia" proposal which led to the development of Wikidata.
Carole Anne Goble is a British academic who is Professor of Computer Science at the University of Manchester. She is principal investigator (PI) of the myGrid, BioCatalogue and myExperiment projects and co-leads the Information Management Group (IMG) with Norman Paton.
In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database.
Ontology-based data integration involves the use of one or more ontologies to effectively combine data or information from multiple heterogeneous sources. It is one of the multiple data integration approaches and may be classified as Global-As-View (GAV). The effectiveness of ontology‑based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process.
The terms schema matching and mapping are often used interchangeably for a database process. Here, we differentiate the two as follows: schema matching is the process of identifying that two objects are semantically related, while mapping refers to the transformations between the objects. For example, for the two schemas DB1.Student and DB2.Grad-Student, possible matches would be DB1.Student ≈ DB2.Grad-Student and DB1.SSN = DB2.ID, and a possible transformation or mapping would be DB1.Marks to DB2.Grades.
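The distinction can be made concrete with a toy transformation in Python: the matches record which fields correspond, and the mapping supplies the value conversion (the marks-to-grades scale below is hypothetical).

```python
# Toy illustration of the match/mapping distinction from the example above:
# matches pair up corresponding fields; mappings transform the values.
# The grading scale is hypothetical.

def marks_to_grade(marks: int) -> str:
    """Mapping for DB1.Marks -> DB2.Grades (hypothetical scale)."""
    return "A" if marks >= 90 else "B" if marks >= 75 else "C"

MATCHES = {            # DB1.Student field -> DB2.Grad-Student field
    "SSN":   ("ID",     lambda v: v),            # identity mapping
    "Name":  ("Name",   lambda v: v),
    "Marks": ("Grades", marks_to_grade),         # value-level transformation
}

def translate(db1_student: dict) -> dict:
    return {target: fn(db1_student[src]) for src, (target, fn) in MATCHES.items()}

print(translate({"SSN": "123-45-6789", "Name": "Ada Lovelace", "Marks": 92}))
# {'ID': '123-45-6789', 'Name': 'Ada Lovelace', 'Grades': 'A'}
```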
Machine interpretation of documents and services in the Semantic Web environment is primarily enabled by (a) the capability to mark documents, document segments, and services with semantic tags and (b) the ability to establish contextual relations between the tags with a domain model, which is formally represented as an ontology. Human beings use natural languages to communicate an abstract view of the world. Natural language constructs are symbolic representations of human experience and are close to the conceptual model that Semantic Web technologies deal with. Thus, natural language constructs have naturally been used to represent ontology elements. This makes it convenient to apply Semantic Web technologies in the domain of textual information. In contrast, multimedia documents are perceptual recordings of human experience. An attempt to use a conceptual model to interpret these perceptual records is severely impaired by the semantic gap that exists between perceptual media features and the conceptual world. Notably, concepts have their roots in the perceptual experience of human beings, and the apparent disconnect between the conceptual and the perceptual world is rather artificial. The key to semantic processing of multimedia data lies in harmonizing the seemingly isolated conceptual and perceptual worlds. The representation of domain knowledge needs to be extended to enable perceptual modeling, over and above the conceptual modeling that is currently supported. The perceptual model of a domain primarily comprises the observable media properties of its concepts. Such perceptual models are useful for the semantic interpretation of media documents, just as conceptual models help in the semantic interpretation of textual documents.
Business semantics management (BSM) encompasses the technology, methodology, organization, and culture that brings business stakeholders together to collaboratively realize the reconciliation of their heterogeneous metadata; and consequently the application of the derived business semantics patterns to establish semantic alignment between the underlying data structures.
The Semantic Sensor Web (SSW) is a marriage of sensor web and semantic Web technologies. The encoding of sensor descriptions and sensor observation data with Semantic Web languages enables more expressive representation, advanced access, and formal analysis of sensor resources. The SSW annotates sensor data with spatial, temporal, and thematic semantic metadata. This technique builds on current standardization efforts within the Open Geospatial Consortium's Sensor Web Enablement (SWE) and extends them with Semantic Web technologies to provide enhanced descriptions and access to sensor data.
Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.
Semantic heterogeneity arises when database schemas or datasets for the same domain are developed by independent parties, resulting in differences in the meaning and interpretation of data values. Beyond structured data, the problem of semantic heterogeneity is compounded by the flexibility of semi-structured data and the various tagging methods applied to documents or unstructured data. Semantic heterogeneity is one of the more important sources of differences in heterogeneous datasets.
Semantic queries allow for queries and analytics of an associative and contextual nature. They enable the retrieval of both explicitly and implicitly derived information based on syntactic, semantic and structural information contained in data. They are designed to deliver precise results or to answer fuzzier and more open-ended questions through pattern matching and digital reasoning.
Sheila Ann McIlraith is a Canadian computer scientist specializing in artificial intelligence (AI). She is a Professor in the Department of Computer Science, University of Toronto. She is a Canada CIFAR AI Chair, a faculty member of the Vector Institute, and Associate Director and Research Lead of the Schwartz Reisman Institute for Technology and Society.
In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities.
Data Commons is an open-source platform created by Google that provides an open knowledge graph, combining economic, scientific and other public datasets into a unified view. Ramanathan V. Guha, a creator of web standards including RDF, RSS, and Schema.org, founded the project, which is now led by Prem Ramaswami.
Terry R. Payne is a computer scientist and artificial intelligence researcher at the University of Liverpool. He works on the use of ontologies by Software Agents within decentralised environments. He is best known for his work on Semantic Web Services and in particular for his work on OWL-S.