Enterprise information integration

Last updated

Enterprise information integration (EII) is the ability to support a unified view of data and information for an entire organization. In a data virtualization application of EII, a process of information integration, using data abstraction to provide a unified interface (known as uniform data access) for viewing all the data within an organization, and a single set of structures and naming conventions (known as uniform information representation) to represent this data; the goal of EII is to get a large set of heterogeneous data sources to appear to a user or system as a single, homogeneous data source.

Contents

Overview

Data within an enterprise can be stored in heterogeneous formats, including relational databases (which themselves come in a large number of varieties), text files, XML files, spreadsheets and a variety of proprietary storage methods, each with their own indexing and data access methods.

Standardized data access APIs have emerged that offer a specific set of commands to retrieve and modify data from a generic data source. Many applications exist that implement these APIs' commands across various data sources, most notably relational databases. Such APIs include ODBC, JDBC, XQJ, OLE DB, and more recently ADO.NET.

There are also standard formats for representing data within a file that are very important to information integration. The best-known of these is XML, which has emerged as a standard universal representation format. There are also more specific XML "grammars" defined for specific types of data such as Geography Markup Language for expressing geographical features and Directory Service Markup Language for holding directory-style information. In addition, non-XML standard formats exist such as iCalendar for representing calendar information and vCard for business card information.

Enterprise Information Integration (EII) applies data integration commercially. Despite the theoretical problems described above, the private sector shows more concern with the problems of data integration as a viable product. [1] EII emphasizes neither on correctness nor tractability, but speed and simplicity.

Combining disparate data sets
Each data source is disparate and as such is not designed to support EII. Therefore, data virtualization as well as data federation depends upon accidental data commonality to support combining data and information from disparate data sets. Because of this lack of data value commonality across data sources, the return set may be inaccurate, incomplete, and impossible to validate.
One solution is to recast disparate databases to integrate these databases without the need for ETL. The recast databases support commonality constraints where referential integrity may be enforced between databases. The recast databases provide designed data access paths with data value commonality across databases.
Simplicity of deployment
Even if recognized as a solution to a problem, EII as of 2009 currently takes time to apply and offers complexities in deployment. Proposed schema-less solutions include "Lean Middleware". [2]
Handling higher-order information
Analysts experience difficulty—even with a functioning information integration system—in determining whether the sources in the database will satisfy a given application. Answering these kinds of questions about a set of repositories requires semantic information like metadata and/or ontologies.

Applications

EII products enable loose coupling between homogeneous-data consuming client applications and services and heterogeneous-data stores. Such client applications and services include Desktop Productivity Tools (spreadsheets, word processors, presentation software, etc.), development environments and frameworks (Java EE, .NET, Mono, SOAP or RESTful Web services, etc.), business intelligence (BI), business activity monitoring (BAM) software, enterprise resource planning (ERP), Customer relationship management (CRM), business process management (BPM and/or BPEL) Software, and web content management (CMS).

Data access technologies

See also

Related Research Articles

Middleware in the context of distributed applications is software that provides services beyond those provided by the operating system to enable the various components of a distributed system to communicate and manage data. Middleware supports and simplifies complex distributed applications. It includes web servers, application servers, messaging and similar tools that support application development and delivery. Middleware is especially integral to modern information technology based on XML, SOAP, Web services, and service-oriented architecture.

The Organization for the Advancement of Structured Information Standards is a nonprofit consortium that works on the development, convergence, and adoption of open standards for cybersecurity, blockchain, Internet of things (IoT), emergency management, cloud computing, legal data exchange, energy, content technologies, and other areas.

Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving messages between distributed systems. MOM allows application modules to be distributed over heterogeneous platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols. The middleware creates a distributed communications layer that insulates the application developer from the details of the various operating systems and network interfaces. APIs that extend across diverse platforms and networks are typically provided by MOM.

Enterprise application integration (EAI) is the use of software and computer systems' architectural principles to integrate a set of enterprise computer applications.

<span class="mw-page-title-main">Enterprise service bus</span> Communication system in a service-oriented architecture

An enterprise service bus (ESB) implements a communication system between mutually interacting software applications in a service-oriented architecture (SOA). It represents a software architecture for distributed computing, and is a special variant of the more general client-server model, wherein any application may behave as server or client. ESB promotes agility and flexibility with regard to high-level protocol communication between applications. Its primary use is in enterprise application integration (EAI) of heterogeneous and complex service landscapes.

A universal integration platform is a development- and/or configuration-time analog of a universal server. The emphasis on the term: "platform" implies a middleware environment from which integration oriented solutions are derived. Likewise, the term: "Universal" implies depth and breadth of integration capabilities that transcend disparate operating systems, protocols, APIs, Data Sources, Programming Languages, Composite Processes, Discrete Services, and Monolithic Applications.

An enterprise messaging system (EMS) or messaging system in brief is a set of published enterprise-wide standards that allows organizations to send semantically precise messages between computer systems. EMS systems promote loosely coupled architectures that allow changes in the formats of messages to have minimum impact on message subscribers. EMS systems are facilitated by the use of structured messages, and appropriate protocols, such as DDS, MSMQ, AMQP or SOAP with web services.

Prova is an open source programming language that combines Prolog with Java.

A mashup, in web development, is a web page or web application that uses content from more than one source to create a single new service displayed in a single graphical interface. For example, a user could combine the addresses and photographs of their library branches with a Google map to create a map mashup. The term implies easy, fast integration, frequently using open application programming interfaces and data sources to produce enriched results that were not necessarily the original reason for producing the raw source data. The term mashup originally comes from creating something by combining elements from two or more sources.

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, which include both commercial and scientific domains. Data integration appears with increasing frequency as the volume and the need to share existing data explodes. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be received from a heterogeneous database system and transformed to a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining when analyzing and extracting information from existing databases that can be useful for Business information.

<span class="mw-page-title-main">Virtuoso Universal Server</span> Computer software

Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional relational database management system (RDBMS), object–relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.

JBoss Developer Studio (JBDS) is a development environment created and currently developed by JBoss and Exadel.

EMML, or Enterprise Mashup Markup Language, is an XML markup language for creating enterprise mashups, which are software applications that consume and mash data from variety of sources, often performing logical or mathematical operations as well as presenting data.

<span class="mw-page-title-main">Sheetster</span>

Sheetster is a GPL open-source Web Spreadsheet and a Java Application Server created by Extentech Inc. The product was created for the enterprise and small and medium-sized businesses as an Open Source alternative to closed document management systems.

Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can provide a single customer view of the overall data.

<span class="mw-page-title-main">XQuery API for Java</span> Application programming interface

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components developed specifically to provide a complete Web application framework. OSF is made available under the Apache 2 license.

Oracle TopLink is a mapping and persistence framework for Java developers. TopLink is produced by Oracle and is a part of Oracle's OracleAS, WebLogic, and OC4J servers. It is an object-persistence and object-transformation framework. TopLink provides development tools and run-time functionalities that ease the development process and help increase functionality. Persistent object-oriented data is stored in relational databases which helps build high-performance applications. Storing data in either XML or relational databases is made possible by transforming it from object-oriented data.

References

  1. Alon Y. Halevy; et al. (2005). "Enterprise information integration: successes, challenges and controversies" (PDF). SIGMOD 2005. pp. 778–787. doi:10.1145/1066157.1066246.
  2. David A. Maluf; et al. (2005). "Lean middleware". SIGMOD 2005. pp. 788–791. doi:10.1145/1066157.1066247.