William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. [1] [2] Inmon wrote the first book on data warehousing, held the first conference (with Arnie Barnett), wrote the first magazine column, and was the first to offer classes on the subject. Inmon created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as top-down.
William H. Inmon was born July 20, 1945, in San Diego, California. He received his Bachelor of Science degree in mathematics from Yale University in 1967, and his Master of Science degree in computer science from New Mexico State University.
He worked for American Management Systems and Coopers & Lybrand before 1991, when he founded the company Prism Solutions, which he took public. In 1995 he founded Pine Cone Systems, which was later renamed Ambeo. In 1999, he created a corporate information factory website for his consulting business. [3]
Inmon coined terms such as the government information factory and data warehousing 2.0. Inmon promotes the building, use, and maintenance of data warehouses and related topics. His books include "Building the Data Warehouse" (1992, with later editions) and "DW 2.0: The Architecture for the Next Generation of Data Warehousing" (2008).
In July 2007, Inmon was named by Computerworld as one of the ten people who most influenced the first 40 years of the computer industry. [4]
Inmon's association with data warehousing stems from the fact that he wrote the first [5] book on data warehousing, held the first conference on the subject (with Arnie Barnett), wrote the first magazine column on it, has written over 1,000 articles on data warehousing in journals and newsletters, created the first fold-out wall chart for data warehousing, and conducted the first classes on data warehousing.
In 2012, Inmon developed and made public a technology known as "textual disambiguation". Textual disambiguation applies context to raw text and reformats the raw text and context into a standard database format. Once raw text has passed through textual disambiguation, it can be accessed and analyzed easily and efficiently by standard business intelligence technology. Textual disambiguation is accomplished through the execution of TextualETL.
Inmon owns and operates Forest Rim Technology, a company that applies and implements data warehousing solutions executed through textual disambiguation and TextualETL. [6]
Bill Inmon has published more than 60 books in nine languages and 2,000 articles on data warehousing and data management.
In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, and that data is used to create analytical reports for workers throughout the enterprise. This benefits companies because it enables them to interrogate their data, draw insights from it, and make decisions.
Business intelligence (BI) consists of strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.
A data mart is a structure/access pattern specific to data warehouse environments, used to retrieve client-facing data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart, including all the hardware, software and data. This enables each department to isolate the use, manipulation and development of its data. In other deployments, where conformed dimensions are used, this business-unit ownership does not hold for shared dimensions such as customer and product.
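The idea of a data mart as a department-oriented subset of the warehouse can be illustrated with a minimal Python sketch. The table contents, the field names (`department`, `customer`, `revenue`) and the helper functions are all hypothetical and chosen only for illustration; real data marts are normally built inside a database rather than in application code.

```python
from collections import defaultdict

# Hypothetical enterprise-wide fact rows held in the warehouse.
WAREHOUSE_SALES = [
    {"department": "marketing", "customer": "C1", "revenue": 120.0},
    {"department": "finance", "customer": "C1", "revenue": 80.0},
    {"department": "marketing", "customer": "C2", "revenue": 50.0},
]


def build_data_mart(warehouse_rows, department):
    """Return only the warehouse rows that belong to one department."""
    return [row for row in warehouse_rows if row["department"] == department]


def revenue_by_customer(mart_rows):
    """Aggregate revenue per customer within a single data mart."""
    totals = defaultdict(float)
    for row in mart_rows:
        totals[row["customer"]] += row["revenue"]
    return dict(totals)


marketing_mart = build_data_mart(WAREHOUSE_SALES, "marketing")
print(revenue_by_customer(marketing_mart))
```

In this sketch the `customer` field plays the role of a shared dimension: the same customer identifiers appear in the rows of more than one department, which is why a single department cannot be treated as its sole owner.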
Data engineering refers to the building of systems to enable the collection and usage of data. This data is usually used to enable subsequent analysis and data science, which often involves machine learning. Making the data usable usually involves substantial compute and storage, as well as data processing.
Data management comprises all disciplines related to handling data as a valuable resource. It is the practice of managing an organization’s data so it can be analyzed for decision making.
An operational data store (ODS) is used for operational reporting and as a source of data for the enterprise data warehouse (EDW). It is a complementary element to an EDW in a decision support environment, and is used for operational reporting, controls, and decision making, as opposed to the EDW, which is used for tactical and strategic decision support.
Ralph Kimball is an author on the subject of data warehousing and business intelligence. He is one of the original architects of data warehousing and is known for his long-held conviction that data warehouses must be designed to be understandable and fast. His bottom-up methodology, also known as dimensional modeling or the Kimball methodology, is one of the two main data warehousing methodologies, alongside that of Bill Inmon.
John A. Zachman is an American business and IT consultant, early pioneer of enterprise architecture, chief executive officer of Zachman International, and originator of the Zachman Framework.
In computerized business management, single version of the truth (SVOT) is a technical concept describing the data warehousing ideal of having either a single centralised database, or at least a distributed synchronised database, which stores all of an organisation's data in a consistent and non-redundant form. This contrasts with the related concept of single source of truth (SSOT), which refers to the data storage principle of always sourcing a particular piece of information from one place.
Datavault or data vault modeling is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that deals with issues such as auditing, tracing of data, loading speed and resilience to change as well as emphasizing the need to trace where all the data in the database came from. This means that every row in a data vault must be accompanied by record source and load date attributes, enabling an auditor to trace values back to the source. The concept was published in 2000 by Dan Linstedt.
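The requirement that every row carry record source and load date attributes can be sketched in Python as follows. This is an illustrative assumption, not Linstedt's specification: the `SatelliteRow` class, the `stamp` and `trace` helpers and the sample source names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SatelliteRow:
    """One stored row; the last two fields are the audit attributes."""
    business_key: str
    attributes: dict
    record_source: str   # which operational system supplied the row
    load_date: datetime  # when the row was loaded


def stamp(business_key, attributes, record_source, load_date=None):
    """Attach audit attributes to incoming data, refusing rows without a source."""
    if not record_source:
        raise ValueError("record_source is required for auditability")
    if load_date is None:
        load_date = datetime.now(timezone.utc)
    return SatelliteRow(business_key, dict(attributes), record_source, load_date)


def trace(rows, business_key):
    """Return (load_date, record_source) pairs for a key, oldest first."""
    return sorted(
        (row.load_date, row.record_source)
        for row in rows
        if row.business_key == business_key
    )


rows = [
    stamp("C1", {"name": "Acme Ltd"}, "crm", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    stamp("C1", {"name": "ACME"}, "billing", datetime(2024, 1, 1, tzinfo=timezone.utc)),
]
for load_date, source in trace(rows, "C1"):
    print(load_date.date(), source)
```

Because rows are only ever added and each one records where and when it came from, an auditor can reconstruct the history of a value by reading the rows for a key in load-date order.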
Richard Veryard FRSA is a British computer scientist, author and business consultant, known for his work on service-oriented architecture and the service-based business.
Steven Howard Spewak was an American management consultant, author, and lecturer on enterprise architectures, known for the development of Enterprise Architecture Planning (EAP).
A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories.
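The position of a staging area between source and target can be sketched with Python's standard `sqlite3` module. The table names (`staging_orders`, `warehouse_orders`), the columns and the cleaning rule are hypothetical; the sketch assumes the source delivers every value as text and that rows with an empty amount should be dropped.

```python
import sqlite3


def run_etl(source_rows):
    """Land raw rows in staging, then transform and load them into the target."""
    conn = sqlite3.connect(":memory:")
    # Staging keeps the data exactly as the source delivered it (all text).
    conn.execute("CREATE TABLE staging_orders (order_id TEXT, amount TEXT)")
    # The target table holds cleaned, typed data.
    conn.execute(
        "CREATE TABLE warehouse_orders "
        "(order_id INTEGER PRIMARY KEY, amount_cents INTEGER)"
    )

    # Extract: copy source rows into the staging area unchanged.
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?)", source_rows)

    # Transform and load: cast types, drop rows with an empty amount.
    conn.execute(
        """
        INSERT INTO warehouse_orders (order_id, amount_cents)
        SELECT CAST(order_id AS INTEGER),
               CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER)
        FROM staging_orders
        WHERE TRIM(amount) <> ''
        """
    )

    # The staging area is transient, so it is emptied after a successful load.
    conn.execute("DELETE FROM staging_orders")
    conn.commit()
    return conn


conn = run_etl([("1", "9.99"), ("2", ""), ("3", "20")])
print(conn.execute(
    "SELECT order_id, amount_cents FROM warehouse_orders ORDER BY order_id"
).fetchall())
```

Keeping the raw rows in a separate staging table means the transformation can be rerun or debugged without going back to the source systems.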
LucidDB is an open-source database purpose-built to power data warehouses, OLAP servers and business intelligence systems. According to the product website, its architecture is based on column-store, bitmap indexing, hash join/aggregation, and page-level multiversioning.
Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can provide a single customer view of the overall data.
The Kimball lifecycle is a methodology for developing data warehouses, developed by Ralph Kimball and a variety of colleagues. The methodology "covers a sequence of high level tasks for the effective design, development and deployment" of a data warehouse or business intelligence system. It is considered a "bottom-up" approach to data warehousing as pioneered by Ralph Kimball, in contrast to the older "top-down" approach pioneered by Bill Inmon.
Christopher P. (Chris) Gane was a British/American computer scientist, consultant and information technology writer, known for developing data flow diagrams with Trish Sarson in the 1970s.
Trish Sarson is a British/American computer scientist, consultant and information technology writer, known for developing data flow diagrams with Chris Gane in the 1970s.
Business metadata is data that adds business context to other data. It provides information authored by business people and/or used by business people. It is in contrast to technical metadata, which is data used in the storage and structure of the data in a database or system. Technical metadata includes the database table name and column name, data type, indexes referencing the data, ETL jobs involving the data, when the data was last updated, accessed, etc.
Joe Caserta is an American information specialist and author. He is best known as the founder and president of Caserta, a data and analytics consulting, architecture, and implementation firm founded in 2001. Management consulting firm McKinsey & Company acquired Caserta on June 1, 2022.