Ralph Kimball

Ralph Kimball (born July 18, 1944 [1] ) is an author on the subject of data warehousing and business intelligence. He is one of the original architects of data warehousing and is known for his long-term conviction that data warehouses must be designed to be understandable and fast. [2] [3] His bottom-up methodology, also known as dimensional modeling or the Kimball methodology, is one of the two main data warehousing methodologies, alongside the top-down approach of Bill Inmon. [2] [3]

He is the principal author of the best-selling [4] books The Data Warehouse Toolkit (1996), [5] The Data Warehouse Lifecycle Toolkit (1998), The Data Warehouse ETL Toolkit (2004) and The Kimball Group Reader (2015), all published by John Wiley & Sons.

Career

After receiving a Ph.D. [4] in electrical engineering (specializing in man-machine systems) from Stanford University in 1973, Kimball joined the Xerox Palo Alto Research Center (PARC). At PARC he was a principal designer of the Xerox Star workstation, the first commercial product to use a mouse, icons and windows.

Kimball then became vice president of applications at Metaphor Computer Systems, a decision support software and services provider. There, in 1982, he developed the Capsule Facility, a graphical programming technique that connected icons in a logical flow, giving non-programmers a highly visual style of programming. The Capsule was used to build reporting and analysis applications at Metaphor.

Kimball founded Red Brick Systems in 1986, serving as CEO until 1992. The company was later acquired by Informix, which is now owned by IBM. [6] Red Brick was known for its relational database optimized for data warehousing; its use of bitmap indexes achieved query performance close to ten times that of other database vendors at the time.

Since 1992, Kimball has provided data warehouse consulting and education through various companies such as Ralph Kimball Associates and the Kimball Group. [7] [4]

Related Research Articles

Data warehouse

In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of data integrated from one or more disparate sources. They store current and historical data in a single place and are used to create analytical reports for workers throughout the enterprise, enabling companies to interrogate their data, draw insights from it and make decisions.

Extract, transform, load

In computing, extract, transform, load (ETL) is a three-phase process in which data is extracted, transformed and loaded into an output data container. The data can be collated from one or more sources, and it can be output to one or more destinations. ETL processing is typically executed by software applications, though it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules, either as single jobs or aggregated into a batch of jobs.
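
A minimal ETL sketch in Python, assuming a CSV file as the source and a SQLite database as the destination; the file name, table name and transformation are illustrative assumptions, not from the article:

```python
# Minimal ETL sketch: CSV source -> in-memory transform -> SQLite load.
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from the source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: normalize names and cast amounts to numbers.
    return [(r["customer"].strip().title(), float(r["amount"]))
            for r in rows]

def load(rows, db_path="warehouse.db"):
    # Load: write the cleaned rows into the destination table.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("sales.csv")))
```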

Data mart

A data mart is a structure/access pattern specific to data warehouse environments, used to retrieve client-facing data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have enterprise-wide depth, the information in a data mart pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart, including all the hardware, software and data, which lets each department isolate the use, manipulation and development of its data. In deployments where conformed dimensions are used, however, this business-unit ownership does not hold for shared dimensions such as customer and product.

Data profiling is the process of examining the data available from an existing information source and collecting statistics or informative summaries about that data; a minimal profiling sketch follows the list below. The purpose of these statistics may be to:

  1. Find out whether existing data can be easily used for other purposes
  2. Improve the ability to search data by tagging it with keywords, descriptions, or assigning it to a category
  3. Assess data quality, including whether the data conforms to particular standards or patterns
  4. Assess the risk involved in integrating data in new applications, including the challenges of joins
  5. Discover metadata of the source database, including value patterns and distributions, key candidates, foreign-key candidates, and functional dependencies
  6. Assess whether known metadata accurately describes the actual values in the source database
  7. Understand data challenges early in any data-intensive project, so that late-project surprises are avoided; finding data problems late in the project can lead to delays and cost overruns
  8. Have an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance for improving data quality.
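
A minimal data-profiling sketch with pandas, computing per-column statistics, null counts and a candidate-key check; the sample columns are assumptions for illustration:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    # Collect per-column summaries of the kind listed above.
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "distinct": df.nunique(),
        # A column whose distinct count equals the row count is a
        # candidate key (item 5 in the list above).
        "candidate_key": df.nunique() == len(df),
    })

df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["A", "B", "A", None],
    "amount": [10.0, 20.0, 15.0, 5.0],
})
print(profile(df))
print(df.describe())  # value distributions for numeric columns
```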

William H. Inmon is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, held the first conference, wrote the first magazine column and was the first to offer classes on data warehousing. Inmon created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as top-down.

Fact table

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema, surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. A fact table typically has two types of columns: those that contain facts and those that hold foreign keys to dimension tables. The primary key of a fact table is usually a composite key made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures, such as additive, semi-additive and non-additive measures.
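
A minimal star-schema sketch with pandas, showing a fact table whose composite key is made up of foreign keys to two dimension tables; all table and column names are illustrative assumptions:

```python
import pandas as pd

# Dimension tables: one row per member, keyed by a surrogate key.
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "product_name": ["Widget", "Gadget"],
    "category": ["Hardware", "Hardware"],
})
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240102],
    "calendar_date": ["2024-01-01", "2024-01-02"],
})

# Fact table: foreign keys to each dimension plus additive measures.
# Its grain is one row per product per day.
fact_sales = pd.DataFrame({
    "date_key": [20240101, 20240101, 20240102],
    "product_key": [1, 2, 1],
    "units_sold": [10, 4, 7],        # additive measure
    "revenue": [100.0, 80.0, 70.0],  # additive measure
})

# A typical query joins the fact table to its dimensions and aggregates.
report = (fact_sales
          .merge(dim_product, on="product_key")
          .merge(dim_date, on="date_key")
          .groupby("product_name")[["units_sold", "revenue"]]
          .sum())
print(report)
```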

Snowflake schema

In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. "Snowflaking" is a method of normalizing the dimension tables in a star schema. When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle. The principle behind snowflaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.
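
A small sketch of snowflaking, assuming a product dimension whose low-cardinality category attribute is normalized into its own table; the names and data are illustrative assumptions:

```python
import pandas as pd

# Star-schema (denormalized) product dimension.
dim_product_star = pd.DataFrame({
    "product_key": [1, 2, 3],
    "product_name": ["Widget", "Gadget", "Sprocket"],
    "category": ["Hardware", "Hardware", "Parts"],
})

# Snowflaking: the low-cardinality "category" attribute is removed into
# its own normalized table, referenced by a surrogate key.
dim_category = (dim_product_star[["category"]]
                .drop_duplicates()
                .reset_index(drop=True))
dim_category["category_key"] = dim_category.index + 1

dim_product_snow = (dim_product_star
                    .merge(dim_category, on="category")
                    [["product_key", "product_name", "category_key"]])
print(dim_product_snow)
print(dim_category)
```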

An operational data store (ODS) is used for operational reporting and as a source of data for the enterprise data warehouse (EDW). It is a complementary element to the EDW in a decision support environment: the ODS supports operational reporting, controls and day-to-day decision making, whereas the EDW supports tactical and strategic decision making.

Dimension (data warehouse)

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are people, products, place and time.

In a data warehouse, a measure is a property on which calculations can be made. A measure can either be categorical, algebraic or holistic.

According to Ralph Kimball, who coined the term, a degenerate dimension in a data warehouse is a dimension key in the fact table that does not have its own dimension table, because all the interesting attributes have been placed in analytic dimensions.

A slowly changing dimension (SCD) in data management and data warehousing is a dimension containing relatively static data that can change slowly but unpredictably, rather than on a regular schedule. Typical examples of slowly changing dimensions are entities such as names of geographical locations, customers or products.
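
A sketch of one common way to handle a slowly changing dimension (the so-called Type 2 technique, in which a changed attribute produces a new row with validity dates rather than an overwrite); the table, column names and data are assumptions for illustration:

```python
import pandas as pd

dim_customer = pd.DataFrame({
    "customer_key": [101],          # surrogate key
    "customer_id": ["C-1"],         # natural key from the source system
    "city": ["Palo Alto"],
    "valid_from": ["2020-01-01"],
    "valid_to": ["9999-12-31"],     # open-ended current row
    "is_current": [True],
})

def scd2_update(dim, customer_id, new_city, change_date):
    # Close out the current row for this customer...
    mask = (dim["customer_id"] == customer_id) & dim["is_current"]
    dim.loc[mask, ["valid_to", "is_current"]] = [change_date, False]
    # ...and append a new current row holding the changed attribute.
    new_row = {
        "customer_key": dim["customer_key"].max() + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": "9999-12-31",
        "is_current": True,
    }
    return pd.concat([dim, pd.DataFrame([new_row])], ignore_index=True)

dim_customer = scd2_update(dim_customer, "C-1", "Boulder Creek", "2024-06-01")
print(dim_customer)
```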

Dimensional modeling (DM) is part of the Business Dimensional Lifecycle methodology developed by Ralph Kimball, which includes a set of methods, techniques and concepts for use in data warehouse design. The approach focuses on identifying the key business processes within a business and modeling and implementing these first before adding additional business processes, as a bottom-up approach. An alternative approach from Inmon advocates a top-down design of the model of all the enterprise data using tools such as entity-relationship modeling (ER).

Data vault modeling

Data vault modeling, also known as common foundational warehouse architecture or common foundational modeling architecture, is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that deals with issues such as auditing, tracing of data, loading speed and resilience to change as well as emphasizing the need to trace where all the data in the database came from. This means that every row in a data vault must be accompanied by record source and load date attributes, enabling an auditor to trace values back to the source. The concept was published in 2000 by Dan Linstedt.
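
A minimal sketch of that record-source and load-date rule, assuming an illustrative customer hub; this is a simplification of data vault modeling, not a full implementation:

```python
from datetime import datetime, timezone

def stamp(row: dict, record_source: str) -> dict:
    """Attach the audit attributes every data vault row must carry."""
    return {**row,
            "record_source": record_source,
            "load_date": datetime.now(timezone.utc).isoformat()}

# Hypothetical customer hub: each row traceable back to its source.
hub_customer = []
hub_customer.append(stamp({"customer_id": "C-1"}, record_source="crm"))
hub_customer.append(stamp({"customer_id": "C-2"}, record_source="billing"))
print(hub_customer)
```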

In business intelligence, data classification has close ties to data clustering, but where data clustering is descriptive, data classification is predictive. In essence, data classification consists of using variables with known values to predict the unknown or future values of other variables. It can be used in, for example, direct marketing, insurance fraud detection or medical diagnosis.
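
A minimal predictive-classification sketch using scikit-learn's decision tree on a made-up fraud-detection example; the features, data and labels are purely illustrative assumptions:

```python
from sklearn.tree import DecisionTreeClassifier

# Known values: [claim_amount, days_since_policy_start] -> fraud label.
X_train = [[100, 400], [5000, 10], [250, 300], [7000, 5]]
y_train = [0, 1, 0, 1]  # 0 = legitimate, 1 = fraudulent

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Predict the unknown label of a new claim from its known variables.
print(clf.predict([[6000, 8]]))
```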

The Kimball lifecycle is a methodology for developing data warehouses, and has been developed by Ralph Kimball and a variety of colleagues. The methodology "covers a sequence of high level tasks for the effective design, development and deployment" of a data warehouse or business intelligence system. It is considered a "bottom-up" approach to data warehousing as pioneered by Ralph Kimball, in contrast to the older "top-down" approach pioneered by Bill Inmon.

Aggregate (data warehouse)

An aggregate is a type of summary used in dimensional models of data warehouses to shorten the time it takes to answer typical queries on large sets of data. Aggregates improve the performance of a data warehouse so dramatically because they reduce the number of rows that must be accessed when responding to a query.
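
A short sketch of an aggregate table with pandas: a transaction-grain fact table is rolled up to one row per product per day, so queries touch far fewer rows; table and column names are assumptions:

```python
import pandas as pd

# Base fact table at transaction grain (millions of rows in practice).
fact_sales = pd.DataFrame({
    "date_key": [20240101, 20240101, 20240101, 20240102],
    "product_key": [1, 1, 2, 1],
    "revenue": [10.0, 15.0, 8.0, 12.0],
})

# Aggregate fact table: pre-computed rollup at product-per-day grain.
agg_daily_sales = (fact_sales
                   .groupby(["date_key", "product_key"], as_index=False)
                   ["revenue"].sum())

# A daily-revenue query now reads the small aggregate, not the base table.
print(agg_daily_sales[agg_daily_sales["date_key"] == 20240101])
```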

The enterprise bus matrix is a data warehouse planning tool and model created by Ralph Kimball, and is part of the data warehouse bus architecture. The matrix is the logical definition of one of the core concepts of Kimball's approach to dimensional modeling: the conformed dimension.
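
A small illustrative bus matrix, with business processes as rows and conformed dimensions as columns; the specific entries below are assumptions for illustration, not from Kimball's published examples:

  Business process  | Date | Product | Customer | Store
  ------------------+------+---------+----------+------
  Retail sales      |  X   |    X    |    X     |  X
  Inventory         |  X   |    X    |          |  X
  Customer returns  |  X   |    X    |    X     |  X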

Deploying the DW/BI system is the final step before business users gain access to the information, and the business community's first impression is formed when it is introduced to the BI front end. Because acceptance from users is important, the deployment must be thoughtfully planned to ensure that the DW/BI system can perform and deliver the results it is designed to deliver. To ensure this, the implementation has to be exposed to extensive end-to-end testing. Testing is an ongoing activity throughout development, because defects found late in the lifecycle are harder to correct and carry exponentially increasing costs. One way of ensuring that testing happens throughout the development lifecycle is to follow a methodology; Kimball prescribes that before the DW/BI system is deployed, it should have passed a mock test covering a prescribed set of procedures.

Joe Caserta

Joe Caserta is an American information specialist and author. He is best known as the founder and president of Caserta, a data and analytics consulting, architecture and implementation firm he founded in 2001. Management consulting firm McKinsey & Company acquired Caserta on June 1, 2022.

References

  1. "Kimball, Ralph (1944-....)". Bibliothèque nationale de France . 2022-02-14. Retrieved 2022-07-16.
  2. 1 2 Černiauskas, Julius (2022-04-27). "Opening The Doors To Data Warehouses". Forbes . Retrieved 2022-07-16. There are many ways to construct data warehouses, but the two dominant ones have been proposed by Bill Inmon and Ralph Kimball.
  3. 1 2 KasthuriArachchi, Tharuka (2021-02-14). "Theories of Kimball and Inmon About Data Warehouse Design". Medium . Retrieved 2022-07-16. Bill Inmon and Ralph Kimball are the two pioneers that stated different philosophies in enterprise-wide information gathering, information management, and analytics for decision support.
  4. 1 2 3 "Dr. Ralph Kimball and Informatica Host Big Data Best Practices Webinar Industry Leaders and Innovators Share Newly Emerging Best Practices for Big Data". Informatica. 2012-09-28. Archived from the original on 2015-12-11. Retrieved 2015-12-11. Dr. Ralph Kimball, founder of the Kimball Group, is known worldwide as an innovator, writer, educator, speaker and consultant in the field of data warehousing. He has remained steadfast in his long-term conviction that data warehouses must be designed to be understandable and fast. Ralph's books on dimensional design techniques have become the all-time best sellers in data warehousing and he has trained more than 10,000 IT professionals around the globe. Prior to founding Kimball Group, he worked at Metaphor, founded Red Brick Systems and co-invented the Star workstation at Xerox's Palo Alto Research Center (PARC). Ralph has his Ph.D. in Electrical Engineering from Stanford University.
  5. Whitehorn, Mark (2010-03-09). "Inmon vs. Kimball data warehousing: the debate over DW architecture". ComputerWeekly.com. Archived from the original on 2015-10-03. Retrieved 2015-12-11. Indeed, the fact that Inmon wrote a foreword for the first edition of Kimball's book The Data Warehouse Toolkit tells us that they probably take a balanced view.
  6. "IBM Red Brick Warehouse". IBM. 2015-12-11. Archived from the original on 2015-12-11. Retrieved 2017-03-04.
  7. Hapgood, Fred (2001-08-15). "Data Warehousing: Making Smart Decisions Via Business Intelligence Systems". CIO . Retrieved 2022-07-16. Along with his wife Julie, he now operates Ralph Kimball Associates in Boulder Creek, Calif., which in turn runs Kimball University, "the authoritative source for data warehouse education."