Kimball lifecycle

The Kimball lifecycle is a methodology for developing data warehouses, developed by Ralph Kimball and a number of colleagues. The methodology "covers a sequence of high level tasks for the effective design, development and deployment" of a data warehouse or business intelligence system. [1] It is considered a "bottom-up" approach to data warehousing, as pioneered by Ralph Kimball, in contrast to the older "top-down" approach pioneered by Bill Inmon. [2]

Program or project planning phase

According to Ralph Kimball et al., the planning phase is the start of the lifecycle. In this context, a project is a single iteration of the lifecycle, while a program is the broader, ongoing coordination of resources across projects. When launching a project or program, Kimball et al. suggest three focus areas.

Program and project management

Program and project management is an ongoing discipline throughout the lifecycle. Its purpose is to keep the project or program on course, develop a communication plan, and manage expectations.

Business requirements definition

This phase or milestone of the project is about ensuring that the project team understands the business requirements. Its purpose is to establish a foundation for all the following activities in the lifecycle. Kimball et al. make it clear that it is important for the project team to talk with the business users, and team members should be prepared to focus on listening and to document the user interviews. An output of this step is the enterprise bus matrix.
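
As a concrete illustration of the bus matrix idea, the sketch below represents a small, hypothetical matrix in Python: business processes as rows, conformed dimensions as columns, and a marked cell where a process uses a dimension. The process and dimension names are invented for the example and are not taken from Kimball's material.

```python
# Hypothetical enterprise bus matrix: business processes (rows) versus
# conformed dimensions (columns). A name in a process's set means its
# fact table will join to that conformed dimension.
CONFORMED_DIMENSIONS = ["Date", "Customer", "Product", "Store"]

BUS_MATRIX = {
    "Retail sales":     {"Date", "Customer", "Product", "Store"},
    "Inventory":        {"Date", "Product", "Store"},
    "Customer returns": {"Date", "Customer", "Product", "Store"},
}

def print_bus_matrix():
    """Render the matrix as a simple text grid with X marks."""
    header = f"{'Business process':<18}" + "".join(f"{d:>10}" for d in CONFORMED_DIMENSIONS)
    print(header)
    for process, dims in BUS_MATRIX.items():
        cells = "".join(f"{'X' if d in dims else '':>10}" for d in CONFORMED_DIMENSIONS)
        print(f"{process:<18}" + cells)

if __name__ == "__main__":
    print_bus_matrix()
```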

Technology track

The technology track holds two milestones:

  1. Technical architecture design creates the overall framework for the data warehouse or business intelligence system. The main focus of this phase is to create a plan for the application architecture while considering the business requirements, the technical environment and the planned strategic technical directions.
  2. Product selection and installation uses the architecture plan to identify which components are needed to complete the data warehouse or business intelligence project. The selected products are then installed and tested.

Data track

Dimensional modeling is a process in which the business requirements are used to design dimensional models for the system.

Physical design is the phase in which the physical database structures are designed. It covers the database environment as well as security.
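
To illustrate how a dimensional model is carried into physical design, here is a minimal sketch that creates a small, hypothetical star schema (one fact table, two dimension tables) in an in-memory SQLite database. The table and column names (dim_date, dim_product, fact_sales) are invented for the example.

```python
import sqlite3

# Hypothetical star schema: a sales fact table surrounded by
# date and product dimension tables, with surrogate keys throughout.
DDL = """
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,   -- surrogate key
    full_date   TEXT NOT NULL,
    month       INTEGER NOT NULL,
    year        INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,   -- surrogate key
    sku         TEXT NOT NULL,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity     INTEGER NOT NULL,
    sales_amount REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
print("Star schema created:", [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```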

Extract, transform, load (ETL) design and development is the design of some of the most resource-intensive procedures in the data warehouse and business intelligence system. Kimball et al. suggest four major parts to this process, which are further divided into 34 subsystems. [3]
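
The sketch below is not one of Kimball's 34 subsystems; it is only a minimal illustration of the extract, transform and load steps, moving cleaned rows into a hypothetical staging table (the stg_sales name and the sample data are invented for the example).

```python
import sqlite3

def extract(source_rows):
    """Extract: in practice this would read from source systems or files."""
    return list(source_rows)

def transform(rows):
    """Transform: conform the data (normalise SKUs, title-case categories)
    and reject rows with missing measures."""
    cleaned = []
    for sku, category, qty, amount in rows:
        if qty is None or amount is None:
            continue  # reject incomplete rows
        cleaned.append((sku.strip().upper(), category.title(), qty, amount))
    return cleaned

def load(conn, rows):
    """Load: write the conformed rows into the warehouse staging table."""
    conn.execute("CREATE TABLE IF NOT EXISTS stg_sales "
                 "(sku TEXT, category TEXT, quantity INTEGER, sales_amount REAL)")
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    source = [(" ab-1 ", "toys", 2, 9.99), ("cd-2", "games", None, 4.50)]
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(source)))
    print(conn.execute("SELECT * FROM stg_sales").fetchall())
```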

Business intelligence application track

Business intelligence application design deals with designing and selecting the applications that will support the business requirements. Business intelligence application development uses that design to develop and validate the applications.

Deployment

When the three tracks are complete, they converge in the final deployment. This phase requires careful planning and should include pre-deployment testing, documentation, training, and maintenance and support.

Maintenance

Once deployment has finished, the system needs ongoing maintenance to stay healthy. This includes data reconciliation, execution and monitoring, and performance tuning.

Growth

As the project can be seen as part of a larger iterative program, the system is likely to expand over time. There will be projects to add new data as well as to reach new segments of the business. The lifecycle then starts over again.

Related Research Articles

Data warehouse: Centralized storage of knowledge

In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is a core component of business intelligence. Data warehouses are central repositories of data integrated from disparate sources. They store current and historical data organized so as to make it easy to create reports, query and get insights from the data. Unlike operational databases, they are intended to be used by analysts and managers to help make organizational decisions.

Business intelligence (BI) consists of strategies, methodologies, and technologies used by enterprises for data analysis and management of business information. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.

The rational unified process (RUP) is an iterative software development process framework created by the Rational Software Corporation, a division of IBM since 2003. RUP is not a single concrete prescriptive process, but rather an adaptable process framework, intended to be tailored by the development organizations and software project teams that will select the elements of the process that are appropriate for their needs. RUP is a specific implementation of the Unified Process.

Extract, transform, load: Procedure in computing

In computing, extract, transform, load (ETL) is a three-phase process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules either as single jobs or aggregated into a batch of jobs.

Data mart: Data management pattern

A data mart is a structure/access pattern specific to data warehouse environments, used to retrieve client-facing data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart, including all the hardware, software and data. This enables each department to isolate the use, manipulation and development of its data. In other deployments, where conformed dimensions are used, this business-unit ownership does not hold for shared dimensions like customer, product, etc.

Systems development life cycle: Systems engineering terms

In systems engineering, information systems and software engineering, the systems development life cycle (SDLC), also referred to as the application development life cycle, is a process for planning, creating, testing, and deploying an information system. The SDLC concept applies to a range of hardware and software configurations, as a system can be composed of hardware only, software only, or a combination of both. There are usually six stages in this cycle: requirement analysis, design, development and testing, implementation, documentation, and evaluation.

Web development is the work involved in developing a website for the Internet or an intranet. Web development can range from developing a simple single static page of plain text to complex web applications, electronic businesses, and social network services. A more comprehensive list of tasks to which Web development commonly refers may include Web engineering, Web design, Web content development, client liaison, client-side/server-side scripting, Web server and network security configuration, and e-commerce development.

Data profiling is the process of examining the data available from an existing information source and collecting statistics or informative summaries about that data. The purpose of these statistics may be to:

  1. Find out whether existing data can be easily used for other purposes
  2. Improve the ability to search data by tagging it with keywords, descriptions, or assigning it to a category
  3. Assess data quality, including whether the data conforms to particular standards or patterns
  4. Assess the risk involved in integrating data in new applications, including the challenges of joins
  5. Discover metadata of the source database, including value patterns and distributions, key candidates, foreign-key candidates, and functional dependencies
  6. Assess whether known metadata accurately describes the actual values in the source database
  7. Understand data challenges early in any data-intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns.
  8. Have an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance for improving data quality.
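
Several of the purposes listed above come down to computing simple per-column statistics. The sketch below is a minimal, illustrative example of that idea and is not tied to any particular profiling tool; the sample data and function name are invented.

```python
import math
from collections import Counter

def profile_column(values):
    """Compute simple profiling statistics for one column of data:
    null rate, distinct count, most common values, and min/max.
    Real profiling tools cover far more (patterns, key candidates, etc.)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else math.nan,
        "distinct": len(counts),
        "top_values": counts.most_common(3),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

if __name__ == "__main__":
    country = ["DE", "DE", "US", None, "FR", "DE"]
    print(profile_column(country))
```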

William H. Inmon is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, held the first conference, wrote the first column in a magazine and was the first to offer classes in data warehousing. Inmon created the accepted definition of what a data warehouse is - a subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as a top-down approach.

Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another. Additionally, the validation of migrated data for completeness and the decommissioning of legacy data storage are considered part of the entire data migration process. Data migration is a key consideration for any system implementation, upgrade, or consolidation, and it is typically performed in such a way as to be as automated as possible, freeing up human resources from tedious tasks. Data migration occurs for a variety of reasons, including server or storage equipment replacements, maintenance or upgrades, application migration, website consolidation, disaster recovery, and data center relocation.

Ralph Kimball is an author on the subject of data warehousing and business intelligence. He is one of the original architects of data warehousing and is known for long-term convictions that data warehouses must be designed to be understandable and fast. His bottom-up methodology, also known as dimensional modeling or the Kimball methodology, is one of the two main data warehousing methodologies, alongside that of Bill Inmon.

The Enterprise Unified Process (EUP) is an extended variant of the Unified Process and was developed by Scott W. Ambler and Larry Constantine in 2000, eventually reworked in 2005 by Ambler, John Nalbone and Michael Vizdos. EUP was originally introduced to overcome some shortages of RUP, namely the lack of production and eventual retirement of a software system. So two phases and several new disciplines were added. EUP sees software development not as a standalone activity, but embedded in the lifecycle of the system, the IT lifecycle of the enterprise and the organization/business lifecycle of the enterprise itself. It deals with software development as seen from the customer's point of view.

Dimensional modeling (DM) is part of the Business Dimensional Lifecycle methodology developed by Ralph Kimball which includes a set of methods, techniques and concepts for use in data warehouse design. The approach focuses on identifying the key business processes within a business and modelling and implementing these first before adding additional business processes, as a bottom-up approach. An alternative approach from Inmon advocates a top down design of the model of all the enterprise data using tools such as entity-relationship modeling (ER).

In computerized business management, single version of the truth (SVOT), is a technical concept describing the data warehousing ideal of having either a single centralised database, or at least a distributed synchronised database, which stores all of an organisation's data in a consistent and non-redundant form. This contrasts with the related concept of single source of truth (SSOT), which refers to a data storage principle to always source a particular piece of information from one place.

In information systems, applications architecture or application architecture is one of several architecture domains that form the pillars of an enterprise architecture (EA).

Data vault modeling: Database modeling method

Datavault or data vault modeling is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that deals with issues such as auditing, tracing of data, loading speed and resilience to change as well as emphasizing the need to trace where all the data in the database came from. This means that every row in a data vault must be accompanied by record source and load date attributes, enabling an auditor to trace values back to the source. The concept was published in 2000 by Dan Linstedt.
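
To illustrate the auditing requirement described above, the sketch below creates a hypothetical hub and satellite in SQLite in which every row carries record_source and load_date columns; the table and column names are invented for the example.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical data vault structures: a hub for the customer business key
# and a satellite for its descriptive attributes. Every row carries a
# record source and a load date so values can be traced back to origin.
DDL = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,     -- hash/surrogate key
    customer_id   TEXT NOT NULL,        -- business key
    record_source TEXT NOT NULL,
    load_date     TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    load_date     TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
now = datetime.now(timezone.utc).isoformat()
conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
             ("hk-001", "C-1001", "CRM", now))
conn.execute("INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
             ("hk-001", "Ada Lovelace", "London", "CRM", now))
print(conn.execute("SELECT * FROM sat_customer_details").fetchall())
```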

In software engineering, a software development process or software development life cycle (SDLC) is a process of planning and managing software development. It typically involves dividing software development work into smaller, parallel, or sequential steps or sub-processes to improve design and/or product management. The methodology may include the pre-definition of specific deliverables and artifacts that are created and completed by a project team to develop or maintain an application.

The enterprise bus matrix is a data warehouse planning tool and model created by Ralph Kimball, and is part of the data warehouse bus architecture. The matrix is the logical definition of one of the core concepts of Kimball’s approach to dimensional modeling: the conformed dimension.

Utilising the DW/BI system is the final step before business users gain access to the information. The business community forms its first impression when it is introduced to the BI front end. Because acceptance from users is important, the deployment must be thoughtfully planned to ensure that the DW/BI system can perform and deliver the results it is designed to deliver. To ensure this, the implementation has to be exposed to extensive end-to-end testing. Testing is an ongoing activity throughout development, because defects found late in the lifecycle are harder to track down and are associated with exponentially increasing costs. A way of ensuring that testing is carried out throughout the development lifecycle is to follow a methodology. Kimball prescribes that before the DW/BI system is released to users, it should pass a mock test covering the deployment procedures.
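
As one illustration of what an automated end-to-end check might look like, the sketch below pushes a small set of known input rows through a stand-in pipeline and asserts that the warehouse holds the expected row count and total. The table, function and data names are hypothetical, not part of Kimball's published procedures.

```python
import sqlite3

def run_pipeline(conn, source_rows):
    """Tiny stand-in for the DW/BI load being tested: rows with a
    missing quantity are rejected, the rest land in the fact table."""
    conn.execute("CREATE TABLE fact_sales (sku TEXT, quantity INTEGER, sales_amount REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [r for r in source_rows if r[1] is not None])

def test_end_to_end():
    """End-to-end check: known input rows must yield the expected
    row count and sales total in the warehouse."""
    source = [("AB-1", 2, 9.99), ("CD-2", None, 4.50), ("EF-3", 1, 5.00)]
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn, source)
    count, total = conn.execute(
        "SELECT COUNT(*), SUM(sales_amount) FROM fact_sales").fetchone()
    assert count == 2 and abs(total - 14.99) < 0.005

if __name__ == "__main__":
    test_end_to_end()
    print("end-to-end check passed")
```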

Agile Business Intelligence (ABI) refers to the use of Agile software development for BI projects, aiming to reduce the time it takes to show value to the organization in comparison to other approaches. Agile BI attempts to enable the BI team, businesspeople or stakeholders to make business decisions more quickly.

References

  1. The Kimball Lifecycle Methodology. Adapted from Kimball et al. (2008)
  2. Naeem, Tehreem. "Data Warehouse Concepts: Kimball vs. Inmon Approach". Astera. Retrieved 17 October 2024.
  3. Kimball, Ralph; Ross, Margy; Thornthwaite, Warren; Mundy, Joy; Becker, Bob (2008), The Data Warehouse Lifecycle Toolkit, Wiley Publishing, Inc.