Kimball lifecycle

The Kimball lifecycle is a methodology for developing data warehouses, created by Ralph Kimball and a number of colleagues. The methodology "covers a sequence of high level tasks for the effective design, development and deployment" of a data warehouse or business intelligence system.[1] It is considered a "bottom-up" approach to data warehousing, in contrast to the older "top-down" approach pioneered by Bill Inmon.

Program or project planning phase

According to Ralph Kimball et al., the planning phase is the start of the lifecycle. In Kimball's terminology, a project is a single iteration of the lifecycle, while a program is the broader coordination of resources across multiple projects. When launching a project or program, Kimball et al. suggest concentrating on three focus areas.

Program and project management

This is an ongoing discipline throughout the project. Its purpose is to keep the project or program on course, develop a communication plan, and manage expectations.

Business requirements definition

This phase or milestone of the project is about giving the project team a thorough understanding of the business requirements. Its purpose is to establish a foundation for all the following activities in the lifecycle. Kimball et al. make it clear that the project team must talk with the business users, and team members should be prepared to focus on listening and to document the user interviews. An output of this step is the enterprise bus matrix.
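
To make the bus matrix concrete, here is a minimal sketch in Python; the business-process and dimension names are hypothetical illustrations, not taken from Kimball's text. In practice the matrix is usually kept as a spreadsheet, with business processes as rows and dimensions as columns.

    from collections import Counter

    # A minimal sketch of an enterprise bus matrix: business processes (rows)
    # mapped to the dimensions they use (columns). All names are hypothetical.
    bus_matrix = {
        "retail_sales":    {"date", "product", "store", "customer"},
        "inventory":       {"date", "product", "store"},
        "purchase_orders": {"date", "product", "supplier"},
    }

    # Dimensions used by more than one business process are candidates for
    # conformed dimensions, shared consistently across the warehouse.
    usage = Counter(dim for dims in bus_matrix.values() for dim in dims)
    print(sorted(dim for dim, n in usage.items() if n > 1))
    # ['date', 'product', 'store']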

Technology track

The top track holds two milestones:

  1. Technical architecture design creates the overall framework for the data warehouse or business intelligence system. The main focus in this phase is to plan the application architecture while considering the business requirements, the technical environment and the planned strategic technical directions.
  2. Product selection and installation uses the architecture plan to identify which components are needed to complete the data warehouse or business intelligence project. This phase then selects, installs and tests the products.

Data track

Dimensional modeling is the process of translating the business requirements into dimensional models for the system.
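
As an illustration, the sketch below creates a simple star schema in SQLite, with one fact table keyed to two dimension tables; all table and column names are hypothetical examples, not prescribed by the methodology.

    import sqlite3

    # A minimal star-schema sketch: one fact table whose foreign keys point
    # at two dimension tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key  INTEGER PRIMARY KEY,  -- surrogate key
            full_date TEXT,
            month     INTEGER,
            year      INTEGER
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,  -- surrogate key
            sku         TEXT,
            category    TEXT
        );
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date (date_key),
            product_key INTEGER REFERENCES dim_product (product_key),
            quantity    INTEGER,  -- additive measure
            amount      REAL      -- additive measure
        );
    """)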

Physical design is the phase in which the dimensional model is turned into a physical database design. It involves the database environment as well as security.
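
A minimal sketch of one such physical-design decision follows: indexing a fact table's foreign keys to speed joins to the dimensions. The table is a stripped-down version of the hypothetical fact_sales above, and real choices depend heavily on the DBMS and workload.

    import sqlite3

    # Physical design layers storage-level decisions onto the logical model;
    # here, indexes on the fact table's foreign keys.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, amount REAL)"
    )
    conn.execute("CREATE INDEX idx_fact_sales_date ON fact_sales (date_key)")
    conn.execute("CREATE INDEX idx_fact_sales_product ON fact_sales (product_key)")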

Extract, transform, load (ETL) design and development covers the design and building of the processes that move data from the source systems into the dimensional models; these are among the most demanding parts of the data warehouse and business intelligence system. Kimball et al. describe this process in four major operations, extracting, cleaning and conforming, delivering, and managing, which are further divided into 34 subsystems (Kimball et al., 2008).
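
The following is a minimal sketch of the basic ETL shape, assuming a small in-memory source and the hypothetical fact table used above; it shows only extraction, a single data-quality screen, and the load, not the full set of subsystems.

    import sqlite3

    # Extract: rows as they arrive from a hypothetical source system.
    raw_rows = [
        ("2024-01-05", "SKU-1", "3", "29.97"),
        ("2024-01-05", "SKU-2", "", "9.99"),  # dirty row: missing quantity
    ]

    def transform(row):
        """Clean and conform one source row; reject rows failing the screen."""
        sale_date, sku, qty, amount = row
        if not qty:  # data-quality screen: quantity must be present
            return None
        return (sale_date, sku, int(qty), float(amount))

    # Load: insert only the rows that survived the quality screen.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (sale_date TEXT, sku TEXT, quantity INTEGER, amount REAL)"
    )
    clean = [t for t in (transform(r) for r in raw_rows) if t is not None]
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", clean)
    print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])  # 1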

Business intelligence application track

Business intelligence application design deals with designing and selecting the applications that will support the business requirements. Business intelligence application development uses that design to develop and validate the applications.
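
As a small illustration of what a BI application can boil down to, the sketch below wraps a parameterized report query over the hypothetical fact table used earlier; the function and names are illustrative assumptions.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (sale_date TEXT, sku TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [("2024-01-05", "SKU-1", 29.97), ("2024-02-01", "SKU-1", 9.99)],
    )

    def monthly_sales(conn, year_month):
        """Total sales per product for a 'YYYY-MM' month: a typical canned report."""
        return conn.execute(
            "SELECT sku, SUM(amount) FROM fact_sales "
            "WHERE sale_date LIKE ? GROUP BY sku",
            (year_month + "-%",),
        ).fetchall()

    print(monthly_sales(conn, "2024-01"))  # [('SKU-1', 29.97)]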

Deployment

When the three tracks are complete, they converge in the final deployment. This phase requires careful planning and should include pre-deployment testing, documentation, training, and maintenance and support.

Maintenance

When the deployment has finished, the system needs proper maintenance to remain useful. This includes data reconciliation, execution and monitoring, and performance tuning.
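
A minimal sketch of a data-reconciliation check, one routine maintenance task: a control total reported by the source system is compared against the corresponding total in the warehouse. The figures are hypothetical stand-ins for real extracts.

    # Control total reported by the hypothetical source system for the load window.
    source_total = 12_345.67
    # Corresponding total queried from the warehouse fact table after the load.
    warehouse_total = 12_345.67

    # Flag any discrepancy beyond a small rounding tolerance.
    if abs(source_total - warehouse_total) > 0.01:
        raise RuntimeError("Reconciliation failed: warehouse total differs from source")
    print("Reconciliation passed")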

Growth

As the project can be seen as one iteration of the larger program, demand on the system is likely to grow. New projects will add new data and reach new segments of the business, and the lifecycle then starts over again.

References

Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The Data Warehouse Lifecycle Toolkit (2nd ed.). Wiley Publishing. ISBN 978-0-470-14977-5.