Star schema

In computing, the star schema or star model is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts. [1] The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries. [2]

The star schema gets its name from the physical model's [3] resemblance to a star shape with a fact table at its center and the dimension tables surrounding it representing the star's points.

Model

The star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions, which are descriptive attributes related to fact data. Examples of fact data include sales price, sale quantity, and time, distance, speed and weight measurements. Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names.

A star schema that has many dimensions is sometimes called a centipede schema. [4] Dimensions with only a few attributes each are simpler to maintain, but they lead to queries with many table joins and make the star schema harder to use.

Fact tables

Fact tables record measurements or metrics for a specific event. Fact tables generally consist of numeric values and foreign keys to dimensional data where descriptive information is kept. [4] Fact tables are designed to a low level of uniform detail (referred to as "granularity" or "grain"), meaning facts can record events at a very atomic level. This can result in the accumulation of a large number of records in a fact table over time. Fact tables are defined as one of three types: transaction, periodic snapshot, and accumulating snapshot. [8]

Fact tables are generally assigned a surrogate key to ensure each row can be uniquely identified. This key is a simple primary key.

Dimension tables

Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a very large number of attributes to describe the fact data. Dimensions can define a wide variety of characteristics; some of the most commonly used dimensions are date/time, product, customer, organization, and geography. [8] [9]

Dimension tables are generally assigned a surrogate primary key, usually a single-column integer data type, mapped to the combination of dimension attributes that form the natural key.
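The mapping from natural key to surrogate key can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed implementation: the Sku column, the sample values and the helper get_or_create_product_key are hypothetical names introduced here, and a single-column natural key is assumed for brevity (a compound natural key would use a multi-column UNIQUE constraint instead).

```python
import sqlite3


def get_or_create_product_key(conn, sku, brand):
    """Return the surrogate key for a product, inserting the row if it is new.

    Hypothetical helper: the natural key is assumed to be the single Sku column.
    """
    # INSERT OR IGNORE does nothing when the natural key already exists.
    conn.execute(
        "INSERT OR IGNORE INTO Dim_Product (Sku, Brand) VALUES (?, ?)",
        (sku, brand),
    )
    row = conn.execute(
        "SELECT Id FROM Dim_Product WHERE Sku = ?", (sku,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Dim_Product (
        Id    INTEGER PRIMARY KEY,   -- surrogate key: single integer column
        Sku   TEXT NOT NULL UNIQUE,  -- natural key (hypothetical column)
        Brand TEXT NOT NULL          -- descriptive attribute
    )
    """
)

first = get_or_create_product_key(conn, "TV-100", "Acme")
second = get_or_create_product_key(conn, "TV-200", "Acme")
again = get_or_create_product_key(conn, "TV-100", "Acme")
```

Looking up the same natural key twice yields the same surrogate key, so fact rows can reference the dimension by a compact integer regardless of how the natural key is composed.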

Benefits

Star schemas are denormalized, meaning the typical rules of normalization applied to transactional relational databases are relaxed during star-schema design and implementation. The main benefits of this denormalization are simpler queries that need fewer joins and a model that is easier for users to understand.

Use and comparison with the snowflake schema

A star schema denormalizes dimension attributes into single wide tables to improve understandability and reduce join complexity for analytic workloads. By contrast, a snowflake schema normalizes dimension hierarchies into multiple linked tables. Kimball recommends avoiding snowflaking unless there is a clear need (for example, extremely large dimensions) because it adds complexity for users and can hurt query performance. [5] [6] Star schemas align well with multidimensional/OLAP models commonly used in decision support. [7]
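The difference in join complexity can be illustrated with a small sqlite3 sketch. All table names, columns and rows below are made up for the illustration: the same store dimension is modeled once as a single wide table (star) and once with its geography hierarchy split into a separate table (snowflake).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star: one wide, denormalized store dimension.
    CREATE TABLE Dim_Store_Star (
        Id INTEGER PRIMARY KEY, Store_Name TEXT, City TEXT, Country TEXT
    );
    -- Snowflake: the geography hierarchy is normalized into its own table.
    CREATE TABLE Dim_Geography (
        Id INTEGER PRIMARY KEY, City TEXT, Country TEXT
    );
    CREATE TABLE Dim_Store_Snow (
        Id INTEGER PRIMARY KEY, Store_Name TEXT,
        Geography_Id INTEGER REFERENCES Dim_Geography (Id)
    );
    CREATE TABLE Fact_Sales (Store_Id INTEGER, Units_Sold INTEGER);

    INSERT INTO Dim_Store_Star VALUES (1, 'North', 'Oslo', 'Norway'),
                                      (2, 'South', 'Lyon', 'France');
    INSERT INTO Dim_Geography VALUES (10, 'Oslo', 'Norway'),
                                     (20, 'Lyon', 'France');
    INSERT INTO Dim_Store_Snow VALUES (1, 'North', 10), (2, 'South', 20);
    INSERT INTO Fact_Sales VALUES (1, 5), (1, 7), (2, 3);
    """
)

# Star schema: a single join reaches the Country attribute.
star_sql = """
    SELECT S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Store_Star S ON F.Store_Id = S.Id
    GROUP BY S.Country
    ORDER BY S.Country
"""

# Snowflake schema: the same question needs one extra join.
snow_sql = """
    SELECT G.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Store_Snow S ON F.Store_Id = S.Id
    INNER JOIN Dim_Geography G ON S.Geography_Id = G.Id
    GROUP BY G.Country
    ORDER BY G.Country
"""

star_rows = conn.execute(star_sql).fetchall()
snow_rows = conn.execute(snow_sql).fetchall()
```

Both queries return the same totals per country; the snowflake version simply has to traverse one more table to reach the attribute it groups by.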

Typical tables

Dimensional modeling distinguishes a central fact table and surrounding dimension tables. Common fact table types in star schemas are transaction, periodic snapshot, and accumulating snapshot; frequently used conformed dimensions include date/time, product, customer, organization, and geography. [8] [9]
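One way to picture the difference between a transaction fact table and a periodic snapshot is to derive the second from the first. The sketch below uses sqlite3 with hypothetical table names, columns and rows: the transaction table holds one row per sale event, and the snapshot holds one row per product per month. Accumulating snapshots are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Transaction fact table: one row per individual sale event.
    CREATE TABLE Fact_Sales_Transaction (
        Sale_Date TEXT, Product_Id INTEGER, Units_Sold INTEGER
    );
    INSERT INTO Fact_Sales_Transaction VALUES
        ('1997-01-03', 1, 2),
        ('1997-01-17', 1, 3),
        ('1997-02-02', 1, 1),
        ('1997-01-09', 2, 4);

    -- Periodic snapshot: one row per product per month.
    CREATE TABLE Fact_Sales_Monthly AS
    SELECT substr(Sale_Date, 1, 7) AS Month,
           Product_Id,
           SUM(Units_Sold) AS Units_Sold
    FROM Fact_Sales_Transaction
    GROUP BY substr(Sale_Date, 1, 7), Product_Id;
    """
)

monthly = conn.execute(
    "SELECT Month, Product_Id, Units_Sold FROM Fact_Sales_Monthly "
    "ORDER BY Month, Product_Id"
).fetchall()
```

The grain changes from "one sale" to "one product in one month", which is why the snapshot has fewer rows than the transaction table it summarizes.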

Query performance considerations

Analytic queries over a star schema usually join one large fact table with a handful of relatively small dimensions; many DBMSs implement "star-join" optimizations for this pattern. Performance characteristics of such workloads are commonly studied using the Star Schema Benchmark (SSB). [7] [10]

Example

Star schema used by example query

Consider a database of sales, perhaps from a store chain, classified by date, store and product. The image of the schema to the right is a star schema version of the sample schema provided in the snowflake schema article.

Fact_Sales is the fact table and there are three dimension tables Dim_Date, Dim_Store and Dim_Product.

Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id). The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).

For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:

    SELECT
        P.Brand,
        S.Country AS Countries,
        SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON (F.Date_Id = D.Id)
    INNER JOIN Dim_Store S ON (F.Store_Id = S.Id)
    INNER JOIN Dim_Product P ON (F.Product_Id = P.Id)
    WHERE D.Year = 1997 AND P.Product_Category = 'tv'
    GROUP BY
        P.Brand,
        S.Country
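To try the query end to end, the following sketch builds the example schema in an in-memory SQLite database through Python's sqlite3 module. The table and column names come from the example above; the column types, the sample rows and the brand and country values are invented for illustration, and an ORDER BY clause is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Year INTEGER);
    CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Dim_Product (
        Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT
    );
    CREATE TABLE Fact_Sales (
        Date_Id    INTEGER REFERENCES Dim_Date (Id),
        Store_Id   INTEGER REFERENCES Dim_Store (Id),
        Product_Id INTEGER REFERENCES Dim_Product (Id),
        Units_Sold INTEGER,
        PRIMARY KEY (Date_Id, Store_Id, Product_Id)
    );

    -- Illustrative sample data (not from the article).
    INSERT INTO Dim_Date VALUES (1, 1997), (2, 1998);
    INSERT INTO Dim_Store VALUES (1, 'Norway'), (2, 'France');
    INSERT INTO Dim_Product VALUES
        (1, 'Acme', 'tv'), (2, 'Acme', 'radio'), (3, 'Zenix', 'tv');
    INSERT INTO Fact_Sales VALUES
        (1, 1, 1, 10),  -- 1997, Norway, Acme tv
        (1, 2, 1, 4),   -- 1997, France, Acme tv
        (1, 1, 3, 6),   -- 1997, Norway, Zenix tv
        (1, 1, 2, 9),   -- 1997, Norway, Acme radio (filtered out: not a tv)
        (2, 1, 1, 8);   -- 1998, Norway, Acme tv (filtered out: wrong year)
    """
)

query = """
    SELECT P.Brand, S.Country AS Countries, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON (F.Date_Id = D.Id)
    INNER JOIN Dim_Store S ON (F.Store_Id = S.Id)
    INNER JOIN Dim_Product P ON (F.Product_Id = P.Id)
    WHERE D.Year = 1997 AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
    ORDER BY P.Brand, S.Country  -- added here for a stable row order
"""

rows = conn.execute(query).fetchall()
for brand, country, units in rows:
    print(brand, country, units)
```

Each dimension contributes one join and one filter or grouping column, while the only value aggregated is the Units_Sold measure from the fact table.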

References

  1. Dedić, N. and Stanier, C., 2016, "An Evaluation of the Challenges of Multilingualism in Data Warehouse Development" in 18th International Conference on Enterprise Information Systems - ICEIS 2016, p. 196.
  2. DWH Schemas, 2009, archived from the original on 16 July 2010
  3. C J Date, "An Introduction to Database Systems (Eighth Edition)", p. 708
  4. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition), p. 393
  5. "Snowflaked Dimension". Kimball Group. Retrieved 2025-08-15.
  6. "Design Tip #105: Snowflakes, Outriggers, and Bridges". Kimball Group. 2008-09-09. Retrieved 2025-08-15.
  7. Chaudhuri, Surajit; Dayal, Umeshwar (1997). "An overview of data warehousing and OLAP technology" (PDF). SIGMOD Record. 26 (1): 65–74.
  8. "Dimensional Modeling Techniques" (PDF). Kimball Group. September 2013. Retrieved 2025-08-15.
  9. Kimball, Ralph; Ross, Margy (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (PDF) (3rd ed.). Wiley. ISBN 9781118530801.
  10. O'Neil, Patrick; O'Neil, Elizabeth; Chen, Xuedong (2009-06-05). Star Schema Benchmark (PDF) (Report). UMass Boston. Retrieved 2025-08-15.