Synonyms

Snowflake join schema

Definition

Asnowflake schema has one “central” table whose primary key is compound, i.e., consisting of multiple attributes. Each one of these attributes is a foreign key to one of the remaining tables, which may, in turn, have some of its non-key attributes each be a foreign key to yet another, different table. This continues recursively with the remaining tables, until they are exhausted, forming chains or trees of foreign key dependencies rooted at the “central” table, i.e., each table in the schema (except the “central” table) is pointed to by exactly one such foreign key. (In the above, without loss of generality, we make the assumption that all tables except the “central” table have simple primary keys. This is usually the case in almost all practical situations, and as for efficiency, these keys are often generated, surrogate keys.)

Key Points

Many data warehouses (see definitional entry for “Data Warehouse”) that represent the multidimensional conceptual data model in a relational fashion [1, 2] store their primary data as well as the data cubes derived from it in snowflake schemas, as an alternative to star schemas. As in star schemas, the “central” table and the remaining tables of the definition above correspond, respectively, to the fact table and the dimension tables that are typically found in data warehouses. Each fact (tuple) in the fact table consists of a set of numeric measures, comprising the objects of analysis, and a set of dimensions, which uniquely determine the set of measures. The remaining tables store the attributes of the aforementioned dimensions at different levels of granularity.

Unlike star schemas, snowflake schemas can explicitly capture hierarchies in the dimensions, with each table in each chain (or tree path) of foreign key dependencies corresponding to one level of one such hierarchy. For instance, dimension Store in the example below contains values at different levels of detail, forming the hierarchy Street→City→State. On the contrary, star schemas capture all levels of a hierarchical dimension in a single, de-normalized table. Starting from a star schema (usually in second normal form), one may generate the corresponding snowflake schema (usually in third normal form at least) by normalization, decomposing the dimensions into multiple tables. Accordingly, star schemas lend themselves to simpler and usually faster queries, while snowflake schemas are easier to maintain and require less space.

For example, consider a data warehouse of a retail chain with many stores around a country. The dimensions may be the products sold, the stores themselves with their locations, and the dates, while the numeric measures may be the number of items and the total monetary amount corresponding to a particular product sold in a particular store on a particular date. The relevant snowflake schema, with the product, store, and date dimensions normalized, is shown below, where SalesSummary is the fact table, primary keys are in italics, and each attribute of the fact table primary key as well as each non-key ‘Id’ attribute of the other tables is a foreign key.

  • SalesSummary(ProductId,StoreId,DateId, NumOfItems, TotalAmount)

    Product(ProductId, ProdName, ProdDescr, CategoryId, UnitPrice)

    Category(??CategoryId, CategoryDescr)

    Store(??StoreId, StreetId)

    Street(??StreetId, Street, CityId)

    City(??CityId, City, StateId)

    State(??StateId, State)

    Date(??DateId, Date, MonthId)

    Month(??MonthId, Month, YearId)

    Year(???YearId, Year)

Cross-References