CODING CLUB: DATA WAREHOUSE SCHEMAS

DATA WAREHOUSE SCHEMAS

The schema is a logical description of the entire database. The schema includes the name and description of records of all record types including all associated data-items and aggregates. Likewise the database, the data warehouse also requires the schema. The database uses relational model on the other hand the data warehouse uses:

1. Star Schema,

2. Snowflake Schema and

3. Fact constellation Schema.

Star Schema

Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table and the dimension tables which allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Typically, most of the fact tables in a star schema are in database third normal form, while dimensional tables are de-normalized (second normal form).

Despite the fact that, star schema is the simplest data warehouse architecture, it is most commonly used in most data warehouse implementations across the world today (about 90-95% cases).

Fact table

The fact table is not a typical relational database table as it is de-normalized on purpose - to enhance query response times. The fact table typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment.

Typical fact tables in a global enterprise data warehouse are (usually there may be additional company or business specific fact tables):

1. Sales fact table - contains all details regarding sales

2. Orders fact table - in some cases the table can be split into open orders and historical orders. Sometimes the values for historical orders are stored in a sales fact table.

3. Budget fact table - usually grouped by month and loaded once at the end of a year.

4. Forecast fact table - usually grouped by month and loaded daily, weekly or monthly.

5. Inventory fact table - report stocks, usually refreshed daily

Dimension table

Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main purpose of maintaining Dimension Tables is to allow browsing the categories quickly and easily. The primary keys of each of the dimension tables are linked together to form the composite primary key of the fact table. In a star schema design, there is only one de-normalized table for a given dimension.

Typical dimension tables in a data warehouse are:

1. Time dimension table

2. Customers dimension table

3. Products dimension table

4. Key account managers (KAM) dimension table

5. Sales office dimension table

· In star schema each dimension is represented with only one dimension table.

· This dimension table contains the set of attributes.

· In the following diagram we have shown the sales data of a company with respect to the four dimensions namely, time, item, branch and location.

· There is a fact table at the center. This fact table contains the keys to each of four dimensions.

· The fact table also contain the attributes namely, dollars sold and units sold.

Note: Each dimension has only one dimension table and each table holds a set of attributes. For example the location dimension table contains the attribute set {location_key, street, city, province_or_state, country}. This constraint may cause data redundancy. For example the "Vancouver" and "Victoria" both cities are both in Canadian province of British Columbia. The entries for such cities may cause data redundancy along the attributes province_or_state and country.

Snowflake Schema

Snowflake schema architecture is a more complex variation of a star schema design. The main difference is that dimensional tables in a snowflake schema are normalized, so they have a typical relational database design. Snowflake schemas are generally used when a dimensional table becomes very big and when a star schema can’t represent the complexity of a data structure. For example if a PRODUCT dimension table contains millions of rows, the use of snowflake schemas should significantly improve performance by moving out some data to other table.

The problem is that the more normalized the dimension table is, the more complicated SQL joins must be issued to query them. This is because in order for a query to be answered, many tables need to be joined and aggregates generated.

· In Snowflake schema some dimension tables are normalized.

· The normalization split up the data into additional tables.

· Unlike Star schema, the dimensions table in snowflake schema are normalized for example the item dimension table in star schema is normalized and split into two dimension tables namely, item and supplier table.

· Therefore now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier-key.

· The supplier key is linked to supplier dimension table. The supplier dimension table contains the attributes supplier_key, and supplier_type.

· Note: Due to normalization in Snowflake schema the redundancy is reduced therefore it becomes easy to maintain and save storage space.

Fact Constellation Schema

· In fact Constellation there are multiple fact tables. This schema is also known as Galaxy schema.

· In the following diagram we have two fact tables namely, sales and shipping.

· The sale fact table is same as that in star schema.

· The shipping fact table has the five dimensions namely, item_key, time_key, shipper-key, from-location.

· The shipping fact table also contains two measures namely, dollars sold and units sold.

· It is also possible for dimension table to share between fact tables. For example time, item and location dimension tables are shared between sales and shipping fact table.

CODING CLUB

Tuesday, 23 December 2014

DATA WAREHOUSE SCHEMAS

Star Schema

Snowflake Schema

Fact Constellation Schema

No comments:

Post a Comment