DATA
WAREHOUSE SCHEMAS
The
schema is a logical description of the entire database. The schema includes the
name and description of records of all record types including all associated
data-items and aggregates. Likewise the database, the data warehouse also
requires the schema. The database uses relational model on the other hand the
data warehouse uses:
1. Star Schema,
2. Snowflake Schema and
3. Fact constellation Schema.
Star Schema
Star schema
architecture is the simplest data warehouse design. The main feature of a star
schema is a table at the center, called the fact table and the dimension tables
which allow browsing of specific categories, summarizing, drill-downs and
specifying criteria. Typically, most of the fact tables in a star schema are in
database third normal form, while dimensional tables are de-normalized (second
normal form).
Despite the fact that,
star schema is the simplest data warehouse architecture, it is most commonly
used in most data warehouse implementations across the world today (about
90-95% cases).
Fact table
The fact table is not
a typical relational database table as it is de-normalized on purpose - to
enhance query response times. The fact table typically contains records that
are ready to explore, usually with ad hoc queries. Records in the fact table
are often referred to as events, due to the time-variant nature of a data
warehouse environment.
Typical fact tables
in a global enterprise data warehouse are (usually there may be additional
company or business specific fact tables):
1.
Sales
fact table - contains all details regarding sales
2.
Orders
fact table - in some cases the table can be split into open orders and
historical orders. Sometimes the values for historical orders are stored in a
sales fact table.
3.
Budget
fact table - usually grouped by month and loaded once at the end of a year.
4.
Forecast
fact table - usually grouped by month and loaded daily, weekly or monthly.
5.
Inventory
fact table - report stocks, usually refreshed daily
Dimension table
Nearly all of the
information in a typical fact table is also present in one or more dimension
tables. The main purpose of maintaining Dimension Tables is to allow browsing
the categories quickly and easily. The primary keys of each of the dimension
tables are linked together to form the composite primary key of the fact table.
In a star schema design, there is only one de-normalized table for a given
dimension.
Typical dimension
tables in a data warehouse are:
1.
Time
dimension table
2.
Customers
dimension table
3.
Products
dimension table
4.
Key
account managers (KAM) dimension table
5.
Sales
office dimension table
·
In
star schema each dimension is represented with only one dimension table.
·
This
dimension table contains the set of attributes.
·
In
the following diagram we have shown the sales data of a company with respect to
the four dimensions namely, time, item, branch and location.
·
There
is a fact table at the center. This fact table contains the keys to each of
four dimensions.
·
The
fact table also contain the attributes namely, dollars sold and units sold.
Note: Each dimension has only one dimension
table and each table holds a set of attributes. For example the location
dimension table contains the attribute set {location_key, street, city,
province_or_state, country}. This constraint may cause data redundancy. For
example the "Vancouver" and "Victoria" both cities are both
in Canadian province of British Columbia. The entries for such cities may cause
data redundancy along the attributes province_or_state and country.
Snowflake
Schema
Snowflake schema
architecture is a more complex variation of a star schema design. The main
difference is that dimensional tables in a snowflake schema are normalized, so
they have a typical relational database design. Snowflake schemas are generally
used when a dimensional table becomes very big and when a star schema can’t
represent the complexity of a data structure. For example if a PRODUCT
dimension table contains millions of rows, the use of snowflake schemas should
significantly improve performance by moving out some data to other table.
The problem is that
the more normalized the dimension table is, the more complicated SQL joins must
be issued to query them. This is because in order for a query to be answered,
many tables need to be joined and aggregates generated.
·
In
Snowflake schema some dimension tables are normalized.
·
The
normalization split up the data into additional tables.
·
Unlike
Star schema, the dimensions table in snowflake schema are normalized for
example the item dimension table in star schema is normalized and split into
two dimension tables namely, item and supplier table.
·
Therefore
now the item dimension table contains the attributes item_key, item_name, type,
brand, and supplier-key.
·
The
supplier key is linked to supplier dimension table. The supplier dimension
table contains the attributes supplier_key, and supplier_type.
·
Note:
Due to normalization in Snowflake schema the redundancy is reduced therefore it
becomes easy to maintain and save storage space.
Fact
Constellation Schema
·
In
fact Constellation there are multiple fact tables. This schema is also known as
Galaxy schema.
·
In
the following diagram we have two fact tables namely, sales and shipping.
·
The
sale fact table is same as that in star schema.
·
The
shipping fact table has the five dimensions namely, item_key, time_key,
shipper-key, from-location.
·
The
shipping fact table also contains two measures namely, dollars sold and units
sold.
·
It
is also possible for dimension table to share between fact tables. For example
time, item and location dimension tables are shared between sales and shipping
fact table.
No comments:
Post a Comment