Data Warehouse definition by William H. Inmonna:
A data warehouse is
a:
·
Subject-oriented
·
Integrated
·
Time-varying
·
Non-volatile
collection of data that supports management's decision-making
process.
A data warehouse is a centralized repository that stores data from multiple heterogeneous information sources and transforms them into a common, multidimensional data model for efficient querying and analysis.
The
term "Data Warehouse" was first coined by Bill Inmon in 1990. He said
that Data warehouse is subject Oriented, Integrated, Time-Variant and
nonvolatile collection of data. This data helps in supporting decision making
process by analyst in an organization.
The data warehouse is constructed by integrating the data
from multiple heterogeneous sources. This data warehouse supports analytical
reporting, structured and/or ad hoc queries and decision making.
Data Warehouse Tools and Utilities Functions
The
following are the functions of Data Warehouse tools and Utilities:
·
Data
Extraction -
Data Extraction involves gathering the data from multiple heterogeneous
sources.
·
Data
Cleaning -
Data Cleaning involves finding and correcting the errors in data.
·
Data
Transformation - Data Transformation involves converting
data from legacy format to warehouse format.
·
Data
Loading -
Data Loading involves sorting, summarizing, consolidating, checking integrity
and building indices and partitions.
·
Refreshing - Refreshing involves updating from
data sources to warehouse.
Note: Data Cleaning and Data Transformation
are important steps in improving the quality of data and data mining results.
Data
Warehouse Features
The
key features of Data Warehouse such as Subject Oriented, Integrated,
Nonvolatile and Time-Variant are discussed below:
·
Subject
Oriented -
The Data Warehouse is Subject Oriented because it provides us the information
around a subject rather the organization's ongoing operations. These subjects
can be product, customers, suppliers, sales, revenue etc. The data warehouse
does not focus on the ongoing operations rather it focuses on modelling and
analysis of data for decision making.
·
Integrated - Data Warehouse is constructed by
integration of data from heterogeneous sources such as relational databases,
flat files etc. This integration enhances the effective analysis of data.
·
Time-Variant - The Data in Data Warehouse is
identified with a particular time period. The data in data warehouse provide
information from historical point of view.
·
Non
Volatile -
Nonvolatile means that the previous data is not removed when new data is added
to it. The data warehouse is kept separate from the operational database
therefore frequent changes in operational database are not reflected in data
warehouse.
Note: - Data Warehouse does not require
transaction processing, recovery and concurrency control because it is
physically stored separate from the operational database.
Data
Warehouse Applications
As
discussed before Data Warehouse helps the business executives in organize, analyze
and use their data for decision making. Data Warehouse serves as a soul part of
a plan-execute-assess "closed-loop" feedback system for enterprise
management. Data Warehouse is widely used in the following fields:
1.
financial
services
2.
Banking
Services
3.
Consumer
goods
4.
Retail
sectors.
5.
Controlled
manufacturing
The
operational database undergoes the per day transactions which causes the
frequent changes to the data on daily basis. But if in future the business
executive wants to analyze the previous feedback on any data such as product,
supplier, or the consumer data. In this case the analyst will be having no data
available to analyze because the previous data is updated due to transactions.
The
Data Warehouses provide us generalized and consolidated data in multidimensional
view. Along with generalize and consolidated view of data the Data Warehouses
also provide us Online Analytical Processing (OLAP) tools. These tools help us
in interactive and effective analysis of data in multidimensional space. This
analysis results in data generalization and data mining.
The
data mining functions like association, clustering, classification, prediction
can be integrated with OLAP operations to enhance interactive mining of
knowledge at multiple level of abstraction. That's why data warehouse has now
become important platform for data analysis and online analytical processing.
Understanding
Data Warehouse
·
The
Data Warehouse is that database which is kept separate from the organization's
operational database.
·
There
is no frequent updating done in data warehouse.
·
Data
warehouse possess consolidated historical data which help the organization to
analyze its business.
·
Data
warehouse helps the executives to organize, understand and use their data to
take strategic decision.
·
Data
warehouse systems available, helps in integration of diverse application
systems.
·
The
Data warehouse system allows analysis of consolidated historical data analysis.
Why Data
Warehouse is separated from Operational Databases
The
following are the reasons why Data Warehouse is kept separate from operational
databases:
·
The
operational database is constructed for well-known tasks and workload such as
searching particular records, indexing etc. but the data warehouse queries are
often complex and it presents the general form of data.
·
Operational
database supports the concurrent processing of multiple transactions.
Concurrency control and recovery mechanism are required for operational
databases to ensure robustness and consistency of database.
·
Operational
database query allow to read and modify while the OLAP query need only read only access of stored data.
·
Operational
database maintain the current data on the other hand data warehouse maintain
the historical data.
Data
Warehouse Application Types
Information
processing, Analytical processing and Data Mining are the three types of data
warehouse applications that are discussed below:
·
Information
processing -
Data Warehouse allow us to process the information stored in it. The
information can be processed by means of querying, basic statistical analysis,
reporting using crosstabs, tables, charts, or graphs.
·
Analytical
Processing -
Data Warehouse supports analytical processing of the information stored in it.
The data can be analyzed by means of basic OLAP operations, including
slice-and-dice, drill down, drill up, and pivoting.
·
Data
Mining -
Data Mining supports knowledge discovery by finding the hidden patterns and
associations, constructing analytical models, performing classification and
prediction. These mining results can be presented using the visualization
tools.
S.No.
|
Property
|
OLTP
|
OLAP
|
1
|
User
|
Clerk,
IT Professional
|
Knowledge
worker
|
2
|
Function
|
Day
to Day Operations
|
Decision
Support
|
3
|
DB Design
|
Application-oriented(E-R
based)
|
Subject-Oriented(Star,
SnowFlake)
|
4
|
Data
|
Current,
Isolated
|
Historical,
Consolidated
|
5
|
View
|
Detailed,
Flat Relational
|
Summarized,
Multi-dimensional
|
6
|
Usage
|
Structured,
Repetitive
|
Ad-Hoc
|
7
|
Unit of Work
|
Short,
Simple transaction
|
Complex
Query
|
8
|
Access
|
Read/write
|
Read
Mostly
|
9
|
Operations
|
Index/Hash
on primary key
|
Lots
of Scans
|
10
|
#Records Accessed
|
Tens
|
Millions
|
11
|
# Users
|
Thousands
|
Hundreds
|
12
|
Data Base Size
|
100
MB-GB
|
100GB-TB
|
13
|
Metric
|
Trans.
throughput
|
Query
throughput, response
|
No comments:
Post a Comment