DATA-MINING
Data mining can be defined as the computational process of examining large
pre-existing databases and discovering patterns in the data in order to
generate new information. It sits at the intersection of artificial
intelligence, machine learning, statistics, and database systems. The overall
goal of the data mining process is to extract information from a data set and
transform it into an understandable structure for further use.
Data mining systems just make it easier for us to handle large amounts of data.
Most IT systems in use are transactional. This means that transactions are
processed in the system and the data of the transaction is stored in the
system’s database.
Data mining software analyses relationships and patterns in this stored
transaction data. Several types of analytical software are available:
statistical, machine learning, and neural networks. Generally, any of four
types of relationships are sought:
- Classes: Stored data is used to locate data in
predetermined groups. For example, a restaurant chain could mine customer
purchase data to determine when customers visit and what they typically
order. This information could be used to increase traffic by having daily
specials.
- Clusters: Data items are grouped according to logical
relationships or consumer preferences.
- Associations: Data can be mined to identify associations. For example, a
sports shop that analyzed its data knows that there is an 85% chance that a
person buying a new mountain bike will also buy a helmet, gloves and a water
bottle. However, customers who come in requesting a helmet will probably not
buy a bike, but they most likely will also buy gloves. This knowledge can
assist the manager in ordering the correct stock and assist the sales
personnel in suggesting add-on purchases.
- Sequential patterns: Data is mined to anticipate behavior patterns and
trends. For example, an outdoor equipment retailer could predict the
likelihood of a backpack being purchased based on a consumer's purchase of
sleeping bags and hiking shoes.
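The mountain-bike example in the list above can be made concrete as a rule-confidence calculation. The sketch below is illustrative only: the transactions are invented toy data and `confidence` is a hypothetical helper, not part of any library. It estimates how often baskets that contain one item also contain another.

```python
def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero confidence
    both = [t for t in with_antecedent if consequent <= set(t)]
    return len(both) / len(with_antecedent)

# Invented toy purchase data (an assumption, not from any real shop).
transactions = [
    {"bike", "helmet", "gloves"},
    {"bike", "helmet"},
    {"helmet", "gloves"},
    {"bike", "helmet", "bottle"},
    {"helmet"},
]

bike_to_helmet = confidence(transactions, {"bike"}, {"helmet"})
helmet_to_bike = confidence(transactions, {"helmet"}, {"bike"})
print(f"bike -> helmet: {bike_to_helmet:.2f}")
print(f"helmet -> bike: {helmet_to_bike:.2f}")
```

The two directions give different values, which mirrors the point made in the example: bike buyers tend to add a helmet, while helmet buyers do not necessarily buy a bike.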
Data mining consists of five major elements:
- Extract, transform, and load transaction data onto the data warehouse system.
- Store and manage data in a multidimensional database system.
- Provide data access to business analysts and information technology
professionals.
- Analyze data by application software.
- Present data in a useful format, such as a graph or table.
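The five elements above can be sketched end to end on a very small scale. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the data warehouse, and the raw rows, the `sales` table, and its column names are all invented for the example.

```python
import sqlite3

# 1. Extract: raw transaction records (invented sample data).
raw_rows = [
    ("2024-01-05", " Helmet ", "49.90"),
    ("2024-01-05", "gloves", "19.50"),
    ("2024-01-06", "HELMET", "49.90"),
]

# Transform: normalise product names and convert prices to numbers.
clean_rows = [(day, product.strip().lower(), float(price))
              for day, product, price in raw_rows]

# Load, then 2. Store: an in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, product TEXT, price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

# 3. Access and 4. Analyze: an analyst queries units and revenue per product.
summary = conn.execute(
    "SELECT product, COUNT(*), ROUND(SUM(price), 2) "
    "FROM sales GROUP BY product ORDER BY product"
).fetchall()

# 5. Present: print the result as a small table.
print(f"{'product':<10}{'units':>6}{'revenue':>10}")
for product, units, revenue in summary:
    print(f"{product:<10}{units:>6}{revenue:>10.2f}")
conn.close()
```

A real warehouse would use a dedicated multidimensional store and reporting tools; the point here is only the order of the steps.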
Different techniques of Data Mining are available:
- Artificial neural networks: Non-linear predictive models that learn through
training and resemble biological neural networks in structure.
- Genetic algorithms: Optimization techniques that use processes such as
genetic combination, mutation, and natural selection in a design based on
the concepts of natural evolution.
- Decision trees: Tree-shaped structures that represent sets of decisions.
These decisions generate rules for the classification of a dataset. Specific
decision tree methods include Classification and Regression Trees (CART) and
Chi Square Automatic Interaction Detection (CHAID). Both provide a set of
rules that you can apply to a new (unclassified) dataset to predict which
records will have a given outcome. CART segments a dataset by creating
two-way splits, while CHAID segments it using chi-square tests to create
multi-way splits. CART typically requires less data preparation than CHAID.
- Nearest neighbor method: A technique that classifies each record in a
dataset based on a combination of the classes of the k record(s) most
similar to it in a historical dataset (where k ≥ 1). Sometimes called the
k-nearest neighbor technique.
- Rule induction: The extraction of useful if-then rules from data
based on statistical significance.
- Data visualization: The visual interpretation of complex relationships in
multidimensional data. Graphics tools are used to illustrate data
relationships.
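Of the techniques listed, the nearest neighbor method is compact enough to sketch directly. The version below makes two assumptions that the description above does not fix: similarity is measured by Euclidean distance, and the "combination of the classes" is a simple majority vote. The data and the `knn_classify` helper are invented for illustration.

```python
from collections import Counter
import math

def knn_classify(history, record, k=3):
    """Label `record` by majority vote among its k closest historical records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort historical records by Euclidean distance to the new record.
    by_distance = sorted(history, key=lambda item: math.dist(item[0], record))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Invented historical data: (features, class label).
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.0), "high"),
]

print(knn_classify(history, (1.8, 1.6), k=3))
print(knn_classify(history, (6.2, 6.4), k=3))
```

With k = 1 the method reduces to copying the label of the single most similar record; larger values of k smooth out noisy individual records.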
Different Tools for Data Mining
Most data mining tools can be
classified into one of three categories: traditional data mining tools,
dashboards, and text-mining tools. Below is a description of each.
§ Traditional Data Mining Tools. Traditional data mining programs help companies
establish data patterns and trends by using a number of complex algorithms and
techniques. Some of these tools are installed on the desktop to monitor the
data and highlight trends and others capture information residing outside a
database. The majority are available in both Windows and UNIX versions,
although some specialize in one operating system only. In addition, while some
may concentrate on one database type, most will be able to handle any data
using online analytical
processing or a similar technology.
§ Dashboards. Installed
in computers to monitor information in a database, dashboards reflect data
changes and updates onscreen — often in the form of a chart or table — enabling
the user to see how the business is performing. Historical data also can be
referenced, enabling the user to see where things have changed (e.g., increase
in sales from the same period last year). This functionality makes dashboards
easy to use and particularly appealing to managers who wish to have an overview
of the company's performance.
§ Text-mining Tools. The
third type of data mining tool sometimes is called a text-mining tool because
of its ability to mine data from different kinds of text — from Microsoft Word
and Acrobat PDF documents to simple text files, for example. These tools scan
content and convert the selected data into a format that is compatible with the
tool's database, thus providing users with an easy and convenient way of
accessing data without the need to open different applications. Scanned content
can be unstructured (i.e., information is scattered almost randomly across the
document, including e-mails, Internet pages, audio and video data) or
structured (i.e., the data's form and purpose are known, such as content found
in a database). Capturing these inputs can provide organizations with a wealth
of information that can be mined to discover trends, concepts, and attitudes.
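As a rough illustration of the "scan content and convert it into a compatible format" step described for text-mining tools, the sketch below turns unstructured text into structured term counts. The sample documents, the stop-word list, and the `term_counts` helper are all invented for the example; real tools also handle formats such as Word and PDF, which this sketch does not attempt.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this example).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "but"}

def term_counts(text):
    """Lower-case the text, split it into words, and count non-stop-words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

# Invented unstructured input.
documents = {
    "email_1": "The delivery was late and the helmet was damaged.",
    "email_2": "Great helmet, but the delivery was late.",
}

# One structured row per (document, term, count), ready to load into a table.
rows = [(doc_id, term, n)
        for doc_id, text in documents.items()
        for term, n in sorted(term_counts(text).items())]

# Aggregate across documents to see which terms recur.
totals = Counter()
for _, term, n in rows:
    totals[term] += n
print(totals.most_common(3))
```

Once the text is in row form, the same querying and pattern-finding techniques used on database content can be applied to it.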
Real Life Applications of Data Mining
· Games
o Data mining is used in the extraction of human-usable strategies.
· Business
o Automated prediction of trends and behaviors.
o Automated discovery of previously unknown patterns.
o In business, data mining is the analysis of historical business activities,
stored as static data in data warehouse databases. The goal is to reveal
hidden patterns and trends. Data mining software uses advanced pattern
recognition algorithms to sift through large amounts of data to assist in
discovering previously unknown strategic business information.
o Raw data is being collected by companies at an exploding rate.
o Data mining in customer relationship management applications can contribute
significantly to the bottom line.
o Data mining can be helpful to human resources (HR) departments in
identifying the characteristics of their most successful employees.
o Data mining is a highly effective tool in the catalog marketing industry.
· Science and engineering
o Data mining helps address the important goal of understanding the mapping
relationship between inter-individual variations in human DNA sequence and
variability in disease susceptibility.
o In the area of electrical power engineering, data mining methods have been
widely used for condition monitoring of high voltage electrical equipment.
o In educational research, data mining has been used to study the factors
leading students to choose to engage in behaviors which reduce their learning
efforts.
o Data mining has been applied to software artifacts within the realm of
software engineering: Mining Software Repositories.
· Human rights
o Data mining of government records.
· Medical data mining
· Spatial data mining
o Spatial data mining is the application of data mining methods to spatial
data. The end objective of spatial data mining is to find patterns in data
with respect to geography.
· Temporal data mining
o Data may contain attributes generated and recorded at different times. In
this case finding meaningful relationships in the data may require
considering the temporal order of the attributes. A temporal relationship may
indicate a causal relationship, or simply an association.
· Sensor data mining
o Wireless sensor networks can be used for facilitating the collection of
data for spatial data mining for a variety of applications, such as air
pollution monitoring.
· Visual data mining
o In the shift from analog to digital, large data sets have been generated,
collected, and stored. Visual data mining looks for the statistical patterns,
trends and information hidden in this data in order to build predictive
patterns.
· Music data mining
· Surveillance
o Data mining has been used by the U.S. government.
· Pattern mining
o "Pattern mining" is a data mining method that involves finding existing
patterns in data. In this context patterns often means association rules.
The original motivation for searching association rules came from the desire
to analyze supermarket transaction data, that is, to examine customer
behavior in terms of the purchased products.
· Subject-based data mining
o "Subject-based data mining" is a data mining method involving the search
for associations between individuals in data.
· Knowledge grid
o Knowledge discovery "On the Grid" generally refers to conducting knowledge
discovery in an open environment using grid computing concepts, allowing
users to integrate data from various online data sources, as well as make use
of remote resources, for executing their data mining tasks.
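The supermarket motivation given for pattern mining in the list above can be sketched as frequent-pair counting, a simplified first step toward association rules. The baskets, the `min_support` threshold, and the `frequent_pairs` helper are invented for illustration, and this is not a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets is at least `min_support`."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()
            if n / total >= min_support}

# Invented supermarket baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

for pair, support in sorted(frequent_pairs(baskets, min_support=0.5).items()):
    print(pair, f"{support:.2f}")
```

Pairs that clear the support threshold are the candidates from which rules of the form "customers who buy X also buy Y" would then be derived.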