CODING CLUB: DATA MINING

DATA-MINING

Data mining is the computational process of examining large, pre-existing databases to discover patterns in the data and generate new information. It sits at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

In practice, data mining systems make it easier to find useful information in amounts of data that are too large to inspect by hand.

Most IT systems in use are transactional: they process transactions and store the data of each transaction in the system’s database.

Data mining software analyses relationships and patterns in this stored transaction data. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer preferences, for example to identify market segments.
Associations: Data can be mined to identify associations between items. For example, a sports shop that analyzed its data might learn that there is an 85% chance that a person buying a new mountain bike will also buy a helmet, gloves and a water bottle. However, customers who come in requesting a helmet will probably not buy a bike, but they will most likely also buy gloves. This knowledge can assist the manager in ordering the correct stock and assist the sales personnel in suggesting add-on purchases.
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
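
The mountain bike example above is a statement about conditional frequency: of the baskets that contain a bike, what share also contain a helmet? A minimal sketch of that calculation follows. The transactions and the `confidence` helper are invented for illustration and are not taken from any real shop's data.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"mountain bike", "helmet", "gloves"},
    {"mountain bike", "helmet", "water bottle"},
    {"mountain bike", "helmet"},
    {"mountain bike", "water bottle"},
    {"helmet", "gloves"},
    {"helmet", "gloves"},
    {"helmet", "gloves"},
    {"helmet"},
]

def confidence(antecedent, consequent, baskets):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    with_both = sum(1 for b in with_antecedent if consequent in b)
    return with_both / len(with_antecedent)

bike_to_helmet = confidence("mountain bike", "helmet", transactions)
helmet_to_bike = confidence("helmet", "mountain bike", transactions)
helmet_to_gloves = confidence("helmet", "gloves", transactions)

print(f"bike -> helmet:   {bike_to_helmet:.2f}")
print(f"helmet -> bike:   {helmet_to_bike:.2f}")
print(f"helmet -> gloves: {helmet_to_gloves:.2f}")
```

Note that the relationship is not symmetric: in this made-up data most bike buyers take a helmet, while fewer than half of the helmet buyers take a bike.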

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.
Store and manage data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze data by application software.
Present data in a useful format, such as a graph or table.
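
The five elements above can be sketched end to end in a few lines. This is only a toy illustration under stated assumptions: the raw rows are invented, the `transform` helper is hypothetical, and an in-memory SQLite table stands in for a real data warehouse.

```python
import sqlite3

# Extract: hypothetical raw transaction records (values are invented).
raw_rows = [
    ("2014-12-01", " Helmet ", "49.90"),
    ("2014-12-01", "gloves", "19.50"),
    ("2014-12-02", "HELMET", "49.90"),
]

def transform(row):
    """Normalize the product name and convert the price text to a float."""
    day, product, price = row
    return day, product.strip().lower(), float(price)

# Load and store: an in-memory SQLite table stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, product TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [transform(r) for r in raw_rows],
)

# Access and analyze: number of sales and total revenue per product.
summary = conn.execute(
    "SELECT product, COUNT(*), SUM(price) FROM sales "
    "GROUP BY product ORDER BY product"
).fetchall()

# Present: a plain-text table.
for product, count, revenue in summary:
    print(f"{product:<10} {count:>3} {revenue:>8.2f}")
```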

Different techniques of Data Mining are available:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating two-way splits, while CHAID segments it using chi-square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
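
Of the techniques listed above, the nearest neighbor method is the easiest to show in a few lines. The sketch below assumes numeric features, Euclidean distance and a simple majority vote, which is one common way to combine the classes of the k neighbors. The historical dataset and the `knn_classify` helper are hypothetical.

```python
import math
from collections import Counter

# Hypothetical historical dataset: (features, class label). Values are invented.
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.0), "high"),
]

def knn_classify(record, labelled, k=3):
    """Return the majority class among the k labelled records closest to `record`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort the historical records by Euclidean distance to the new record.
    by_distance = sorted(labelled, key=lambda item: math.dist(record, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((1.2, 1.4), history))
print(knn_classify((6.4, 6.2), history))
```

Using an odd k with two classes avoids tied votes; with more classes or an even k, a tie-breaking rule would have to be chosen.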

Different Tools for Data Mining

Most data mining tools can be classified into one of three categories: traditional data mining tools, dashboards, and text-mining tools. Below is a description of each.

§ Traditional Data Mining Tools. Traditional data mining programs help companies establish data patterns and trends by using a number of complex algorithms and techniques. Some of these tools are installed on the desktop to monitor the data and highlight trends and others capture information residing outside a database. The majority are available in both Windows and UNIX versions, although some specialize in one operating system only. In addition, while some may concentrate on one database type, most will be able to handle any data using online analytical processing or a similar technology.

§ Dashboards. Installed in computers to monitor information in a database, dashboards reflect data changes and updates onscreen — often in the form of a chart or table — enabling the user to see how the business is performing. Historical data also can be referenced, enabling the user to see where things have changed (e.g., increase in sales from the same period last year). This functionality makes dashboards easy to use and particularly appealing to managers who wish to have an overview of the company's performance.

§ Text-mining Tools. The third type of data mining tool is sometimes called a text-mining tool because of its ability to mine data from different kinds of text, from Microsoft Word and Acrobat PDF documents to simple text files. These tools scan content and convert the selected data into a format that is compatible with the tool's database, thus providing users with an easy and convenient way of accessing data without the need to open different applications. Scanned content can be unstructured (i.e., information is scattered almost randomly across the document, including e-mails, Internet pages, audio and video data) or structured (i.e., the data's form and purpose are known, such as content found in a database). Capturing these inputs can provide organizations with a wealth of information that can be mined to discover trends, concepts, and attitudes.
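
To make the idea of converting unstructured text into a structured form concrete, here is a minimal sketch that turns a few free-text documents into a term frequency table. The documents, the stop-word list and the `term_counts` helper are all assumptions made for illustration; real text-mining tools do far more (file format parsing, stemming, entity extraction and so on).

```python
import re
from collections import Counter

# Hypothetical unstructured documents, e.g. text already extracted from e-mails.
documents = [
    "The new helmet is great, but delivery was slow.",
    "Slow delivery again. The gloves are great though!",
    "Great bike, great price.",
]

# A tiny hand-picked stop-word list (an assumption; real tools ship larger ones).
STOP_WORDS = {"the", "is", "but", "was", "are", "a", "again", "though", "new"}

def term_counts(texts):
    """Convert free text into a structured term -> frequency table."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters as tokens.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

counts = term_counts(documents)
for term, n in counts.most_common(3):
    print(f"{term:<10} {n}")
```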

Real Life Applications of Data Mining

· Games

o Data mining is used in the extraction of human-usable strategies.

· Business

o Automated prediction of trends and behaviors

o Automated discovery of previously unknown patterns

o In business, data mining is the analysis of historical business activities, stored as static data in data warehouse databases. The goal is to reveal hidden patterns and trends. Data mining software uses advanced pattern recognition algorithms to sift through large amounts of data to assist in discovering previously unknown strategic business information.

o Raw data is being collected by companies at an exploding rate.

o Data mining in customer relationship management applications can contribute significantly to the bottom line

o Data mining can be helpful to human resources (HR) departments in identifying the characteristics of their most successful employees

o Market basket analysis is the use of data mining in retail sales.

o Data mining is a highly effective tool in the catalog marketing industry

· Science and engineering

o Data Mining helps address the important goal of understanding the mapping relationship between the inter-individual variations in human DNA sequence and the variability in disease susceptibility.

o In the area of electrical power engineering, data mining methods have been widely used for condition monitoring of high voltage electrical equipment.

o In educational research, data mining has been used to study the factors that lead students to engage in behaviors which reduce their learning.

o Data mining has been applied to software artifacts within the realm of software engineering: Mining Software Repositories.

· Human rights

o Data mining of government records.

· Medical data mining

· Spatial data mining

o Spatial data mining is the application of data mining methods to spatial data. The end objective of spatial data mining is to find patterns in data with respect to geography.

· Temporal data mining

o Data may contain attributes generated and recorded at different times. In this case finding meaningful relationships in the data may require considering the temporal order of the attributes. A temporal relationship may indicate a causal relationship, or simply an association.

· Sensor data mining

o Wireless sensor networks can be used to collect data for spatial data mining in a variety of applications, such as air pollution monitoring.

· Visual data mining

o In the process of turning from analog to digital, large data sets have been generated, collected, and stored. Visual data mining aims to discover the statistical patterns, trends and information hidden in this data in order to build predictive patterns.

· Music data mining

· Surveillance

o Data mining has been used by the U.S. government.

· Pattern mining

o "Pattern mining" is a data mining method that involves finding existing patterns in data. In this context patterns often means association rules. The original motivation for searching association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products.
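
Association rules are usually built from itemsets that occur often enough in the transaction data. As a rough sketch of that first step, the code below counts how often each pair of products appears together and keeps the pairs whose support (share of all baskets) reaches a threshold. The baskets, the threshold and the `frequent_pairs` helper are invented for illustration, and this is a simplified pair count rather than a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
]

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs found in at least min_support of baskets."""
    pair_counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        pair_counts.update(combinations(sorted(basket), 2))
    n = len(transactions)
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }

pairs = frequent_pairs(baskets, min_support=0.6)
for pair, support in pairs.items():
    print(pair, round(support, 2))
```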

· Subject-based data mining

o "Subject-based data mining" is a data mining method involving the search for associations between individuals in data.

· Knowledge grid

o Knowledge discovery "On the Grid" generally refers to conducting knowledge discovery in an open environment using grid computing concepts, allowing users to integrate data from various online data sources, as well as make use of remote resources, for executing their data mining tasks.

Tuesday, 23 December 2014
