Redundancy
In information theory, redundancy is the number of bits used to transmit or store a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit or store data. Data compression is a way to reduce or eliminate unwanted redundancy, while checksums are a way of adding desired redundancy for error detection when communicating over a noisy channel of limited capacity.
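To make the definition concrete, here is a minimal Python sketch; the function name and the per-character entropy model are illustrative assumptions, not a standard API. It compares the bits actually used to store a message (assuming 8 bits per character) against a Shannon-entropy estimate of the information the message carries.

```python
import math
from collections import Counter

def redundancy_bits(message: str, bits_per_symbol: int = 8) -> float:
    """Bits used to store the message minus a Shannon-entropy
    estimate of the bits of actual information it carries."""
    counts = Counter(message)
    n = len(message)
    # Empirical entropy in bits per symbol of the message's characters.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    bits_used = n * bits_per_symbol   # e.g. 8 bits per ASCII character
    information = n * entropy         # estimated information content
    return bits_used - information

# A highly repetitive message has low entropy per symbol, so most of
# its stored bits are redundant.
print(redundancy_bits("aaaaaaaabb"))      # large: ~72.8 wasted bits
print(redundancy_bits("the quick brown")) # smaller: more varied symbols
```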
Data redundancy occurs in database systems that have a field repeated in two or more tables. For instance, when customer data are duplicated and attached to each product bought, that redundancy is a known source of inconsistency, since the same customer may appear with different values for a given attribute.
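That kind of inconsistency can be shown with a small, made-up example; the records and field names below are hypothetical.

```python
# Hypothetical denormalized "orders" records: the customer's city is
# copied into every order instead of living in one customers table.
orders = [
    {"order_id": 1, "customer": "Acme Ltd", "city": "Leeds",  "product": "Widget"},
    {"order_id": 2, "customer": "Acme Ltd", "city": "Leeds",  "product": "Gadget"},
    {"order_id": 3, "customer": "Acme Ltd", "city": "London", "product": "Sprocket"},
]

# The same customer now appears with two different values for "city" --
# exactly the inconsistency that duplicated fields make possible.
cities = {o["city"] for o in orders if o["customer"] == "Acme Ltd"}
print(cities)  # {'Leeds', 'London'} (set order may vary)
```

Normalizing the schema, so the city lives in a single customers table keyed by customer ID, removes the opportunity for the copies to drift apart.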
Definition - What does Data Redundancy mean?
Data redundancy is a condition created within a database or data storage technology in which the same piece of data is held in two separate places. This can mean two different fields within a single database, or two different spots in multiple software environments or platforms. Whenever data is repeated, that repetition constitutes data redundancy. Redundancy can occur by accident, but it is also introduced deliberately for backup and recovery purposes.
Within the general definition of data redundancy, there are different classifications based on what is considered appropriate in database management and what is considered excessive or wasteful. Wasteful data redundancy generally occurs when a given piece of data does not have to be repeated but ends up being duplicated due to inefficient coding or process complexity.
A positive type of data redundancy works to safeguard data and promote consistency. Many developers consider it acceptable for data to be stored in multiple places. The key is to have a central, master field or space for the data, so that every place where it is duplicated can be updated through one central access point. Otherwise, data redundancy leads to data inconsistency: one update does not automatically propagate to another field, and pieces of data that are supposed to be identical end up with different values.
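As a rough sketch of that central-access-point idea (all names here are hypothetical), one update function writes to the master record and then refreshes every redundant copy:

```python
# One authoritative master record per customer.
master_customers = {"C001": {"name": "Acme Ltd", "city": "Leeds"}}

# A redundant copy kept elsewhere, e.g. a reporting cache.
report_cache = {"C001": dict(master_customers["C001"])}

def update_customer(cust_id: str, **changes) -> None:
    """Apply changes to the master record, then push them to every
    redundant copy so all locations stay consistent."""
    master_customers[cust_id].update(changes)
    report_cache[cust_id] = dict(master_customers[cust_id])  # propagate

update_customer("C001", city="London")
assert report_cache["C001"]["city"] == "London"  # copies stay in sync
```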
Noisy Data
Noisy data is meaningless data. The term has often been used as a synonym for corrupt data, but its meaning has expanded to include any data that cannot be understood and interpreted correctly by machines, such as unstructured text. Any data that has been received, stored, or changed in such a manner that it cannot be read or used by the program that originally created it can be described as noisy.
Noisy data unnecessarily increases the amount of storage space required and can adversely affect the results of any data mining analysis. Statistical analysis can use information gleaned from historical data to weed out noisy data and facilitate data mining.
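One simple statistical approach is to estimate a distribution from historical data and discard new readings that fall too far outside it. The sketch below uses a z-score filter; the threshold value and function names are illustrative assumptions, not a standard technique prescribed by the text.

```python
import statistics

def weed_noisy(values: list[float], history: list[float],
               z_max: float = 3.0) -> list[float]:
    """Keep only readings within z_max standard deviations of the
    mean estimated from historical data; the rest are treated as noise."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return [v for v in values if abs(v - mu) <= z_max * sigma]

history = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2]  # past sensor readings
incoming = [20.3, 19.7, 842.0, 20.1]            # 842.0 is a hardware glitch
print(weed_noisy(incoming, history))            # [20.3, 19.7, 20.1]
```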
Noisy data can be caused by hardware failures, programming errors, and gibberish input from speech recognition or optical character recognition (OCR) programs. Spelling errors, industry abbreviations, and slang can also impede machine reading.