Data Compression

Published on StudyMode, April 25, 2013

The word 'data' generally refers to information in digital form on which computer programs operate, and 'compression' is the process of removing redundancy from that data. By 'compressing data', we mean deriving techniques or, more specifically, designing efficient algorithms to:
* represent data in a less redundant fashion
* remove the redundancy in data
* implement compression algorithms, covering both compression and decompression.

Data compression means encoding the information in a file in such a way that it takes less space; in other words, compression yields a compact representation of data. Compression is used almost everywhere: most images on the web are compressed, typically in the JPEG or GIF formats; many modems compress data in transit; HDTV is commonly broadcast using MPEG-2 compression; and several file systems compress files automatically when they are stored, while the rest of us compress files by hand. A compression scheme consists of two components: an encoding algorithm that takes a message and generates a “compressed” representation (hopefully with fewer bits), and a decoding algorithm that reconstructs the original message, or some approximation of it, from the compressed representation.
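As a rough illustration of the encoder/decoder split described above, the sketch below uses Python's standard `zlib` module (DEFLATE, a lossless scheme picked here only as a convenient example, not one the text prescribes). The wrapper names `encode` and `decode` are hypothetical.

```python
import zlib

def encode(message: bytes) -> bytes:
    """Encoder (hypothetical wrapper): produce a compressed representation."""
    return zlib.compress(message, 9)  # level 9 = favour size over speed

def decode(compressed: bytes) -> bytes:
    """Decoder (hypothetical wrapper): rebuild the message from its compressed form."""
    return zlib.decompress(compressed)

message = b"the quick brown fox " * 50  # deliberately redundant input (1000 bytes)
packed = encode(message)
restored = decode(packed)

print(len(message), "->", len(packed), "bytes")
assert restored == message  # DEFLATE is lossless, so the round trip is exact
```

Because the input repeats one phrase fifty times, the compressed form is far shorter than the original; on data with little redundancy the savings shrink accordingly.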

Examples of the kinds of data we typically want to compress include:
* text
* source code
* arbitrary files
* images
* video
* audio data
* speech

Why do we need compression?
Compression technology is employed to use storage space efficiently and to save transmission capacity and transmission time. Basically, it is all about saving resources and money. Despite the enormous advances in storage media and transmission networks, it may seem surprising that compression technology is still required. One important reason is that the resolution and volume of digital data have also increased (e.g. HDTV resolution, ever-increasing sensor resolutions in consumer cameras), and that there are still application areas where resources are limited, e.g. wireless networks. Beyond simply reducing the amount of data, standards such as MPEG-4, MPEG-7, and MPEG-21 offer additional functionality.
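To give a rough sense of scale for the HDTV example, here is a back-of-the-envelope calculation for uncompressed video. The parameters are assumptions chosen for illustration, not taken from the text: 1920 × 1080 pixels, 24 bits per pixel (8-bit RGB), and 30 frames per second.

```latex
\begin{aligned}
1920 \times 1080 &= 2\,073\,600\ \text{pixels/frame} \\
2\,073\,600 \times 24\ \text{bits} &= 49\,766\,400\ \text{bits/frame} \\
49\,766\,400 \times 30\ \text{frames/s} &= 1\,492\,992\,000\ \text{bits/s} \approx 1.49\ \text{Gbit/s}
\end{aligned}
```

Under these assumptions one minute of raw video is about 89.6 Gbit, roughly 11.2 GB, which is why even fast networks and large disks still rely on compression for such data.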

Why is it possible to compress data?
Compression-enabling properties are:
* Statistical redundancy: in non-compressed data, all symbols are represented with the same number of bits regardless of their relative frequency (fixed-length representation), so frequent symbols could instead be given shorter codes and rare symbols longer ones.
* Correlation: adjacent data samples tend to be equal or similar (think of images or video data). There are different types of correlation:
  * spatial correlation
  * spectral correlation
  * temporal correlation
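Both properties can be made concrete with a small sketch. The symbol frequencies below are invented so the numbers come out exactly, the prefix code is hand-picked (its codeword lengths equal -log2 of each symbol's probability), and delta encoding is shown as one common way to exploit correlation; none of these is claimed to be the specific method the text goes on to use.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text: str) -> float:
    """Shannon entropy H = -sum(p * log2(p)) over the symbol frequencies."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Statistical redundancy: frequencies are A 1/2, B 1/4, C 1/8, D 1/8 (illustrative).
text = "AAAAAAAABBBBCCDD"
# A fixed-length code needs ceil(log2(4)) = 2 bits for every symbol.
fixed_bits = len(text) * math.ceil(math.log2(len(set(text))))

# Hand-picked prefix code: frequent symbols get shorter codewords.
prefix_code = {"A": "0", "B": "10", "C": "110", "D": "111"}
variable_bits = sum(len(prefix_code[s]) for s in text)

print(fixed_bits, variable_bits, entropy_bits_per_symbol(text))  # 32 28 1.75

# Correlation: neighbouring samples are similar, so their differences are small.
def delta_encode(samples: list[int]) -> list[int]:
    """Keep the first sample, then store each sample minus its predecessor."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """Undo delta_encode with a running sum."""
    out: list[int] = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out

row = [100, 101, 101, 102, 104, 104, 103]  # e.g. one row of pixel values (made up)
print(delta_encode(row))                   # [100, 1, 0, 1, 2, 0, -1]
assert delta_decode(delta_encode(row)) == row
```

The prefix code spends 28 bits instead of 32, matching the entropy bound of 1.75 bits per symbol. The deltas are mostly small numbers near zero, which a subsequent statistical coder can represent with fewer bits than the raw samples.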
In addition, many data types contain a significant amount of irrelevant information, since the human brain is not able to process or perceive all of the data. Such data can therefore be omitted without degrading perceived quality. Furthermore, some data have more abstract properties that are independent of time, location, and resolution and can be described very efficiently (e.g. fractal properties).

Compression techniques are broadly classified into two categories:

Lossless Compression
A compression approach is lossless only if the original data can be reconstructed exactly from the compressed version; no information is lost during the compression process. For example, in the figure below, the input string AABBBA is reconstructed exactly after running the compression algorithm followed by the decompression algorithm. Lossless compression is also called reversible compression, since decompression recovers the original data perfectly.
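The figure's algorithm is not named in the text, so as a stand-in the sketch below applies run-length encoding (chosen here purely for illustration) to the same input string AABBBA. The function names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    """Replace each run of identical symbols with a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand every (symbol, run length) pair back into its run."""
    return "".join(symbol * count for symbol, count in pairs)

original = "AABBBA"
encoded = rle_encode(original)
print(encoded)  # [('A', 2), ('B', 3), ('A', 1)]
assert rle_decode(encoded) == original  # lossless: exact reconstruction
```

The assertion is the defining property of lossless compression: decoding the encoded form gives back the input bit for bit.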

Lossless compression techniques are used when the original source data are so important that we cannot afford to lose any detail. Examples of such data include medical images, text and images preserved for legal reasons, and some computer executable files.

In lossless compression (as the name...