Data Compression and Decompression Using C Language


————————————————————————————————————————————————————————————————————————-
Note: Please don't ask for the code. The compression algorithm at the core of this application still needs refinement and investigation. My core compression engine is a variant of the DEFLATE algorithm.

————————————————————————————————————————————————————————————————————————-
There are many compression algorithms. What you need here is a lossless one: it compresses data so that decompression reproduces exactly the bytes that went in. The opposite is a lossy algorithm, which discards some of the data to get a smaller result. PNG images use lossless compression, while JPEG images usually use lossy compression.

Some of the most widely known compression algorithms include:

* DEFLATE (DEFLATE is a lossless data compression algorithm and associated file format that uses a combination of the LZ77 algorithm and Huffman coding)
* RLE (Run-length encoding is a very simple form of lossless data compression)
* Huffman (Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression)
* LZW (Lempel–Ziv–Welch is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch)
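* LZ77 (a lossless algorithm by Abraham Lempel and Jacob Ziv that replaces repeated data with references to an earlier occurrence in a sliding window)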
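Run-length encoding is simple enough to sketch in a few lines of C. The following is only a minimal illustration, not the engine used in this application: the (count, value) byte-pair format and the names rle_encode and rle_decode are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode src as (count, value) byte pairs. Runs longer than 255 bytes
   are split into several pairs. Returns the number of bytes written to
   dst, or 0 if dst is too small (an empty input also returns 0). */
size_t rle_encode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        unsigned char value = src[i];
        size_t run = 1;

        /* Count how many times the current byte repeats. */
        while (i + run < n && src[i + run] == value && run < 255)
            run++;
        if (cap - out < 2)
            return 0;
        dst[out++] = (unsigned char)run;
        dst[out++] = value;
        i += run;
    }
    return out;
}

/* Expand (count, value) pairs back into the original bytes.
   Returns the number of bytes written, or 0 on malformed input
   or if dst is too small. */
size_t rle_decode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (n % 2 != 0)
        return 0;
    for (size_t i = 0; i < n; i += 2) {
        size_t run = src[i];

        if (run > cap - out)
            return 0;
        memset(dst + out, src[i + 1], run);
        out += run;
    }
    return out;
}
```

With this format the 18-byte string aaaaaaaabbbbbcccdd encodes to 8 bytes: (8,a) (5,b) (3,c) (2,d). Data without long runs grows instead, up to twice its size, which is why RLE is rarely used on its own.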

ZIP archives typically use DEFLATE, a combination of LZ77 and Huffman coding, which gives fast compression and decompression and reasonably good compression ratios.

LZ77 is pretty much a generalized form of RLE: instead of only repeating a single byte, it can copy any sequence that appeared earlier in the data, so it often yields much better results.
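To see why LZ77 covers RLE as a special case, here is a toy encoder and decoder in C. It is a sketch under stated assumptions, not DEFLATE and not my engine: the lz_token layout, the 255-byte window, the brute-force search and the function names are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LZ_WINDOW 255  /* how far back the encoder searches (assumed) */
#define LZ_MAXLEN 255  /* longest match the encoder emits (assumed)   */

typedef struct {
    unsigned short offset; /* distance back to the match start, 0 = no match */
    unsigned short length; /* number of bytes copied from earlier output     */
    unsigned char  next;   /* literal byte that follows the match            */
} lz_token;

/* Greedy LZ77: at each position, find the longest earlier match and emit
   (offset, length, next literal). Returns the token count, or 0 if out
   is too small. */
size_t lz77_encode(const unsigned char *src, size_t n,
                   lz_token *out, size_t cap)
{
    size_t pos = 0, count = 0;

    assert(src != NULL && out != NULL);
    while (pos < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = pos > LZ_WINDOW ? pos - LZ_WINDOW : 0;

        for (size_t cand = start; cand < pos; cand++) {
            size_t len = 0;

            /* Keep one byte in reserve for the trailing literal.
               A match may run past pos and overlap itself. */
            while (pos + len + 1 < n && len < LZ_MAXLEN &&
                   src[cand + len] == src[pos + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_off = pos - cand;
            }
        }
        if (count == cap)
            return 0;
        out[count].offset = (unsigned short)best_off;
        out[count].length = (unsigned short)best_len;
        out[count].next = src[pos + best_len];
        count++;
        pos += best_len + 1;
    }
    return count;
}

/* Rebuild the data by copying from already decoded output.
   Returns the number of bytes written, or 0 on bad input. */
size_t lz77_decode(const lz_token *tok, size_t count,
                   unsigned char *dst, size_t cap)
{
    size_t out = 0;

    assert(tok != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t off = tok[i].offset, len = tok[i].length;

        if (off > out || (len > 0 && off == 0) || len + 1 > cap - out)
            return 0;
        for (size_t k = 0; k < len; k++) {
            dst[out] = dst[out - off]; /* byte by byte, so overlap works */
            out++;
        }
        dst[out++] = tok[i].next;
    }
    return out;
}
```

For the input aaaaaaaab this produces two tokens: a plain literal a, then (offset 1, length 7, next b). A match at offset 1 that overlaps itself is exactly a run, which is the RLE case. Offsets larger than 1 let the same mechanism reuse whole repeated phrases.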

Huffman coding gives the most frequent bytes the shortest bit codes. Imagine a text file that looked like this:

aaaaaaaabbbbbcccdd

A typical implementation of Huffman would result in a map like the following:

Bits    Character
0       a
10      b
110     c
111     d

So the file would be compressed to this:

00000000 10101010 10110110 11011111 10000000
                                     ^^^^^^^ (padding bits required)
18 bytes go down to 5: the codes take 33 bits, and 7 padding bits fill the last byte. Of course, the table must be included in the file as well, so the real saving is smaller. This algorithm works better with more data.
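The code lengths for the sample can be checked with a small C sketch. It builds the Huffman tree by repeatedly merging the two lightest nodes and then reads each symbol's code length from its depth in the tree. The function names, the MAX_SYMS limit and the simple quadratic search are assumptions for illustration; a real encoder would use a priority queue and would also assign the actual bit patterns.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SYMS 256

/* Compute Huffman code lengths for n symbols with the given frequencies.
   lengths[i] receives the code length in bits of symbol i, or 0 if the
   symbol never occurs. */
void huffman_lengths(const unsigned long *freq, size_t n, unsigned *lengths)
{
    unsigned long weight[2 * MAX_SYMS];
    int parent[2 * MAX_SYMS];
    int alive[2 * MAX_SYMS];
    size_t nodes = n, remaining = 0, i;

    assert(freq != NULL && lengths != NULL && n > 0 && n <= MAX_SYMS);

    /* Every symbol that occurs starts out as its own one-node tree. */
    for (i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
        alive[i] = freq[i] > 0;
        remaining += (size_t)alive[i];
        lengths[i] = 0;
    }

    /* Merge the two lightest trees until only one is left. */
    while (remaining > 1) {
        int a = -1, b = -1; /* a: lightest, b: second lightest */

        for (i = 0; i < nodes; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = (int)i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = (int)i;
            }
        }
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        alive[nodes] = 1;
        parent[a] = parent[b] = (int)nodes;
        alive[a] = alive[b] = 0;
        nodes++;
        remaining--;
    }

    /* A symbol's code length is its depth below the root. */
    for (i = 0; i < n; i++) {
        int p;

        if (freq[i] == 0)
            continue;
        for (p = parent[i]; p >= 0; p = parent[p])
            lengths[i]++;
        if (lengths[i] == 0)
            lengths[i] = 1; /* a lone symbol still needs one bit */
    }
}

/* Total size of the encoded data in bits, excluding the code table. */
unsigned long huffman_total_bits(const unsigned long *freq,
                                 const unsigned *lengths, size_t n)
{
    unsigned long bits = 0;

    for (size_t i = 0; i < n; i++)
        bits += freq[i] * lengths[i];
    return bits;
}
```

Called with the frequencies 8, 5, 3, 2 for a, b, c, d, huffman_lengths returns the lengths 1, 2, 3, 3, and huffman_total_bits returns 33, which rounds up to 5 bytes.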

————————————————————————————————————————————————————————————————————————-

In computing, DEFLATE is a lossless data compression algorithm and associated file format that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool. The file format was later specified in RFC 1951.

The original algorithm as designed by Katz was patented as U.S. Patent 5,051,745 and assigned to PKWARE, Inc. As stated in the RFC document, an algorithm producing DEFLATE files is widely thought to be implementable in a manner not covered by patents. This has led to its widespread use, for example in gzip compressed files, PNG image files and the ZIP file format for which Katz originally designed it.

Using Deflate in new software
————————————————
Implementations of Deflate are freely available in many languages. C programs typically use the zlib library (licensed under the zlib License, which allows use with both free and proprietary software). Programs written using the Borland dialects of Pascal can use paszlib; a C++ library is included as part of 7-Zip/AdvanceCOMP. Java includes support as part of the standard library (in java.util.zip). The Microsoft .NET Framework 2.0 base class library supports it in the System.IO.Compression namespace. Programs in Ada can use Zip-Ada (pure) or the ZLib-Ada thick binding to zlib.

“A person who never made a mistake never tried anything new.
Once you stop learning you start dying” – Albert Einstein

Happy Coding 🙂
