Hybrid Text Compression

Introduction

Hybrid Text Compression (HTC) combines several lossless techniques to compress text as much as possible. It uses LZW, Huffman coding, run-length encoding (RLE), and the Burrows-Wheeler Transform (BWT) to compress different types of files.

Compression Algorithms

LZW

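LZW (Lempel-Ziv-Welch) is a dictionary-based compression technique created by Abraham Lempel, Jacob Ziv, and Terry Welch. It is an improved version of LZ78: the encoder builds a dictionary of phrases as it reads the input and emits dictionary codes instead of the phrases themselves.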
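As a rough illustration of the dictionary idea, here is a minimal LZW sketch. It is not the repository's implementation: the function names `lzwEncode` and `lzwDecode`, the 256-entry initial dictionary, and the unbounded 32-bit codes are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: encode `input` as a sequence of LZW codes.
// Codes 0..255 stand for single bytes; new phrases get codes from 256 upward.
std::vector<std::uint32_t> lzwEncode(const std::string& input) {
    std::unordered_map<std::string, std::uint32_t> dict;
    for (int c = 0; c < 256; ++c) {
        dict[std::string(1, static_cast<char>(c))] = static_cast<std::uint32_t>(c);
    }
    std::uint32_t nextCode = 256;
    std::vector<std::uint32_t> out;
    std::string current;
    for (char ch : input) {
        std::string extended = current + ch;
        if (dict.count(extended)) {
            current = extended;               // keep growing the known phrase
        } else {
            out.push_back(dict[current]);     // emit the longest known phrase
            dict[extended] = nextCode++;      // learn the new phrase
            current = std::string(1, ch);
        }
    }
    if (!current.empty()) out.push_back(dict[current]);
    return out;
}

// Hypothetical helper: rebuild the text from codes produced by lzwEncode.
std::string lzwDecode(const std::vector<std::uint32_t>& codes) {
    if (codes.empty()) return "";
    std::vector<std::string> dict(256);
    for (int c = 0; c < 256; ++c) dict[c] = std::string(1, static_cast<char>(c));
    std::string previous = dict[codes[0]];
    std::string out = previous;
    for (std::size_t i = 1; i < codes.size(); ++i) {
        std::uint32_t code = codes[i];
        std::string entry;
        if (code < dict.size()) {
            entry = dict[code];
        } else {
            entry = previous + previous[0];   // code refers to the phrase being defined
        }
        out += entry;
        dict.push_back(previous + entry[0]);  // mirror the encoder's dictionary growth
        previous = entry;
    }
    return out;
}
```

Calling `lzwEncode("ABABABA")` and passing the result to `lzwDecode` should return the original string. A real compressor would also pack the codes into a bit stream and cap the dictionary size, which this sketch leaves out.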

HUFFMAN Code

A Huffman code is a type of prefix code used for lossless data compression. It was developed by David A. Huffman in 1952. It is an entropy coding method: symbols that occur more frequently are assigned shorter codes.
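The frequency-to-code-length relationship can be sketched as follows. This is an illustrative example only, not the code in this repository: `HuffNode` and `buildHuffmanCodes` are hypothetical names, and codes are returned as strings of '0' and '1' for readability rather than as packed bits.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node, stored in a flat vector and linked by index.
struct HuffNode {
    std::size_t weight;
    int left;     // index of the left child, or -1 for a leaf
    int right;    // index of the right child, or -1 for a leaf
    char symbol;  // meaningful only for leaves
};

// Hypothetical helper: build a symbol -> bit-string table from `text`.
std::map<char, std::string> buildHuffmanCodes(const std::string& text) {
    std::map<char, std::size_t> freq;
    for (char ch : text) ++freq[ch];

    std::vector<HuffNode> nodes;
    using Item = std::pair<std::size_t, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (const auto& kv : freq) {
        nodes.push_back({kv.second, -1, -1, kv.first});
        heap.push({kv.second, static_cast<int>(nodes.size()) - 1});
    }

    std::map<char, std::string> codes;
    if (nodes.empty()) return codes;
    if (nodes.size() == 1) {          // a single distinct symbol still needs one bit
        codes[nodes[0].symbol] = "0";
        return codes;
    }

    // Repeatedly merge the two lightest subtrees.
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, a.second, b.second, '\0'});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // Walk the tree: left edges append '0', right edges append '1'.
    std::vector<std::pair<int, std::string>> stack;
    stack.push_back({heap.top().second, ""});
    while (!stack.empty()) {
        std::pair<int, std::string> top = stack.back();
        stack.pop_back();
        const HuffNode& node = nodes[top.first];
        if (node.left < 0) {
            codes[node.symbol] = top.second;
            continue;
        }
        stack.push_back({node.left, top.second + "0"});
        stack.push_back({node.right, top.second + "1"});
    }
    return codes;
}
```

For the input `"aaaabbc"`, the most frequent symbol `a` receives a 1-bit code while `b` and `c` receive 2-bit codes. The exact bit patterns depend on tie-breaking and on which child is labelled 0, so only the code lengths should be relied on.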

Burrows Wheeler Transform

The BWT is a reversible block-sorting transform used as a preprocessing step for compression. It rearranges a string so that identical characters tend to cluster into runs, which later stages such as run-length encoding can exploit. It was invented by Michael Burrows and David Wheeler in 1994. It is also used in bioinformatics. In next-generation sequencing, DNA is fragmented into small pieces whose first few bases are sequenced, yielding several million reads, each 30 to 500 base pairs (“DNA characters”) long.
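To make the transform concrete, here is a naive sketch that sorts all rotations of the input. It is an assumption-laden illustration, not the repository's code: the names `bwtForward` and `bwtInverse` are hypothetical, and the sketch appends a sentinel character (`'$'` by default) that is assumed not to occur in the input.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: BWT of `text` plus an appended sentinel.
// Assumes `sentinel` does not appear anywhere in `text`.
std::string bwtForward(const std::string& text, char sentinel = '$') {
    const std::string s = text + sentinel;
    const std::size_t n = s.size();
    std::vector<std::size_t> rot(n);
    std::iota(rot.begin(), rot.end(), std::size_t{0});
    // Sort rotation start positions by comparing the rotations byte by byte.
    std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
        for (std::size_t k = 0; k < n; ++k) {
            unsigned char ca = static_cast<unsigned char>(s[(a + k) % n]);
            unsigned char cb = static_cast<unsigned char>(s[(b + k) % n]);
            if (ca != cb) return ca < cb;
        }
        return false;
    });
    // The output is the last column of the sorted rotation matrix.
    std::string out(n, '\0');
    for (std::size_t i = 0; i < n; ++i) out[i] = s[(rot[i] + n - 1) % n];
    return out;
}

// Hypothetical helper: invert bwtForward and drop the sentinel.
std::string bwtInverse(const std::string& bwt, char sentinel = '$') {
    const std::size_t n = bwt.size();
    if (n == 0) return "";
    const std::size_t start = bwt.find(sentinel);
    if (start == std::string::npos) throw std::invalid_argument("sentinel missing");
    // A stable sort of positions by character links each first-column entry
    // to the matching occurrence in the last column.
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::stable_sort(next.begin(), next.end(), [&](std::size_t a, std::size_t b) {
        return static_cast<unsigned char>(bwt[a]) < static_cast<unsigned char>(bwt[b]);
    });
    std::string out;
    std::size_t row = start;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        row = next[row];
        out += bwt[row];
    }
    return out;
}
```

With these definitions, `bwtForward("banana")` yields `"annb$aa"`, where the repeated letters are grouped, and `bwtInverse("annb$aa")` recovers `"banana"`. Sorting rotations directly costs roughly quadratic time per comparison-heavy sort, so production code typically builds a suffix array instead.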

Run Length Encoding

Run-length encoding (RLE) is a form of lossless data compression in which a run of identical values is stored as a single value and a count, rather than as the original run. For example, WWWWBB is stored as (W, 4), (B, 2).
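A minimal sketch of this idea follows. The names `rleEncode` and `rleDecode` and the choice of (character, count) pairs as the output format are assumptions for illustration; the repository may store runs differently.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collapse each run of equal characters into (char, count).
std::vector<std::pair<char, std::size_t>> rleEncode(const std::string& input) {
    std::vector<std::pair<char, std::size_t>> runs;
    for (char ch : input) {
        if (!runs.empty() && runs.back().first == ch) {
            ++runs.back().second;                 // extend the current run
        } else {
            runs.emplace_back(ch, std::size_t{1}); // start a new run
        }
    }
    return runs;
}

// Hypothetical helper: expand (char, count) pairs back into the original text.
std::string rleDecode(const std::vector<std::pair<char, std::size_t>>& runs) {
    std::string out;
    for (const auto& run : runs) out.append(run.second, run.first);
    return out;
}
```

Here `rleEncode("WWWWBB")` produces two runs, `('W', 4)` and `('B', 2)`, and `rleDecode` restores the input. RLE only pays off when the data contains long runs, which is one reason to apply it after a transform that groups identical characters.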

Tools and Dependencies

C++
Python 2.x/3.x
Matplotlib C++ API
Boost for reading/writing binary data (in actual/raw bits)

Compiling

git clone https://github.com/IAMIQBAL/Hybrid-Text-Compression
cd Hybrid-Text-Compression
pacman -S boost
g++ -o main HybridCompressor.cpp
./main

Tests

We have written a test class (tests.cpp) that plots the compression ratio and the time taken for each algorithm on a scatter plot, using Matplotlib’s C++ library. The tests are as follows:

1. Test for LZW

2. Test for Huffman

3. Test for RLE + LZW

4. Test for BWT + RLE + LZW

5. All Tests
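The two quantities these tests report, compression ratio and time taken, could be measured along the following lines. This is only a sketch and does not reproduce tests.cpp: `Measurement` and `measure` are hypothetical names, the compressor is assumed to be any callable that maps a `std::string` to a `std::string`, and the ratio is assumed to mean original size divided by compressed size.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical result record for one test run.
struct Measurement {
    double ratio;         // original bytes / compressed bytes (assumed definition)
    double milliseconds;  // wall-clock time spent inside the compressor
};

// Hypothetical helper: time one call to `compress` and compute the size ratio.
template <typename Compressor>
Measurement measure(const std::string& input, Compressor compress) {
    const auto start = std::chrono::steady_clock::now();
    const std::string compressed = compress(input);
    const auto stop = std::chrono::steady_clock::now();

    Measurement m;
    m.milliseconds =
        std::chrono::duration<double, std::milli>(stop - start).count();
    // Report 0 when the output is empty to avoid dividing by zero.
    m.ratio = compressed.empty()
                  ? 0.0
                  : static_cast<double>(input.size()) /
                        static_cast<double>(compressed.size());
    return m;
}
```

Each (ratio, milliseconds) pair would then become one point on the scatter plot. A single timed call is noisy, so averaging over several runs is a sensible refinement.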
