Merge pull request #125 from instaclustr/issue-98
Readme file was improved
Showing 1 changed file with 71 additions and 81 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,121 +1,111 @@ | ||
# BRRO Compressor | ||
# ATSC - Advanced Time Series Compressor | ||
|
||
Version: 0.5 Released: 30/11/2023 | ||
## Documentation | ||
|
||
## Major Changes | ||
For full documentation please go to [Docs](https://github.com/instaclustr/fft-compression/tree/main/docs) | ||
|
||
### 0.5 | ||
|
||
- Added Polynomial Compressor (with 2 variants) | ||
- Created and Integrated a proper file type (wbro) | ||
- Benchmarks of the different compressors | ||
- Integration testing | ||
- Several fixes and cleanups | ||
|
||
## Description | ||
|
||
BRRO Compressor is a compressor that relies on the characteristics of a signal to provide far greater compression than existing ones. BRRO uses different techniques, chosen by an initial analysis of the signal, to apply the best-suited compression method to each signal segment. | ||
|
||
For a detailed description on the compressor methods and logic check `BRRO.md`. | ||
|
||
## Getting Started with BRRO Compressor | ||
|
||
### Prerequisites | ||
|
||
- Ensure you have [Rust](https://www.rust-lang.org/tools/install) and Cargo installed on your system. | ||
|
||
### Installation | ||
## Building ATSC | ||
|
||
1. Clone the repository: | ||
|
||
```bash | ||
git clone https://github.com/instaclustr/fft-compression | ||
cd fft-compression | ||
``` | ||
|
||
2. Build the project: | ||
|
||
```bash | ||
cargo build --release | ||
``` | ||
|
||
### Usage | ||
## What is ATSC? | ||
|
||
Currently BRRO relies on raw BRRO files generated by our Prometheus remote endpoint. These files work as input for the compressor. | ||
Advanced Time Series Compressor (ATSC for short) is a configurable, *lossy* compressor that uses the characteristics of a time series to create a function approximation of it. | ||
This way, ATSC only needs to store the parametrization of the function and not the data. | ||
ATSC draws inspiration from established compression and signal analysis techniques, achieving significant compression ratios. | ||
In internal testing, ATSC compressed the time series of our databases by 46 to 880 times, with a fitting error within 1% of the original time series. | ||
In some cases, ATSC produces highly compressed data without any data loss (perfectly fitting functions). | ||
ATSC is meant for long-term storage of time series, as it benefits from having more points to produce a better fit. | ||
Decompression is faster than compression (up to 40x), as the data is expected to be compressed once and decompressed several times. | ||
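
To make the "store the function, not the data" idea concrete, here is a minimal Python sketch. It is not ATSC's implementation (ATSC is written in Rust and its fitting methods are listed in this README); it only assumes a simple least-squares line as the fitting function, and the names `fit_line` and `reconstruct` are hypothetical.

```python
import math

def fit_line(values):
    """Least-squares fit of y = slope * x + intercept over x = 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return slope, intercept

def reconstruct(slope, intercept, n):
    """Rebuild n samples from the stored function parameters."""
    return [slope * x + intercept for x in range(n)]

# Illustrative data: a slowly rising series with a small ripple.
samples = [100.0 + 0.5 * x + 0.2 * math.sin(x) for x in range(1000)]

slope, intercept = fit_line(samples)
restored = reconstruct(slope, intercept, len(samples))

# Worst-case relative deviation between original and reconstructed samples.
max_rel_error = max(abs(a - b) / abs(a) for a, b in zip(samples, restored))

# Only three numbers are kept (slope, intercept, sample count).
stored_values = 3
ratio = len(samples) / stored_values

print(f"max relative error: {max_rel_error:.5f}, ratio: {ratio:.0f}x")
```

The series here is deliberately easy to fit. Real data would need a richer function family and a per-segment choice of method, which is the role the README assigns to ATSC's compressor selection.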
|
||
Compressor usage: | ||
Internally ATSC uses the following methods for time series fitting: | ||
|
||
``` | ||
Usage: brro-compressor [OPTIONS] <INPUT> | ||
Arguments: | ||
<INPUT> input file | ||
Options: | ||
--compressor <COMPRESSOR> [default: auto] [possible values: auto, fft, constant, polynomial] | ||
-u Uncompresses the input file/directory | ||
-h, --help Print help | ||
-V, --version Print version | ||
``` | ||
* FFT (Fast Fourier Transforms) | ||
* Constant | ||
* Interpolation - Catmull-Rom | ||
* Interpolation - Inverse Distance Weight | ||
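
As an illustration of the last method in the list, the sketch below shows textbook inverse distance weighting in Python. It is an assumption-laden toy, not ATSC's code: the choice of which samples to keep (`compress` with a fixed `step`), the use of all kept points as neighbours, and the `power` value are all hypothetical.

```python
def idw_interpolate(known, x, power=2.0):
    """Estimate the value at position x from (position, value) pairs.

    Each known point contributes with weight 1 / distance**power,
    so closer points dominate the estimate.
    """
    num = 0.0
    den = 0.0
    for pos, val in known:
        d = abs(x - pos)
        if d == 0:
            return val  # exact hit on a stored point
        w = 1.0 / d ** power
        num += w * val
        den += w
    return num / den

def compress(values, step):
    """Keep every step-th sample, always including the last one."""
    kept = [(i, values[i]) for i in range(0, len(values), step)]
    if kept[-1][0] != len(values) - 1:
        kept.append((len(values) - 1, values[-1]))
    return kept

def decompress(kept, n):
    """Rebuild n samples by interpolating between the kept points."""
    return [idw_interpolate(kept, i) for i in range(n)]

# Illustrative data: a slow, steady ramp.
values = [20.0 + 0.01 * i for i in range(100)]
kept = compress(values, 10)
restored = decompress(kept, len(values))

print(f"kept {len(kept)} of {len(values)} samples")
```

Stored points are reproduced exactly, and every interpolated value is a weighted average of the stored values, so the reconstruction never leaves their range.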
|
||
#### Compress a File | ||
For a more detailed insight into ATSC read the paper here: [ATSC - A novel approach to time-series compression](https://some.url.com) | ||
|
||
To compress a file using the BRRO Compressor, run: | ||
Currently, ATSC uses an internal format to process time series (WBRO) and outputs a compressed format (BRO). A CSV to WBRO tool is available here: [CSV Compressor](https://github.com/instaclustr/fft-compression/tree/main/csv-compressor) | ||
|
||
```bash | ||
brro-compressor <input-file> | ||
``` | ||
## Where does ATSC fit? | ||
|
||
#### Decompress a File | ||
ATSC fits anywhere that precision can be traded for space savings. | ||
ATSC is to time series what JPG/MP3 is to image/audio. | ||
If you do not need the output to match the original input exactly, you could probably use ATSC. | ||
|
||
To decompress a file, use the following command: | ||
Example use cases: | ||
|
||
```bash | ||
brro-compressor -u <input-file> | ||
``` | ||
* In places where time series are rolled over, ATSC is a perfect fit. It would probably offer more space savings without any meaningful loss in precision. | ||
* Time series that are undersampled (e.g. once every 20 seconds). With ATSC you can greatly increase the sample rate (e.g. once per second) without using more space. | ||
* Long, slow-moving data series (e.g. weather data). Those will most probably follow an easy-to-fit pattern. | ||
* Data that is meant to be visualized by humans and not machine processed (e.g. by operations teams). With such a small error, under 1%, it shouldn't impact analysis. | ||
|
||
## Programs and description | ||
## Using ATSC | ||
|
||
This repository contains one main program and other programs that serve different purposes: some are just for testing, others do actual work. | ||
### Prerequisites | ||
|
||
### flac-server | ||
* Ensure you have [Rust](https://www.rust-lang.org/tools/install) and Cargo installed on your system. | ||
|
||
**NOTE**: Remote read is currently NOT working, as it depends on FLAC files that are no longer generated. | ||
### Usage | ||
|
||
Needs a prometheus server. We need it to get our samples out. Supports read and write from prometheus. | ||
ATSC operates on files with a WBRO extension. Learn more about the format here: [WBRO - A time series format](https://github.com/instaclustr/fft-compression/tree/main/wavbrro) | ||
You can also compress from CSV with the provided [CSV tool](https://github.com/instaclustr/fft-compression/tree/main/csv-compressor). | ||
These files work as input for the compressor. | ||
|
||
Launch the `flac-server` and set it as your remote endpoint for prometheus, example below. | ||
Compressor usage: | ||
|
||
```YAML | ||
# Remote read and Write | ||
remote_write: | ||
- url: "http://localhost:9201/api/write" | ||
```bash | ||
Usage: atsc [OPTIONS] <INPUT> | ||
|
||
remote_read: | ||
- url: "http://localhost:9201/api/read" | ||
read_recent: true | ||
name: "flac_server" | ||
Arguments: | ||
<INPUT> input file | ||
|
||
--compressor <COMPRESSOR> | ||
Select a compressor, default is auto [default: auto] [possible values: auto, noop, fft, constant, polynomial, idw] | ||
-e, --error <ERROR> | ||
Sets the maximum allowed error for the compressed data, must be between 0 and 50. Default is 5 (5%). 0 is lossless compression 50 will do a median filter on the data. In between will pick optimize for the error [default: 5] | ||
-u | ||
Uncompresses the input file/directory | ||
-c, --compression-selection-sample-level <COMPRESSION_SELECTION_SAMPLE_LEVEL> | ||
Samples the input data instead of using all the data for selecting the optimal compressor. Only impacts speed, might or not increased compression ratio. For best results use 0 (default). Only works when compression = Auto. 0 will use all the data (slowest) 6 will sample 128 data points (fastest) [default: 0] | ||
--verbose | ||
Verbose output, dumps everysample in the input file (for compression) and in the ouput file (for decompression) | ||
-h, --help | ||
Print help | ||
-V, --version | ||
Print version | ||
``` | ||
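
For scripted use, the options in the help text above can be assembled programmatically. The Python sketch below only builds the argument list and does not run anything. The compressor names, the 0 to 50 error range and the `-u` flag come from the help output; the helper name `build_atsc_command`, the file names, the `.bro` extension and passing the error as an integer are assumptions for illustration.

```python
import shlex

# Possible values for --compressor, as listed in the help text.
COMPRESSORS = ("auto", "noop", "fft", "constant", "polynomial", "idw")

def build_atsc_command(input_path, compressor="auto", max_error=5, uncompress=False):
    """Return the argv list for an atsc invocation (hypothetical helper)."""
    if uncompress:
        # Decompression only needs -u and the input path.
        return ["atsc", "-u", input_path]
    if compressor not in COMPRESSORS:
        raise ValueError(f"unknown compressor: {compressor!r}")
    if not 0 <= max_error <= 50:
        raise ValueError("max_error must be between 0 and 50")
    return ["atsc", "--compressor", compressor, "--error", str(max_error), input_path]

# Hypothetical file names, for illustration only.
cmd = build_atsc_command("metrics.wbro", compressor="fft", max_error=1)
print(shlex.join(cmd))

undo = build_atsc_command("metrics.bro", uncompress=True)
print(shlex.join(undo))
```

The resulting list can be handed to `subprocess.run` once the `atsc` binary is on the `PATH`. Validating the arguments up front keeps an out-of-range error value from reaching the tool.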
|
||
Make Prometheus server a source of your grafana and check the data. | ||
### brro_optimizer | ||
Maybe the most important tool at this point: it picks a WAV file from the datasets described below and optimizes it so that we might see meaningful compression into FLAC. | ||
The tool also has options to dump the output of the file as a single sample per period, instead of the 4 channels. This is useful for obtaining the data as it was fed into the flac-server. | ||
The code performs optimizations based on file name, so renaming might cause issues. | ||
Usage (Getting raw samples): `./brro_optimizer infile.wav --dump-raw > file.raw` | ||
Usage (Getting optimized samples): `./brro_optimizer infile.wav --dump-optimized > file.raw` | ||
Usage (Generate an optimized file): `./brro_optimizer -w infile.wav` | ||
#### Compress a File | ||
|
||
If you set the ENV Variable for Debug it will output what it is doing. | ||
To compress a file using ATSC, run: | ||
|
||
### Matlab folder | ||
```bash | ||
atsc <input-file> | ||
``` | ||
|
||
Exploratory code. Should be removed. | ||
#### Decompress a File | ||
To decompress a file, use: | ||
```bash | ||
atsc -u <input-file> | ||
``` | ||
|
||
## Roadmap | ||
|
||
1. Update `flac-server` to read/write WBRO/BRO files. | ||
2. Streaming compression/decompression | ||
3. Automated compressor selection | ||
4. Frame expansion (Allowing new data to be appended to existing frames) | ||
* Frame expansion (Allowing new data to be appended to existing frames) | ||
* Dynamic function loading (e.g. providing more functions without touching the whole code base) | ||
* Global/Per frame error storage | ||
* Efficient error encoding |