Merge pull request #125 from instaclustr/issue-98
Readme file was improved
cjrolo authored Oct 30, 2024
2 parents c3e8c0a + 95da51b commit 95ad313
Showing 1 changed file with 71 additions and 81 deletions.

# ATSC - Advanced Time Series Compressor

## Documentation

For full documentation please go to [Docs](https://github.com/instaclustr/fft-compression/tree/main/docs)

## Building ATSC

1. Clone the repository:

```bash
git clone https://github.com/instaclustr/fft-compression
cd fft-compression
```

2. Build the project:

```bash
cargo build --release
```

## What is ATSC?

Advanced Time Series Compressor (ATSC for short) is a configurable, *lossy* compressor that uses the characteristics of a time series to create a function approximation of it.
This way, ATSC only needs to store the parametrization of the function, not the data itself.
ATSC draws inspiration from established compression and signal analysis techniques and achieves significant compression ratios.
In internal testing, ATSC compressed the time series from our databases by factors of 46x to 880x, with a fitting error within 1% of the original time series.
In some cases, ATSC produced highly compressed data without any data loss (perfectly fitting functions).
ATSC is meant for long-term storage of time series, as it benefits from having more points to produce a better fit.
Decompression is much faster than compression (up to 40x), since the data is expected to be compressed once and decompressed several times.
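
The sketch below is a toy illustration of this idea, not ATSC's actual code: it replaces a frame of samples with a single fitted constant (one of the simplest possible function approximations) and reports the worst-case relative error of that approximation.

```rust
/// Toy illustration of function-approximation compression: represent a frame
/// of samples by a single constant (its mean) and track the worst relative
/// error of that approximation. This is NOT ATSC's implementation.
fn fit_constant(frame: &[f64]) -> (f64, f64) {
    let mean = frame.iter().sum::<f64>() / frame.len() as f64;
    let max_rel_err = frame
        .iter()
        .map(|v| ((v - mean) / v).abs())
        .fold(0.0_f64, f64::max);
    (mean, max_rel_err)
}

fn main() {
    // A slow-moving series: one stored parameter stands in for eight samples.
    let frame = [100.2, 100.1, 100.3, 100.2, 100.1, 100.2, 100.3, 100.2];
    let (param, err) = fit_constant(&frame);
    println!("stored parameter = {param:.3}, max relative error = {:.4}%", err * 100.0);
}
```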

Internally, ATSC uses the following methods for time series fitting (an illustrative sketch of one of them follows the list):

* FFT (Fast Fourier Transforms)
* Constant
* Interpolation - Catmull-Rom
* Interpolation - Inverse Distance Weight
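
Below is a minimal sketch of inverse distance weighting, the last interpolation method listed above. It assumes a one-dimensional series of (position, value) samples and is illustrative only; it is not ATSC's internal implementation.

```rust
/// Minimal sketch of inverse distance weighting (IDW): estimate a value at
/// position `x` from known (position, value) samples. Illustrative only.
fn idw_estimate(samples: &[(f64, f64)], x: f64, power: f64) -> f64 {
    let mut weighted_sum = 0.0;
    let mut weight_total = 0.0;
    for &(xi, yi) in samples {
        let dist = (x - xi).abs();
        if dist == 0.0 {
            // Exactly on a stored sample: return it directly.
            return yi;
        }
        let w = 1.0 / dist.powf(power);
        weighted_sum += w * yi;
        weight_total += w;
    }
    weighted_sum / weight_total
}

fn main() {
    // Reconstruct a point halfway between two stored samples.
    let samples = [(0.0, 10.0), (10.0, 20.0)];
    println!("{}", idw_estimate(&samples, 5.0, 2.0)); // prints 15
}
```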

For a more detailed insight into ATSC, read the paper here: [ATSC - A novel approach to time-series compression](https://some.url.com)

Currently, ATSC uses an internal format for processing time series (WBRO) and outputs a compressed format (BRO). A CSV-to-WBRO converter is available here: [CSV Compressor](https://github.com/instaclustr/fft-compression/tree/main/csv-compressor)

## Where does ATSC fit?

ATSC fits anywhere that space reduction can be traded for precision.
ATSC is to time series what JPG/MP3 is to images/audio.
If you do not need the output to match the original input with absolute precision, you can probably use ATSC.

Examples of use cases:

* Where time series are rolled over, ATSC is a perfect fit: it will likely offer greater space savings without any meaningful loss of precision.
* Time series that are undersampled (e.g. one sample every 20 s). With ATSC you can greatly increase the sample rate (e.g. one sample per second) without using more space.
* Long, slow-moving data series (e.g. weather data). These will most likely follow an easy-to-fit pattern.
* Data that is meant to be visualized by humans rather than machine processed (e.g. by operations teams). With an error under 1%, the loss shouldn't impact analysis.

## Using ATSC

### Prerequisites

* Ensure you have [Rust](https://www.rust-lang.org/tools/install) and Cargo installed on your system.

### Usage

ATSC operates on files with the WBRO extension; learn more about the format here: [WBRO - A time series format](https://github.com/instaclustr/fft-compression/tree/main/wavbrro)
You can also compress from CSV with the provided [CSV tool](https://github.com/instaclustr/fft-compression/tree/main/csv-compressor)
These files serve as input for the compressor.

Compressor usage:

```bash
Usage: atsc [OPTIONS] <INPUT>

Arguments:
  <INPUT>  input file

Options:
      --compressor <COMPRESSOR>
          Select a compressor, default is auto [default: auto] [possible values: auto, noop, fft, constant, polynomial, idw]
  -e, --error <ERROR>
          Sets the maximum allowed error for the compressed data, must be between 0 and 50. Default is 5 (5%). 0 is lossless compression, 50 will do a median filter on the data. Values in between will optimize for that error [default: 5]
  -u
          Uncompresses the input file/directory
  -c, --compression-selection-sample-level <COMPRESSION_SELECTION_SAMPLE_LEVEL>
          Samples the input data instead of using all the data for selecting the optimal compressor. Only impacts speed, might or might not increase the compression ratio. For best results use 0 (default). Only works when compression = Auto. 0 will use all the data (slowest), 6 will sample 128 data points (fastest) [default: 0]
      --verbose
          Verbose output, dumps every sample in the input file (for compression) and in the output file (for decompression)
  -h, --help
          Print help
  -V, --version
          Print version
```

#### Compress a File

To compress a file using ATSC, run:

```bash
atsc <input-file>
```

#### Decompress a File

To decompress a file, use:

```bash
atsc -u <input-file>
```

## Roadmap

* Frame expansion (Allowing new data to be appended to existing frames)
* Dynamic function loading (e.g. providing more functions without touching the whole code base)
* Global/Per frame error storage
* Efficient error encoding
