A Rust Vector which swaps to disk based on given parameters

dujjwalr-aws/swapvec

 
 


SwapVec

A vector which swaps to disk when exceeding a certain length.

Useful when data is produced and consumed at different times and memory usage should stay low.

Imagine multiple threads slowly producing giant vectors of data and passing them to a single fast consumer.

Or a multi-gigabyte CSV upload to an HTTP server, where you want to validate every line during the upload without immediately starting a database transaction or keeping everything in memory.
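The core idea can be sketched with the standard library alone. The `SpillVec` type below is hypothetical and is not how SwapVec is implemented: it only handles `u64`, skips serialization and checksums, and takes an explicit file path instead of a managed temporary file. It only illustrates "keep a bounded buffer in memory, create the file once the threshold is exceeded".

```rust
use std::fs::{self, File};
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::PathBuf;

/// Toy illustration (hypothetical type, not SwapVec): holds at most
/// `threshold` values in memory and spills the buffer to `path` beyond that.
pub struct SpillVec {
    threshold: usize,
    buffer: Vec<u64>,
    path: PathBuf,
    writer: Option<BufWriter<File>>, // created lazily on the first spill
    spilled: usize,                  // number of values already on disk
}

impl SpillVec {
    pub fn new(threshold: usize, path: PathBuf) -> Self {
        SpillVec { threshold, buffer: Vec::new(), path, writer: None, spilled: 0 }
    }

    /// Appends a value; may perform I/O, hence the `Result`.
    pub fn push(&mut self, value: u64) -> io::Result<()> {
        self.buffer.push(value);
        if self.buffer.len() > self.threshold {
            self.spill()?;
        }
        Ok(())
    }

    /// Writes the whole in-memory buffer to the file and clears it.
    fn spill(&mut self) -> io::Result<()> {
        if self.writer.is_none() {
            self.writer = Some(BufWriter::new(File::create(&self.path)?));
        }
        let writer = self.writer.as_mut().expect("writer initialised above");
        for value in &self.buffer {
            writer.write_all(&value.to_le_bytes())?;
        }
        self.spilled += self.buffer.len();
        self.buffer.clear();
        Ok(())
    }

    pub fn spilled(&self) -> usize {
        self.spilled
    }

    /// Consumes the container and returns all values in insertion order,
    /// removing the file if one was created.
    pub fn into_vec(mut self) -> io::Result<Vec<u64>> {
        let mut out = Vec::with_capacity(self.spilled + self.buffer.len());
        if let Some(mut writer) = self.writer.take() {
            writer.flush()?;
            drop(writer); // close the write handle before reading
            {
                let mut reader = BufReader::new(File::open(&self.path)?);
                let mut bytes = [0u8; 8];
                for _ in 0..self.spilled {
                    reader.read_exact(&mut bytes)?;
                    out.push(u64::from_le_bytes(bytes));
                }
            } // reader is dropped here so the file can be removed
            fs::remove_file(&self.path)?;
        }
        out.append(&mut self.buffer);
        Ok(out)
    }
}
```

With a threshold of 4, the file is created on the fifth push, and a container that never exceeds the threshold never touches the disk. Unlike SwapVec, this sketch leaves the file behind if the program stops before `into_vec` is called.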

Features

  • Multiplatform (Linux, Windows, macOS)
  • Creates temporary file only after exceeding threshold
  • Works on T: Serialize + Deserialize
  • Temporary file removed even when terminating the program
  • Checksums to guarantee integrity
  • Can be moved across threads
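The checksum feature can be pictured as follows. This is a sketch under assumptions: the README does not say which checksum SwapVec uses or how batches are laid out on disk, so `DefaultHasher` from the standard library is a stand-in, and `checksum`, `seal_batch`, and `open_batch` are hypothetical helper names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical helper: hash the raw bytes of one serialized batch.
/// `DefaultHasher` is a stand-in, not the checksum SwapVec uses.
fn checksum(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Prefix the payload with its 8-byte checksum before writing it out.
fn seal_batch(payload: &[u8]) -> Vec<u8> {
    let mut sealed = checksum(payload).to_le_bytes().to_vec();
    sealed.extend_from_slice(payload);
    sealed
}

/// Return the payload if the stored checksum matches, `None` otherwise.
fn open_batch(sealed: &[u8]) -> Option<&[u8]> {
    if sealed.len() < 8 {
        return None; // too short to even contain a checksum
    }
    let (head, payload) = sealed.split_at(8);
    let mut stored = [0u8; 8];
    stored.copy_from_slice(head);
    if u64::from_le_bytes(stored) == checksum(payload) {
        Some(payload)
    } else {
        None
    }
}
```

A reader that gets `None` back knows the batch was truncated or altered on disk and can report an error instead of deserializing corrupt data.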

Limitations

  • Because most actions may perform I/O, they return a Result
  • Currently, no "start swapping after n MiB" is implemented
    • Would need element-wise space calculation due to heap elements (e.g. String)
  • Compression currently does not compress. It is there to keep the API stable.
  • No async support yet
  • When pushing elements or consuming iterators, SwapVec is "write only"
  • SwapVecIter can only be iterated once
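To see why a "start swapping after n MiB" threshold needs element-wise calculation, consider `String`: `size_of::<String>()` only covers the inline pointer, length, and capacity, while the text itself lives on the heap. The helper below (`approx_bytes`, a hypothetical name that is not part of SwapVec) is a rough sketch of what such a calculation would have to do for one concrete type.

```rust
use std::mem::size_of;

/// Rough in-memory footprint of a slice of strings: the fixed inline part
/// of each `String` plus the heap allocation it owns.
/// Hypothetical helper, not part of SwapVec.
fn approx_bytes(items: &[String]) -> usize {
    items
        .iter()
        .map(|s| size_of::<String>() + s.capacity())
        .sum()
}
```

Two strings of 1 and 1000 bytes have the same `size_of`, yet their real footprints differ by about a kilobyte, so a byte-based threshold cannot be derived from the element count alone.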

Examples

Basic Usage

use swapvec::SwapVec;
let iterator = 0..9;
let mut much_data = SwapVec::default();
// Starts using disk for big iterators
much_data.consume(iterator).unwrap();
for value in much_data.into_iter() {
    println!("Read back: {}", value.unwrap());
}

Extended Usage

This is the code for cargo run (src/main.rs).

use swapvec::{SwapVec, SwapVecConfig};

const DATA_MB: u64 = 20;

fn main() {
    // Number of u64 elements (8 bytes each) that add up to DATA_MB.
    let element_count = DATA_MB * 1024 * 1024 / 8;
    let big_iterator = 0..element_count;

    let config = SwapVecConfig {
        batch_size: 8 * 1024,
        ..SwapVecConfig::default()
    };
    let mut swapvec: SwapVec<_> = SwapVec::with_config(config);
    swapvec.consume(big_iterator).unwrap();

    println!("Data size: {}MB", DATA_MB);
    println!("Done. Batches written: {}", swapvec.batches_written());
    println!(
        "Filesize: {}MB",
        swapvec
            .file_size()
            .map(|x| x.unwrap() / 1024 / 1024)
            .unwrap_or(0)
    );
    println!("Read back");

    let read_back: Vec<_> = swapvec.into_iter().map(|x| x.unwrap()).collect();

    println!("Elements read back: {}", read_back.len());
}
