Generic data directory for arecibo #371

Open
winston-h-zhang opened this issue Mar 18, 2024 · 1 comment
@winston-h-zhang
Contributor

winston-h-zhang commented Mar 18, 2024

This issue addresses several currently active issues that require some form of external data management on disk:

  1. Load KZG setup parameters from file #270
  2. [LUR-33] Refactor common behavior between Nova/Supernova in public_parameters caching lurk-lab/lurk-beta#1086
  3. Add API for generic data directory #335

Regarding this comment: I include #335 as one of the use cases because:

  1. We've needed its functionality many times now, as a way to reproduce MSM data for Ingonyama, SN, and most recently spmvm test vectors.
  2. I believe we need some way of generating test vectors from our stack that is easily accessible and merged into the main branch as a maintained API.
  3. I don't know what exactly the design should look like, but it will also require accessing a data directory, which is why I include it as part of this issue.

The two questions I have for discussion are:

  1. How should we unify these 3 topics into a single generic way of managing external data within arecibo?
  2. Is this generic approach the correct way to proceed?

cc: @huitseeker

@huitseeker
Contributor

How about having a file reading/writing tool that lives outside of the Arecibo crate and using that tool to write a minimal amount of code to deal with the read-writes?

Here's the reason why I think that may go further:

  1. the public parameter reading can probably start as reading/writing a simple file, but I would expect that eventually we will want a cached memory-mapped file,
  2. serializing test vectors under a particular environment variable / config would help, but it would be nice to have this be minimally invasive, so that tests / main code remain readable while we do this.
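
To make the second point concrete, here is a minimal sketch of what an environment-gated dump could look like using only the standard library. The variable name `ARECIBO_TEST_VECTOR_DIR` and the functions `dump_test_vector` and `dump_to` are hypothetical names chosen for illustration, not an existing Arecibo API.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical environment variable naming the dump directory (an assumption).
pub const DUMP_DIR_VAR: &str = "ARECIBO_TEST_VECTOR_DIR";

/// Write `bytes` to `<dir>/<name>` when `dir` is set; do nothing otherwise.
/// Returns the path that was written, or `None` when dumping is disabled.
pub fn dump_to(dir: Option<&Path>, name: &str, bytes: &[u8]) -> io::Result<Option<PathBuf>> {
    let dir = match dir {
        Some(dir) => dir,
        None => return Ok(None),
    };
    fs::create_dir_all(dir)?;
    let path = dir.join(name);
    fs::write(&path, bytes)?;
    Ok(Some(path))
}

/// Call-site helper: reads the directory from the environment, so a test or
/// prover only needs one extra line and no conditional logic of its own.
pub fn dump_test_vector(name: &str, bytes: &[u8]) -> io::Result<Option<PathBuf>> {
    let dir = env::var_os(DUMP_DIR_VAR).map(PathBuf::from);
    dump_to(dir.as_deref(), name, bytes)
}
```

With something like this, a call site would only gain a single `dump_test_vector("msm.bin", &bytes)?` line, and runs without the variable set would touch nothing on disk.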

One place where we can start: Arecibo is both a repository and a crate. We could make the Arecibo repository a workspace with two crates. One crate could have read and write APIs such as the following:

use std::fmt::Debug;

pub trait ToFromBytes: AsRef<[u8]> + Debug + Sized {
    type ParseError;

    /// Parse an object from its byte representation
    fn from_bytes(bytes: &[u8]) -> Result<Self, Self::ParseError>;
}

// Signatures only; `Utf8Path` is unsized, so it is taken by reference.
type FileOpsError; // some union of the above ParseError and std::io::Error
fn read_vec_from_file<T: ToFromBytes>(path: &camino::Utf8Path) -> Result<Vec<T>, FileOpsError>;
fn write_slice_to_file<T: ToFromBytes>(path: &camino::Utf8Path, data: &[T]) -> Result<(), FileOpsError>;

Note the important points:

  • this starts with the simple file writing / reading, nothing fancy (for now),
  • this does not depend on Serde; it relies on the observation that we only need to read and write straightforward base data: arrays of primitive group elements in the case of public parameters, and arrays of scalars in the case of the test vectors.

This should at least form the first version that lets us iterate afterwards, perhaps.
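
As a rough illustration of that first version, the signatures above could be filled in along these lines. This is a sketch under stated assumptions, not a settled design: it adds a hypothetical `SIZE` constant so a flat file can be split into fixed-width elements, uses `std::path::Path` in place of `camino::Utf8Path` to stay dependency-free, and makes `FileOpsError` generic over the parse error. The `Word` type is a toy stand-in for a scalar.

```rust
use std::convert::TryFrom;
use std::fmt::Debug;
use std::fs;
use std::io;
use std::path::Path;

pub trait ToFromBytes: AsRef<[u8]> + Debug + Sized {
    type ParseError;
    /// Hypothetical addition: encoded width in bytes (must be non-zero).
    const SIZE: usize;
    /// Parse an object from its byte representation
    fn from_bytes(bytes: &[u8]) -> Result<Self, Self::ParseError>;
}

/// One possible union of the parse error and `std::io::Error`.
#[derive(Debug)]
pub enum FileOpsError<E> {
    Io(io::Error),
    Parse(E),
    /// The file length is not a multiple of `T::SIZE`; holds the leftover byte count.
    TrailingBytes(usize),
}

pub fn read_vec_from_file<T: ToFromBytes>(
    path: &Path,
) -> Result<Vec<T>, FileOpsError<T::ParseError>> {
    let bytes = fs::read(path).map_err(FileOpsError::Io)?;
    let rem = bytes.len() % T::SIZE;
    if rem != 0 {
        return Err(FileOpsError::TrailingBytes(rem));
    }
    // Parse each fixed-width chunk, stopping at the first parse failure.
    bytes
        .chunks_exact(T::SIZE)
        .map(|chunk| T::from_bytes(chunk).map_err(FileOpsError::Parse))
        .collect()
}

pub fn write_slice_to_file<T: ToFromBytes>(
    path: &Path,
    data: &[T],
) -> Result<(), FileOpsError<T::ParseError>> {
    // Concatenate the raw byte views of all elements, with no header or framing.
    let mut buf = Vec::with_capacity(data.len() * T::SIZE);
    for item in data {
        buf.extend_from_slice(item.as_ref());
    }
    fs::write(path, buf).map_err(FileOpsError::Io)
}

/// Toy element type (illustration only): eight raw bytes.
#[derive(Debug, PartialEq)]
pub struct Word(pub [u8; 8]);

impl AsRef<[u8]> for Word {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

impl ToFromBytes for Word {
    type ParseError = std::array::TryFromSliceError;
    const SIZE: usize = 8;

    fn from_bytes(bytes: &[u8]) -> Result<Self, Self::ParseError> {
        <[u8; 8]>::try_from(bytes).map(Word)
    }
}
```

Writing a `&[Word]` and reading it back as `Vec<Word>` should round-trip. The fixed-width assumption is what keeps the format free of Serde or any framing, at the cost of not supporting variable-length elements.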
