Updating parquet-tools #177

Open
elferherrera opened this issue Aug 14, 2022 · 2 comments
Labels
question Further information is requested

Comments

@elferherrera
Contributor

elferherrera commented Aug 14, 2022

I'm trying to update parquet-tools to work with the changes introduced by the Delayed dictionary (#160) PR.

I'm using read::decompress to extract the page, and then I'm using this function to decode the buffer:

pub fn read<T: NativeType>(
    buf: &[u8],
    num_values: usize,
    _is_sorted: bool,
) -> Result<PrimitivePageDict<T>> {
    let size_of = std::mem::size_of::<T>();

    let typed_size = num_values.wrapping_mul(size_of);

    let values = buf.get(..typed_size).ok_or_else(|| {
        Error::OutOfSpec(
            "The number of values declared in the dict page does not match the length of the page"
                .to_string(),
        )
    })?;

    let values = values.chunks_exact(size_of).map(decode::<T>).collect();

    Ok(PrimitivePageDict::new(values))
}

This is the same function that was previously used to decode the page.

However, the values read from a sample file are wrong and do not match the values saved in the file.

Am I missing something during the page decompression stage?
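
To check the decoding step in isolation, here is a minimal, dependency-free sketch of the same logic. It is not the library's code: `LeNative` and `decode_plain_dict` are hypothetical names standing in for `NativeType` and `read`, the error type is a plain `String`, and it assumes the dictionary buffer holds fixed-width little-endian values. It also uses `checked_mul` instead of `wrapping_mul`, so an overflow is reported rather than wrapped.

```rust
/// Hypothetical stand-in for the crate's `NativeType`: a fixed-width value
/// that can be rebuilt from little-endian bytes.
trait LeNative: Sized {
    const SIZE: usize;
    fn from_le(bytes: &[u8]) -> Self;
}

impl LeNative for i32 {
    const SIZE: usize = 4;
    fn from_le(bytes: &[u8]) -> Self {
        // `chunks_exact` guarantees the slice has exactly SIZE bytes.
        let mut arr = [0u8; 4];
        arr.copy_from_slice(bytes);
        i32::from_le_bytes(arr)
    }
}

impl LeNative for f64 {
    const SIZE: usize = 8;
    fn from_le(bytes: &[u8]) -> Self {
        let mut arr = [0u8; 8];
        arr.copy_from_slice(bytes);
        f64::from_le_bytes(arr)
    }
}

/// Decode `num_values` fixed-width values from the front of `buf`.
/// Returns an error if the buffer is shorter than the declared size.
fn decode_plain_dict<T: LeNative>(buf: &[u8], num_values: usize) -> Result<Vec<T>, String> {
    let typed_size = num_values
        .checked_mul(T::SIZE)
        .ok_or_else(|| "num_values * size_of::<T>() overflows usize".to_string())?;

    let values = buf.get(..typed_size).ok_or_else(|| {
        "The number of values declared in the dict page does not match the length of the page"
            .to_string()
    })?;

    Ok(values.chunks_exact(T::SIZE).map(T::from_le).collect())
}
```

Feeding this a hand-built buffer, for example the little-endian bytes of a few known `i32` values, and comparing the result against those values shows whether the decode step itself behaves as expected, independently of decompression.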

@jorgecarleitao
Owner

Hey @elferherrera !

Do you have a draft PR I could look at? I think that the function is correct.

jorgecarleitao added the question label Aug 15, 2022
@little-arhat

Hello! %) Any updates on this?

3 participants