Add method internal_memory_builder_partitioned_phf::build_from_hashes #55

Merged (3 commits) on Apr 30, 2024

progval (Contributor) commented Apr 30, 2024

This mirrors internal_memory_builder_single_phf::build_from_hashes

jermp (Owner) commented Apr 30, 2024

Also this one is not actually needed...

PS. Are you using PTHash for some of your projects? I'm curious.

progval (Contributor, Author) commented Apr 30, 2024

I'm working with @vigna on porting https://docs.softwareheritage.org/devel/swh-graph/ to Rust and we're considering switching from GOV and GOV3 to PTHash. So right now I'm writing Rust bindings for PTHash here: https://gitlab.softwareheritage.org/swh/devel/pthash-rs

The reason I'm interested in build_from_hashes is that I effectively can't call templated C++ functions from Rust, except through a function with a concrete type. This prevents me from giving users of the Rust crate the ability to pass me arbitrary key iterators I could pass directly to internal_memory_builder_partitioned_phf::build_from_keys.
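
A minimal sketch of the constraint described above, assuming nothing about PTHash's actual signatures: `consume_keys` and `consume_keys_u64` are hypothetical names invented for illustration. A function template has no single symbol to bind to, while a non-template wrapper that fixes the iterator type does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a templated entry point (NOT PTHash's real API).
// Every distinct Iterator type produces a different instantiation, so there
// is no single function for a foreign-language binding to refer to.
template <typename Iterator>
std::uint64_t consume_keys(Iterator first, std::size_t num_keys) {
    std::uint64_t checksum = 0;
    for (std::size_t i = 0; i != num_keys; ++i, ++first) {
        checksum ^= static_cast<std::uint64_t>(*first) + i;
    }
    return checksum;
}

// Concrete wrapper: one fixed signature that pins Iterator to
// `const std::uint64_t*`. A function of this shape is what a binding
// layer can declare and call.
std::uint64_t consume_keys_u64(const std::uint64_t* keys, std::size_t num_keys) {
    assert(keys != nullptr || num_keys == 0);
    return consume_keys<const std::uint64_t*>(keys, num_keys);
}
```

The cost of this approach is that callers on the Rust side can only use the iterator types for which such a wrapper has been written.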

So instead I build an array of hashes (which is usually smaller than an array of keys, so it's okay-ish to keep it in RAM) and pass it to a concrete-ized internal_memory_builder_partitioned_phf::build_from_hashes, like this one: https://gitlab.softwareheritage.org/swh/devel/pthash-rs/-/blob/b28d91baf80a15998e8b10ecda41e1b7d716d26e/src/backends.rs#L57-62
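
The hash-first workflow could look roughly like the sketch below. Everything in it is an assumption made for illustration: FNV-1a is swapped in as a simple, deterministic hash (it is not claimed to be what PTHash or pthash-rs uses), and `hash_keys` and `all_distinct` are hypothetical helpers rather than library functions. The point is only that variable-length keys collapse to one 8-byte value each before anything is handed to a concrete builder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a (64-bit), used here only as a stand-in hash function.
std::uint64_t fnv1a64(const std::string& key) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : key) {
        h ^= byte;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Reduce arbitrary keys to a flat array holding one 8-byte hash per key.
// This array is what would be passed to a concrete, non-template builder.
std::vector<std::uint64_t> hash_keys(const std::vector<std::string>& keys) {
    std::vector<std::uint64_t> hashes;
    hashes.reserve(keys.size());
    for (const std::string& key : keys) {
        hashes.push_back(fnv1a64(key));
    }
    assert(hashes.size() == keys.size());
    return hashes;
}

// Hypothetical pre-check: a perfect hash function over the hashes can only
// exist if no two keys produced the same hash value.
bool all_distinct(std::vector<std::uint64_t> hashes) {
    std::sort(hashes.begin(), hashes.end());
    return std::adjacent_find(hashes.begin(), hashes.end()) == hashes.end();
}
```

Once the hashes are built, the original keys are no longer needed in memory, which is where the space saving for long keys comes from.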

I might add a few more concrete-izations later, e.g. to store the hashes on disk, but I'm limited to providing only a handful of them because I have to list each one explicitly in the binding.

jermp (Owner) commented Apr 30, 2024

Oh that makes a lot of sense then. I'll merge the PR.

You might also be interested in PTHash-PHOBIC: https://github.com/jermp/pthash/tree/phobic.
PHOBIC adds some optimizations to PTHash regarding space usage and construction under space-efficient configurations.
The pre-print is forthcoming.

Thanks!

jermp merged commit 9572855 into jermp:master on Apr 30, 2024
2 checks passed