Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caphash optimization #45

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Conversation

alexander-zw
Copy link
Contributor

@alexander-zw alexander-zw commented May 3, 2021

Cap hashing was implemented for merkle trees. The algebraic and non-algebraic cap hashes were both implemented in hash_enum.tcc, as well as the dummy algebraic hash. See issue #41.

A bug was fixed that merkle tree membership proof generation and verification didn't work when the input queries weren't sorted and unique. They now work when the tree is not zero knowledge, but when the tree is set to be zero knowledge, the code still fails.

The hash count method was updated to count each two-to-one hash as 2 units and the cap hash as 1 unit per input.

More tests were added, to test cap hashing, and to test large trees with randomly generated query leaves. The code no not test unsorted non-unique queries since that still fails for zero knowledge trees.

No snark protocols currently take advantage of the cap hash; they still have cap size set to 2.

Changes are complete, but not ready to merge because I started working based off of PR #43, so that should preferably be merged first. Ready for review

@@ -15,10 +15,10 @@ bcs_transformation_parameters<FieldT, MT_root_hash> default_bcs_params(
params.hash_enum = hash_type;
/* TODO: Push setting leaf hash into internal BCS code. Currently 2 is fine, as leaf size is internally unused. */
const size_t leaf_size = 2;
params.leafhasher_ = get_leafhash<FieldT, MT_root_hash>(hash_type, security_parameter, leaf_size);
params.leafhasher_ = get_leafhash<MT_root_hash, FieldT>(hash_type, security_parameter, leaf_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this just a bug before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I switched the order to be consistent with similar functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still believe the template ordering before was right

Copy link
Member

@ValarDragon ValarDragon Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, you switched the template ordering of the internal command. Currently there are inconsistent orderings throughout, I believe the old <FieldT, MT_root_hash> was consistently used everywhere. We should consistently be using one or the other, I don't really have a preference for which one.

Copy link
Contributor Author

@alexander-zw alexander-zw Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched all instances of the leaf hash functions to be the same as the hashchain. If you insist, I can change it back, but it is currently consistent as well.

@alexander-zw alexander-zw force-pushed the caphash branch 4 times, most recently from f5fb79d to c04d91c Compare May 9, 2021 02:51
@alexander-zw alexander-zw changed the title Caphash Caphash optimization May 9, 2021
libiop/bcs/pow.tcc Outdated Show resolved Hide resolved
libiop/bcs/pow.tcc Outdated Show resolved Hide resolved
params.hashchain_ =
get_hashchain<FieldT, MT_root_hash>(hash_type, security_parameter);
params.cap_hasher = get_cap_hash<MT_root_hash, FieldT>(hash_type, security_parameter);
params.hashchain_ = get_hashchain<FieldT, MT_root_hash>(hash_type, security_parameter);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed above, we are switching <FieldT, MT_root_hash> to <MT_root_hash, FieldT> in leafhasher (now cap_hasher here).

Do we want to apply the same refactoring to hashchain? We probably should do that in a separate PR for ease of testing.

Copy link
Contributor Author

@alexander-zw alexander-zw Sep 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd prefer to keep it the same for this PR, unless it causes merge conflicts

Copy link
Contributor Author

@alexander-zw alexander-zw Sep 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's acceptable, I'd like to keep them both the current order for now. We can refactor both at the same time in a future PR. This PR is already quite big, and it's been a long time since I wrote this so I don't remember all the context

@@ -103,6 +126,8 @@ class merkle_tree {

hash_digest_type get_root() const;

/* These two functions do not currently work if the given positions aren't sorted or
have duplicates, AND the tree is set to be zero knowledge. */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one a new limitation that is caused by the present PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't remember. It might be new due to the cap hashes. If so, then the issue will not be present if you don't use the new cap hash optimization.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highly recommend explicitly sorting the positions within this function and lift such limitation, as it is unnatural.

@@ -200,32 +211,39 @@ std::vector<std::vector<FieldT>> merkle_tree<FieldT, hash_digest_type>::serializ
template<typename FieldT, typename hash_digest_type>
void merkle_tree<FieldT, hash_digest_type>::compute_inner_nodes()
{
// TODO: Better document this function, its hashing layer by layer.
std::size_t n = (this->num_leaves_ - 1) / 2;
/* n is the first index of the layer we're about to compute. It starts at the bottom layer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bottom layer or the layer above the bottom layer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right it's the bottom inner node layer

std::vector<pos_and_digest_t> S;
S.reserve(positions.size());

std::vector<hash_digest_type> leaf_hashes;
if (this->make_zk_) {
for (auto &leaf : leaf_contents)
{
/* FIXME: This code is currently incorrect if the given list of positions is not
sorted or has duplicates. This could be fixed if both positions and leaf_contents
are sorted before the leaf hashes are calculated, which would require refactoring. */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a problem before this PR, or is it a new one? (My impression is that the way we add ZK is reasonably adding this limitation. Is the original okay when the positions are not sorted?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ same as above. If possible, explicitly sort the positions and lift this limitation.

@@ -474,54 +515,79 @@ bool merkle_tree<FieldT, hash_digest_type>::validate_set_membership_proof(
std::swap(S, new_S);
}

// Add the cap, including the root's direct children and the root.
// The only elements should be the cap (not including the root).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the cap include the root, or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think the cap doesn't include the root. It's been such a long time since I wrote this code so I can only guess.

throw std::invalid_argument("Poseidon only supported for 128 bit soundness.");
}
poseidon_params<FieldT> params = get_poseidon_parameters<FieldT>(hash_enum);
/* security parameter is -1 b/c */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify "-1" and "b/c"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what it means, it was copied from elsewhere

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this line should be removed.

/* We explicitly place this on heap with no destructor,
as this reference has to live after the function terminates */
std::shared_ptr<algebraic_vector_hash<FieldT>> hash_class =
std::make_shared<algebraic_vector_hash<FieldT>>(permutation, security_parameter - 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why security_parameter - 1 instead of security_parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, it was also copied from elsewhere

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refer to the source?

}

/** Constructs a merkle tree with leaf size 2. Generates and verifies membership proofs for

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the convention for ** or * in libiop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there's an existing convention, the one I use is /** if it's documentation that the user might want to see on hover, and /* if it's a multiline comment that's only useful to see in the location where it's added (// if it's a single line comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather, that's the philosophy I have now. Looks like I didn't stick to it at the time

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considered this one resolved.

@weikengchen
Copy link

I did a pass of the code, and most are smaller points. Let me know when you want me to reach out to the repo's owner to coordinate merging this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants