Storing large SSZ objects in Portal #301

Open

pipermerriam opened this issue May 16, 2024 · 1 comment

pipermerriam commented May 16, 2024

Cross-posting this here so it's easy to find.

https://ethresear.ch/t/distributed-storage-and-cryptographically-secured-retrieval-of-ssz-objects-for-portal-network/19575

This likely has applications in

pipermerriam commented Nov 18, 2024

Ok, I've got a thread that may at least be worth pulling on.

Suppose we have this object:

container BlockType1:
    ...

container BlockType2:
    ...

container History:
    fork_1_blocks: List[BlockType1, ...]
    fork_2_blocks: List[BlockType2, ...]

Let's suppose that the serialized version of this object takes 2GB for the data in the first list and 6GB for the data in the second list. In the original scheme, we SSZ encode it and then re-hash the serialized bytes as a ByteList because we need to even out the spread of the data. All of the data needs to be provably part of the whole, and all of the data needs to be provably in the correct location in the network. The ByteList solves this.
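For reference, here is a naive sketch of what hashing the serialized bytes as a ByteList involves (simplified, and far too memory-hungry for multi-GB inputs as written, but it shows where the even 32-byte chunking comes from):

import hashlib

CHUNK = 32

def sha256(data):
    return hashlib.sha256(data).digest()

def pack_bytes(data):
    # Split the serialized bytes into 32-byte chunks, zero-padding the last one.
    chunks = [data[i:i + CHUNK].ljust(CHUNK, b"\x00")
              for i in range(0, len(data), CHUNK)]
    return chunks or [b"\x00" * CHUNK]

def merkleize(chunks, limit):
    # Pad the chunk list to the next power of two of `limit` with zero chunks
    # and hash pairwise up to a single 32-byte root. (A real implementation
    # uses precomputed zero-subtree hashes instead of materializing padding.)
    width = 1 if limit <= 1 else 1 << (limit - 1).bit_length()
    nodes = chunks + [b"\x00" * CHUNK] * (width - len(chunks))
    while len(nodes) > 1:
        nodes = [sha256(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def byte_list_root(data, max_length):
    # hash_tree_root of ByteList[max_length]: merkleize the packed chunks
    # against the chunk limit, then mix in the actual byte length.
    limit = (max_length + CHUNK - 1) // CHUNK
    root = merkleize(pack_bytes(data), limit)
    return sha256(root + len(data).to_bytes(32, "little"))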

Now, suppose that we use the normal SSZ hash, but along with that hash, we also include some metadata about the sizes: specifically, a sequence of paths into the merkle trie of the SSZ hash, together with the size of the variable-sized values between each pair of adjacent paths.

# History.fork_1_blocks merkle trie range
0x0 - 0xdeadbeef -> 2GB

# History.fork_2_blocks merkle trie range
0xdeadbeef - 0xffff -> 6GB

I think that with this simple addition of metadata telling us how much serialized data to expect within a given range of the merkle trie, we satisfy the network conditions that we previously needed the ByteList for. Assuming we were mapping this data onto the full DHT address space, we would do a proportional map of 0x0 - 0xdeadbeef onto the first 1/4th of the address space, and then map 0xdeadbeef - 0xffff onto the latter 3/4ths of the address space.
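A rough sketch of that proportional mapping (the names and structure here are purely illustrative, not a spec): each merkle-trie range gets a slice of the 2**256 content-id space proportional to its share of the total serialized size.

ADDRESS_SPACE = 2**256  # full DHT node-id / content-id space

def map_ranges_to_address_space(range_sizes):
    # range_sizes: ordered (label, serialized_size_in_bytes) pairs, one per
    # merkle-trie range in the metadata. Returns (label, start_id, end_id)
    # slices of the address space, proportional to each range's share of the
    # total serialized size.
    total = sum(size for _, size in range_sizes)
    mapping, cursor = [], 0
    for label, size in range_sizes:
        width = ADDRESS_SPACE * size // total
        mapping.append((label, cursor, cursor + width))
        cursor += width
    # Hand any integer-division remainder to the final range so the slices
    # cover the whole address space.
    label, start, _ = mapping[-1]
    mapping[-1] = (label, start, ADDRESS_SPACE)
    return mapping

GB = 1024**3
for label, start, end in map_ranges_to_address_space(
    [("History.fork_1_blocks", 2 * GB), ("History.fork_2_blocks", 6 * GB)]
):
    print(f"{label}: {(end - start) / ADDRESS_SPACE:.0%} of the address space")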

Some caveats to this are that the underlying data may not actually be evenly distributed, such as some blocks having more transactions than others... which would result in some unevenness, but my intuition says not enough to cause a problem. And since we're talking about objects that we would be committing to ahead of time, we could actually do the work to determine what level of granularity we would need to provide for the merkle paths to achieve the necessary data distribution.
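As a rough illustration of that last point (the helper and the TARGET_BUCKET value are hypothetical, just to show the shape of the computation): since we know the per-element serialized sizes ahead of time, we can walk the list and emit a merkle-path boundary whenever the accumulated bytes cross a target bucket size, which bounds how uneven any single published range can be.

TARGET_BUCKET = 256 * 1024**2  # arbitrary example: ~256MB of serialized data per range

def choose_boundaries(element_sizes, target=TARGET_BUCKET):
    # element_sizes: serialized byte size of each list element, in order.
    # Returns the element indices after which to publish a merkle path, plus
    # the byte count covered by each resulting range.
    boundaries, range_bytes, acc = [], [], 0
    for index, size in enumerate(element_sizes):
        acc += size
        if acc >= target:
            boundaries.append(index + 1)
            range_bytes.append(acc)
            acc = 0
    if acc:
        range_bytes.append(acc)  # tail range after the last boundary
    return boundaries, range_bytes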
