Big File Support Spec #65

Open

MuxZeroNet opened this issue Oct 15, 2017 · 4 comments

@MuxZeroNet
Contributor

MuxZeroNet commented Oct 15, 2017

Please provide the specification for the big file support feature. If you don't know where to start, here are some example topics to write about.

How big files are hashed. How merkle trees are made.

Piece size, hashing algorithm, number of leaf nodes, etc.

How a big file is represented in content.json

Piece field format, hashing algorithm, keywords, etc.

Does big file support introduce changes to the network protocol?

Are big files transmitted over the current network protocol? Is the network protocol changed?

What are the preferred ways to store an incomplete big file?

How did you do it?

The status of the current specification.

Draft, pending review, recently revised, final version, etc.

@MuxZeroNet
Contributor Author

Relevant code snippets from BigfilePlugin.py

content["files_optional"][file_relative_path] = {
    "sha512": merkle_root,
    "size": upload_info["size"],
    "piecemap": piecemap_relative_path,
    "piece_size": piece_size
}
# ...
return {
    "merkle_root": merkle_root,
    "piece_num": len(piecemap_info["sha512_pieces"]),
    "piece_size": piece_size,
    "inner_path": inner_path
}
# ...
def hashBigfile(self, file_in, size, piece_size=1024 * 1024, file_out=None):
    # method source code...
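
For illustration, a minimal sketch of the piece-hashing step, assuming sha512 over fixed-size pieces as in the snippets above. This is not the actual hashBigfile implementation; the real method also writes the piecemap and derives merkle_root from these piece hashes.

import hashlib

def hash_pieces(file_in, piece_size=1024 * 1024):
    # Read the file in fixed-size pieces and sha512-hash each piece.
    # The resulting digests are the leaf nodes; the merkle root stored
    # under the "sha512" key in content.json is computed over them.
    piece_hashes = []
    while True:
        piece = file_in.read(piece_size)
        if not piece:
            break
        piece_hashes.append(hashlib.sha512(piece).digest())
    return piece_hashes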

Relevant code snippets from the unit tests.

merkle_root, piece_size, piecemap_info = site.content_manager.hashBigfile(...)

piecemap_info["sha512_pieces"][0].encode("hex")

msgpack.pack({file_name: piecemap_info}, stream)

assert file_node["piecemap"] == inner_path + ".piecemap.msgpack"

assert piecemap["sha512_pieces"][0].encode("hex") == \
"a73abad9992b3d0b672d0c2a292046695d31bebdcb1e150c8410bbe7c972eff3"

@HelloZeroNet
Owner

Thanks for the suggestions, it really helped :)
I have added the information to the docs: https://zeronet.readthedocs.io/en/latest/help_zeronet/network_protocol/#bigfile-plugin

How big files are hashed. How merkle trees are made.

https://zeronet.readthedocs.io/en/latest/help_zeronet/network_protocol/#bigfile-merkle-root

How a big file is represented in content.json

https://zeronet.readthedocs.io/en/latest/help_zeronet/network_protocol/#bigfile-piecemap

Does big file support introduce changes to the network protocol?

Yes: the getPieceFields and setPieceFields commands.
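
For orientation, a sketch of how such a command could be framed; ZeroNet requests are msgpack maps with cmd, req_id and params keys. Only the command names come from this thread, so the parameter names and the site address below are assumptions/placeholders — the authoritative schema is in the linked network protocol docs.

import msgpack

# Hypothetical getPieceFields request; everything except the command
# name is an assumption -- check the network protocol docs for the
# exact parameter and response fields.
request = {
    "cmd": "getPieceFields",
    "req_id": 1,
    "params": {"site": "1SiteAddressPlaceholderxxxxxxxxxxx"}
}
payload = msgpack.packb(request)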

What are the preferred ways to store an incomplete big file?

The ZeroNet client creates a sparse file with the final size of the big file at the beginning of the download process (on Windows this requires the fsutil command).
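
A minimal sketch of that pre-allocation step, assuming a plain truncate-to-size on POSIX filesystems and the fsutil sparse flag on Windows; the real client's code path may differ.

import subprocess
import sys

def prealloc_sparse(path, final_size):
    # Create the file and set its length without writing any data;
    # on filesystems with sparse-file support no blocks are allocated yet.
    with open(path, "wb") as f:
        f.truncate(final_size)
    if sys.platform.startswith("win"):
        # NTFS needs the sparse flag set explicitly.
        subprocess.call(["fsutil", "sparse", "setflag", path])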

The status of the current specification.

It's already implemented and being used, but nothing is written in stone :)

@MuxZeroNet
Contributor Author

Packed format:
Turns the string into a list of ints by counting the runs of repeating characters, starting with the count of 1s.
Example: 1110000001 → [3, 6, 1], 0000000001 → [0, 9, 1], 1111111111 → [10]

Checker-board pattern?

Y N Y  →  3 6 1
Y N Y  →  0 9 1
Y      →  10
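
A small sketch of that packing rule, checked against the examples above; the count list always begins with the length of the leading run of 1s, which is why a field starting with 0 packs to a list that begins with 0.

def pack_piecefield(bits):
    # Run-length encode a "1"/"0" string, always starting with the
    # count of leading "1"s (0 if the field starts with "0").
    packed = []
    expected = "1"
    i = 0
    while i < len(bits):
        run = 0
        while i < len(bits) and bits[i] == expected:
            run += 1
            i += 1
        packed.append(run)
        expected = "0" if expected == "1" else "1"
    return packed

def unpack_piecefield(packed):
    # Reverse: expand the counts back into the "1"/"0" string.
    bits, char = "", "1"
    for count in packed:
        bits += char * count
        char = "0" if char == "1" else "1"
    return bits

assert pack_piecefield("1110000001") == [3, 6, 1]
assert pack_piecefield("0000000001") == [0, 9, 1]
assert pack_piecefield("1111111111") == [10]
assert unpack_piecefield([3, 6, 1]) == "1110000001"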

@HelloZeroNet
Owner

Yes, it assumes most of the piecefield will not be fragmented and that users will download pieces in batches.
