insert CBOR dag nodes? #63

Open
ec1oud opened this issue Dec 15, 2020 · 1 comment

ec1oud commented Dec 15, 2020

The IPLD data model is a superset of JSON, so it should be OK to store binary data. On the command line, ipfs dag put --input-enc cbor actually works. That's another option this Rust implementation could perhaps support. Do you think it's doable? There seems to be a pervasive assumption that the data is JSON.

The HTTP API is asymmetric though: as far as I can tell there's no way to read CBOR back (ipfs/kubo#4313), so until that is fixed, perhaps there's no point.

I tried to force a byte array into a string, and ran into the problem that Rust expects every string to be valid UTF-8. If I use serde_json::json!(unsafe { String::from_utf8_unchecked(buf) }), it fails at runtime. So the JSON API seems hopeless for dealing with binary data in DAG nodes.
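The checked conversion makes the underlying problem visible without any unsafe code. This is only a minimal sketch using the standard library; the function name is hypothetical and it assumes the numbers are f32 values packed little-endian:

```rust
use std::string::FromUtf8Error;

/// Pack f32 values as little-endian bytes, then try the checked
/// bytes-to-String conversion that the JSON route would need.
fn packed_floats_as_string(values: &[f32]) -> Result<String, FromUtf8Error> {
    let mut buf = Vec::with_capacity(values.len() * 4);
    for v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    // Fails unless the packed bytes happen to form valid UTF-8.
    String::from_utf8(buf)
}
```

For example, 1.5f32 packs to the bytes 00 00 c0 3f, and 0xc0 can never appear in valid UTF-8, so the call returns Err for that input.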

The reason I want to do that is to store a contiguous array of numbers directly in a byte array, to avoid CBOR overhead. An array of numbers in CBOR has a one-byte prefix in front of each number to declare its type. If you already know the type, that's a waste of space, and it prevents passing the array unconverted to other software (for example to draw a line graph). So I'd rather have the DAG node use CBOR to annotate the expected data type, with the actual array of numbers stored as a plain binary array. It's fine to construct CBOR that way, but getting it into and out of DAG nodes is problematic so far.
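To put a rough number on that overhead, here is a minimal sketch that hand-encodes the same floats both ways using only the standard library. The function names are made up for illustration, and it assumes every number is stored as a 32-bit float:

```rust
/// Write a CBOR head for `major` type (0..=7) with length or value `n`.
fn cbor_head(major: u8, n: u64, out: &mut Vec<u8>) {
    let m = major << 5;
    if n < 24 {
        out.push(m | n as u8);
    } else if n <= 0xff {
        out.push(m | 24);
        out.push(n as u8);
    } else if n <= 0xffff {
        out.push(m | 25);
        out.extend_from_slice(&(n as u16).to_be_bytes());
    } else if n <= 0xffff_ffff {
        out.push(m | 26);
        out.extend_from_slice(&(n as u32).to_be_bytes());
    } else {
        out.push(m | 27);
        out.extend_from_slice(&n.to_be_bytes());
    }
}

/// CBOR array (major type 4) of float32 items.
/// Each item is the type byte 0xfa followed by 4 big-endian bytes.
fn encode_as_array(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::new();
    cbor_head(4, values.len() as u64, &mut out);
    for v in values {
        out.push(0xfa);
        out.extend_from_slice(&v.to_be_bytes());
    }
    out
}

/// CBOR byte string (major type 2) holding packed little-endian f32 values.
fn encode_as_bytes(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::new();
    cbor_head(2, (values.len() * 4) as u64, &mut out);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

For 1000 values, encode_as_array produces 5003 bytes and encode_as_bytes produces 4003, so the per-item type byte costs about 25% extra here.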

ec1oud added a commit to ec1oud/ipfs-tsdb that referenced this issue Dec 15, 2020
The trouble is github.com/ferristseng/rust-ipfs-api does not offer
dag_put with cbor: ferristseng/rust-ipfs-api#63
only JSON; and serde_json won't let us sneak a byte array into a string
value either, because Rust expects strings to always be valid UTF-8.

It would probably be a dead end anyway because of
ipfs/kubo#4313 : there's no way to
read back CBOR data via the http API, so I couldn't write the select
function.

ec1oud commented Jan 1, 2021

Of course it works to use block_get and then parse the data with serde_cbor. It took me a while to figure out how to get Rust to convert a byte vector to an f32 vector, but eventually I succeeded. But if I update the data (append another number to the byte array, i.e. the 4 bytes of a little-endian float) and then write the CBOR back via block_put, ipfs dag get no longer works on the command line. It seems to assume that the data is protobuf instead of CBOR:

$ ipfs dag get QmfBs7HAJTqCXS8RgXBiMSj3b5fCKSerM2fkSvTMgNYTrK
Error: failed to decode Protocol Buffers: incorrectly formatted merkledag node: unmarshal failed. proto: PBNode: wiretype end group for non-group

Writing the CBOR data to a file and then running ipfs dag put --input-enc cbor updated.cbor works fine, though. So it would help to get this option added to dag_put, but at least I'm no longer completely stuck in the meantime, because I can write an IPFS client app that works with the raw CBOR data either way.
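For anyone else looking for the byte-vector-to-f32 step, one way to do it with only the standard library is sketched below. The helper names are hypothetical, and the buffer is assumed to hold packed little-endian f32 values as described above:

```rust
/// Decode packed little-endian bytes into f32 values.
/// Returns None if the length is not a multiple of 4.
fn bytes_to_f32s(buf: &[u8]) -> Option<Vec<f32>> {
    if buf.len() % 4 != 0 {
        return None;
    }
    Some(
        buf.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Append one value to the buffer as its 4 little-endian bytes.
fn append_f32(buf: &mut Vec<u8>, value: f32) {
    buf.extend_from_slice(&value.to_le_bytes());
}
```

This copies the data. Reinterpreting the bytes in place would avoid the copy, but it needs care with alignment and is left out of this sketch.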

I also tried tagging: https://www.endpoint.com/blog/2019/03/18/extensible-binary-encoding-with-cbor says “IEEE 754 binary32, little endian, Typed Array” is tag 85. That would probably be the right thing to do; but go-ipfs itself can't deal with that: ipfs dag put --input-enc cbor says "missing an unmarshaller for tag 85".
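For reference, the tagged form is easy to build by hand. The sketch below only shows what the bytes look like; the function names are hypothetical, and as noted above go-ipfs rejected this tag when I tried it:

```rust
/// Write a CBOR head for `major` type (0..=7) with length or value `n`.
/// Limited to values up to 0xffff_ffff to keep the sketch short.
fn cbor_head(major: u8, n: u32, out: &mut Vec<u8>) {
    let m = major << 5;
    if n < 24 {
        out.push(m | n as u8);
    } else if n <= 0xff {
        out.push(m | 24);
        out.push(n as u8);
    } else if n <= 0xffff {
        out.push(m | 25);
        out.extend_from_slice(&(n as u16).to_be_bytes());
    } else {
        out.push(m | 26);
        out.extend_from_slice(&n.to_be_bytes());
    }
}

/// Tag 85 (major type 6) wrapping a byte string (major type 2)
/// of packed little-endian f32 values.
fn encode_tag85(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::new();
    cbor_head(6, 85, &mut out); // encodes as 0xd8 0x55
    cbor_head(2, (values.len() * 4) as u32, &mut out);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

The tag adds only two bytes in front of the byte string, so the payload itself stays packed.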
