Custom ctx override for derive macro #106

Easyoakland · 2024-10-02T04:38:28Z

Add attribute to override the context in scroll-derive used like the following:

#[derive(Pread, Pwrite, SizeWith)]
#[repr(C)]
struct Data {
    id: u32,
    timestamp: f64,
    #[scroll(ctx = BE)]
    arr: [u16; 2],
    #[scroll(ctx = CustomCtx::len())]
    custom_ctx: CustomCtx,
}

where ctx doesn't have to be of type Endian and ignores pread_with's (and related) ctx argument.

Use case for this includes using the derive macro for structs that don't use Endian as their context and deriving many Network Endian structs while still being able to use pread/gread/pwrite/gwrite without having to specify Network Endian every time.

A future extension to this can enable using previous fields in the expression which would enable a pattern like:

#[derive(Pread, Pwrite, SizeWith)]
#[repr(C)]
struct Data<'b> {
    header: u32,
    #[scroll(ctx = header as usize)]
    body: &'b [u8],
}

without having to manually implement the various traits.

Let me know if you're interested in this idea and/or the future extension. Thanks!

Enables using the syntax: ``` #[scroll(ctx = context_expr)] field: T ``` To use the `context_expr` expression as context when for `Pread` for `field`, regardless of the context for the rest of the struct.

m4b · 2024-10-21T04:55:08Z

omg i wanted something like this for so long, thank you for getting the ball rolling!

m4b

This is super exciting, thank you for this! It really sets the stage for doing stuff like you mention, e.g., using a previous field as the incoming ctx for next field, etc. But also just being able to swap out field by field a different ctx instead of the derive dominating the whole thing is very useful; it may be possible to use this to go back, for example, to most of goblin's PE struct and now just derive it, since many of them had patterns like this!

The only nit/question I had was about the Option used with expr and being idiomatic; if you feel like changing it (and can, and not some oversight by me), feel free; otherwise I'll merge in a few days.

This is super awesome, thank you for this!

scroll_derive/src/lib.rs

scroll_derive/examples/derive_custom_ctx.rs

Easyoakland · 2024-10-21T06:49:51Z

I think your comments have been addressed in the last 2 commits. Let me know if not.

Seeing as there's interest I'll elaborate a little.
The idea with using previous fields is instead of the derive making code like:

let data = Self {
            id: src.gread_with::<u32>(offset, ctx)?,
            timestamp: src.gread_with::<f64>(offset, ctx)?,
            custom_ctx: src.gread_with::<CustomCtx>(offset, USER_SPECIFIED_EXPR)?,
        };

it makes code like:

let id = src.gread_with::<u32>(offset, ctx)?;
let timestamp = src.gread_with::<f64>(offset, ctx)?;
let custom = src.gread_with::<CustomCtx>(offset, USER_SPECIFIED_EXPR)?
let data = Self {
            id,
            timestamp,
            custom,
        };

and so USER_SPECIFIED_EXPR can use the id and timestamp and ctx identifiers.

The only issue that I see with that approach is it breaks structs which use the derive and have a field named ctx.

m4b

this is so awesome, thank you for this! Re the other design questions, expanding on usecases for this, e.g., a previous field used as ctx, etc.,, it might be good to first flesh out some possible approaches (assuming there is more than one) and the possible tradeoffs of choosing them, since backwards compat will be something to think about since scroll is effectively "done" (i need to get around to releasing 1.0 :) )

m4b · 2024-10-22T06:01:37Z

scroll_derive/examples/derive_custom_ctx.rs

+    #[scroll(ctx = BE)]
+    arr: [u16; 2],
+    // You can use arbitrary expressions for the ctx.
+    // You have access to the `ctx` parameter of the `{pread/gread}_with` inside the expression.


this is so wild hah, i have to think about usecases/what it means, or if there's some kind of hygiene issue that a different design might not have, but it's very interesting! is the fact that ctx is available within the scope of the pread something you have made available or is it a property of derive with field overrides in macros?

It's because the user's expression is effectively copy-pasted without hygiene and so the parameters of the trait method are accessible. It would take more work to remove this ability, since we'd have to parse for an instance of a ctx ident and actively block it.

This also means that if only Pwrite derive is used (for example), they won't get an error using self or dst idents.

m4b · 2024-10-22T06:11:32Z

scroll_derive/examples/derive_custom_ctx.rs

+    // You have access to the `ctx` parameter of the `{pread/gread}_with` inside the expression.
+    // TODO(implement) you have access to previous fields.
+    // TODO(check) will this break structs with fields named `ctx`?.
+    #[scroll(ctx = EndianDependent(ctx.clone()).len())]


one last thing I'll say: i think the example is actually better if EndianDependent is the ctx for VariableLength data, and inside of it's impl, it calls len() to get the read/write size, instead of the ctx == read/write size and relying on len() being called/passed in here.

That is clearer and more idiomatic scroll, imho, if there is such a thing :D

However, i understand that you are showing off arbitrary expressions are callable in the ctx here, which is very cool and probably worth retaining as you've written it.

Side note: I'm surprised you have to ctx.clone() ? Endian is Copy and I'd expect to not need that clone there? I also recall that effectively Ctx's generally need to be copy to be usable within pread/gread but maybe I forgot and that bound was relaxed?

Side Side note: if the ctx does need to be cloned, and you don't clone, how terrible is the error message presented to the user (at derive time)? I imagine it could be pretty bad?

error[E0308]: mismatched types --> examples\derive_custom_ctx.rs:56:36 | 56 | #[scroll(ctx = EndianDependent(ctx).len())] | --------------- ^^^ expected `Endian`, found `&Endian` | | | arguments to this struct are incorrect | note: tuple struct defined here --> examples\derive_custom_ctx.rs:4:8 | 4 | struct EndianDependent(Endian); | ^^^^^^^^^^^^^^^ help: consider dereferencing the borrow | 56 | #[scroll(ctx = EndianDependent(*ctx).len())]

But if you follow the directions:

error[E0614]: type `Endian` cannot be dereferenced --> examples\derive_custom_ctx.rs:56:36 | 56 | #[scroll(ctx = EndianDependent(*ctx).len())] | ^^^^

The clone works because Endian.clone() and (&Endian).clone both become Endian.
The issue is that the SizeWith derive makes this code:

fn size_with(ctx: &::scroll::Endian)

but the try_{from,into}_ctx derives make this code:

fn try_into_ctx( self, dst: &mut [u8], ctx: ::scroll::Endian, ) ->

and

fn try_from_ctx( src: &'a [u8], ctx: ::scroll::Endian, ) ->

Yeah, its a little hacky :)

If size_with was by value then EndianDependant(ctx) would work.

m4b · 2024-10-22T06:29:01Z

Here is an example of PE parsing fields out, where the pe_pointer is used as the offset not the ctx, for example; i'll paste all the code here as it's pretty good example.

        let pe_pointer = bytes
            .pread_with(PE_POINTER_OFFSET as usize, scroll::LE)
            .map_err(|_| {
                error::Error::Malformed(format!(
                    "cannot parse PE header pointer (offset {:#x})",
                    PE_POINTER_OFFSET
                ))
            })?;

        let pe_signature: u32 =
            bytes
                .pread_with(pe_pointer as usize, scroll::LE)
                .map_err(|_| {
                    error::Error::Malformed(format!(
                        "cannot parse PE header signature (offset {:#x})",
                        pe_pointer
                    ))
                })?;

there's two interesting things happening here; a hardcoded offset is used for the first instance (although I think this is actually wrong, there's an issue open in goblin recently about this); and the second instance uses it's pread offset based on the previous field/result.

However, that field, pe_pointer is not actually stored, but used to verify the magic value against it; arguably all this just belongs in validation pass after parsing out the raw struct, but there are other examples in PE where some field is used afterwords to parse out.

There's also the classic cases of (count, items) delimited pairs, e.g., many formats have some fixed size number, usually a u32, which is used to parse out the number of successive elements. Generally the count is discarded in this case, e.g.:

(u32, [byte sequence)]) -> Vec<T>

Anyway, just some food for thought. Don't have to solve all this at once :)

Lastly, this made me wonder if we can have better derive error messages, e.g., if we fail to parse a field, we could in theory emit something like PE's error message, with the field name known statically, with something like "could not parse " which should in theory be a &'static str, maybe we already do something like this though, as I said I haven't thought about/touched the derive code in a while :)

Easyoakland · 2024-10-22T18:34:16Z

there's two interesting things happening here; a hardcoded offset is used for the first instance (although I think this is actually wrong, there's an issue open in goblin recently about this); and the second instance uses it's pread offset based on the previous field/result.

I think this would be solved with an offset attribute which works like the ctx one here. The offset identifier would then be available since the derive starts with let offset = ...

There's also the classic cases of (count, items) delimited pairs, e.g., many formats have some fixed size number, usually a u32, which is used to parse out the number of successive elements. Generally the count is discarded in this case, e.g.:

You could make the entire expression to parse a single struct member customizable, but I don't know if that's going to be more readable then implementing the trait manually.

Lastly, this made me wonder if we can have better derive error messages, e.g., if we fail to parse a field, we could in theory emit something like PE's error message, with the field name known statically, with something like "could not parse " which should in theory be a &'static str, maybe we already do something like this though, as I said I haven't thought about/touched the derive code in a while :)

I don't think it currently works that way. It could, though I'm not sure if that would also cause a noticeable binary size increase.

Easyoakland added 6 commits October 1, 2024 20:18

feat: attribute to override ctx for particular fields on derive(Pread)

35c979c

Enables using the syntax: ``` #[scroll(ctx = context_expr)] field: T ``` To use the `context_expr` expression as context when for `Pread` for `field`, regardless of the context for the rest of the struct.

pull out custom_ctx logic to function

8859cd9

use custom ctx for Pwrite

b8ebe4a

use custom ctx for SizeWith

4951257

use custom ctx for IOread and IOwrite

e17150c

add tests and example

e4825b2

Easyoakland changed the title ~~Custom ctx~~ Custom ctx override for derive macro Oct 2, 2024

remove unwrap

f3cf244

m4b approved these changes Oct 21, 2024

View reviewed changes

scroll_derive/src/lib.rs Outdated Show resolved Hide resolved

scroll_derive/examples/derive_custom_ctx.rs Outdated Show resolved Hide resolved

Easyoakland added 2 commits October 21, 2024 00:26

slightly clean custom_ctx impl

bb4aea6

clarify example

840fad3

spelling

618e4cc

m4b approved these changes Oct 22, 2024

View reviewed changes

m4b merged commit 01d8722 into m4b:master Oct 22, 2024
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom ctx override for derive macro #106

Custom ctx override for derive macro #106

Easyoakland commented Oct 2, 2024 •

edited

Loading

m4b commented Oct 21, 2024

m4b left a comment

Easyoakland commented Oct 21, 2024

m4b left a comment

m4b Oct 22, 2024

Easyoakland Oct 22, 2024

Easyoakland Oct 22, 2024 •

edited

Loading

m4b Oct 22, 2024

Easyoakland Oct 22, 2024

m4b commented Oct 22, 2024

Easyoakland commented Oct 22, 2024

Custom ctx override for derive macro #106

Custom ctx override for derive macro #106

Conversation

Easyoakland commented Oct 2, 2024 • edited Loading

m4b commented Oct 21, 2024

m4b left a comment

Choose a reason for hiding this comment

Easyoakland commented Oct 21, 2024

m4b left a comment

Choose a reason for hiding this comment

m4b Oct 22, 2024

Choose a reason for hiding this comment

Easyoakland Oct 22, 2024

Choose a reason for hiding this comment

Easyoakland Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

m4b Oct 22, 2024

Choose a reason for hiding this comment

Easyoakland Oct 22, 2024

Choose a reason for hiding this comment

m4b commented Oct 22, 2024

Easyoakland commented Oct 22, 2024

Easyoakland commented Oct 2, 2024 •

edited

Loading

Easyoakland Oct 22, 2024 •

edited

Loading