Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom ctx override for derive macro #106

Merged
merged 10 commits into from
Oct 22, 2024
Merged

Custom ctx override for derive macro #106

merged 10 commits into from
Oct 22, 2024

Conversation

Easyoakland
Copy link
Contributor

@Easyoakland Easyoakland commented Oct 2, 2024

Add attribute to override the context in scroll-derive used like the following:

#[derive(Pread, Pwrite, SizeWith)]
#[repr(C)]
struct Data {
    id: u32,
    timestamp: f64,
    #[scroll(ctx = BE)]
    arr: [u16; 2],
    #[scroll(ctx = CustomCtx::len())]
    custom_ctx: CustomCtx,
}

where ctx doesn't have to be of type Endian and ignores pread_with's (and related) ctx argument.

Use case for this includes using the derive macro for structs that don't use Endian as their context and deriving many Network Endian structs while still being able to use pread/gread/pwrite/gwrite without having to specify Network Endian every time.

A future extension to this can enable using previous fields in the expression which would enable a pattern like:

#[derive(Pread, Pwrite, SizeWith)]
#[repr(C)]
struct Data<'b> {
    header: u32,
    #[scroll(ctx = header as usize)]
    body: &'b [u8],
}

without having to manually implement the various traits.

Let me know if you're interested in this idea and/or the future extension. Thanks!

Enables using the syntax:
```
#[scroll(ctx = context_expr)]
field: T
```
To use the `context_expr` expression as context when for `Pread` for `field`, regardless of the context for the rest of the struct.
@Easyoakland Easyoakland changed the title Custom ctx Custom ctx override for derive macro Oct 2, 2024
@m4b
Copy link
Owner

m4b commented Oct 21, 2024

omg i wanted something like this for so long, thank you for getting the ball rolling!

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super exciting, thank you for this! It really sets the stage for doing stuff like you mention, e.g., using a previous field as the incoming ctx for next field, etc. But also just being able to swap out field by field a different ctx instead of the derive dominating the whole thing is very useful; it may be possible to use this to go back, for example, to most of goblin's PE struct and now just derive it, since many of them had patterns like this!

The only nit/question I had was about the Option used with expr and being idiomatic; if you feel like changing it (and can, and not some oversight by me), feel free; otherwise I'll merge in a few days.

This is super awesome, thank you for this!

scroll_derive/src/lib.rs Outdated Show resolved Hide resolved
scroll_derive/examples/derive_custom_ctx.rs Outdated Show resolved Hide resolved
@Easyoakland
Copy link
Contributor Author

I think your comments have been addressed in the last 2 commits. Let me know if not.

Seeing as there's interest I'll elaborate a little.
The idea with using previous fields is instead of the derive making code like:

let data = Self {
            id: src.gread_with::<u32>(offset, ctx)?,
            timestamp: src.gread_with::<f64>(offset, ctx)?,
            custom_ctx: src.gread_with::<CustomCtx>(offset, USER_SPECIFIED_EXPR)?,
        };

it makes code like:

let id = src.gread_with::<u32>(offset, ctx)?;
let timestamp = src.gread_with::<f64>(offset, ctx)?;
let custom = src.gread_with::<CustomCtx>(offset, USER_SPECIFIED_EXPR)?
let data = Self {
            id,
            timestamp,
            custom,
        };

and so USER_SPECIFIED_EXPR can use the id and timestamp and ctx identifiers.

The only issue that I see with that approach is it breaks structs which use the derive and have a field named ctx.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so awesome, thank you for this! Re the other design questions, expanding on usecases for this, e.g., a previous field used as ctx, etc.,, it might be good to first flesh out some possible approaches (assuming there is more than one) and the possible tradeoffs of choosing them, since backwards compat will be something to think about since scroll is effectively "done" (i need to get around to releasing 1.0 :) )

#[scroll(ctx = BE)]
arr: [u16; 2],
// You can use arbitrary expressions for the ctx.
// You have access to the `ctx` parameter of the `{pread/gread}_with` inside the expression.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so wild hah, i have to think about usecases/what it means, or if there's some kind of hygiene issue that a different design might not have, but it's very interesting! is the fact that ctx is available within the scope of the pread something you have made available or is it a property of derive with field overrides in macros?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because the user's expression is effectively copy-pasted without hygiene and so the parameters of the trait method are accessible. It would take more work to remove this ability, since we'd have to parse for an instance of a ctx ident and actively block it.

Copy link
Contributor Author

@Easyoakland Easyoakland Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means that if only Pwrite derive is used (for example), they won't get an error using self or dst idents.

// You have access to the `ctx` parameter of the `{pread/gread}_with` inside the expression.
// TODO(implement) you have access to previous fields.
// TODO(check) will this break structs with fields named `ctx`?.
#[scroll(ctx = EndianDependent(ctx.clone()).len())]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one last thing I'll say: i think the example is actually better if EndianDependent is the ctx for VariableLength data, and inside of it's impl, it calls len() to get the read/write size, instead of the ctx == read/write size and relying on len() being called/passed in here.

That is clearer and more idiomatic scroll, imho, if there is such a thing :D

However, i understand that you are showing off arbitrary expressions are callable in the ctx here, which is very cool and probably worth retaining as you've written it.

Side note: I'm surprised you have to ctx.clone() ? Endian is Copy and I'd expect to not need that clone there? I also recall that effectively Ctx's generally need to be copy to be usable within pread/gread but maybe I forgot and that bound was relaxed?

Side Side note: if the ctx does need to be cloned, and you don't clone, how terrible is the error message presented to the user (at derive time)? I imagine it could be pretty bad?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error[E0308]: mismatched types
  --> examples\derive_custom_ctx.rs:56:36
   |
56 |     #[scroll(ctx = EndianDependent(ctx).len())]
   |                    --------------- ^^^ expected `Endian`, found `&Endian`
   |                    |
   |                    arguments to this struct are incorrect
   |
note: tuple struct defined here
  --> examples\derive_custom_ctx.rs:4:8
   |
4  | struct EndianDependent(Endian);
   |        ^^^^^^^^^^^^^^^
help: consider dereferencing the borrow
   |
56 |     #[scroll(ctx = EndianDependent(*ctx).len())]

But if you follow the directions:

error[E0614]: type `Endian` cannot be dereferenced
  --> examples\derive_custom_ctx.rs:56:36
   |
56 |     #[scroll(ctx = EndianDependent(*ctx).len())]
   |                                    ^^^^

The clone works because Endian.clone() and (&Endian).clone both become Endian.
The issue is that the SizeWith derive makes this code:

fn size_with(ctx: &::scroll::Endian)

but the try_{from,into}_ctx derives make this code:

fn try_into_ctx(
        self,
        dst: &mut [u8],
        ctx: ::scroll::Endian,
    ) ->

and

fn try_from_ctx(
        src: &'a [u8],
        ctx: ::scroll::Endian,
    ) ->

Yeah, its a little hacky :)

If size_with was by value then EndianDependant(ctx) would work.

@m4b
Copy link
Owner

m4b commented Oct 22, 2024

Here is an example of PE parsing fields out, where the pe_pointer is used as the offset not the ctx, for example; i'll paste all the code here as it's pretty good example.

        let pe_pointer = bytes
            .pread_with(PE_POINTER_OFFSET as usize, scroll::LE)
            .map_err(|_| {
                error::Error::Malformed(format!(
                    "cannot parse PE header pointer (offset {:#x})",
                    PE_POINTER_OFFSET
                ))
            })?;

        let pe_signature: u32 =
            bytes
                .pread_with(pe_pointer as usize, scroll::LE)
                .map_err(|_| {
                    error::Error::Malformed(format!(
                        "cannot parse PE header signature (offset {:#x})",
                        pe_pointer
                    ))
                })?;

there's two interesting things happening here; a hardcoded offset is used for the first instance (although I think this is actually wrong, there's an issue open in goblin recently about this); and the second instance uses it's pread offset based on the previous field/result.

However, that field, pe_pointer is not actually stored, but used to verify the magic value against it; arguably all this just belongs in validation pass after parsing out the raw struct, but there are other examples in PE where some field is used afterwords to parse out.

There's also the classic cases of (count, items) delimited pairs, e.g., many formats have some fixed size number, usually a u32, which is used to parse out the number of successive elements. Generally the count is discarded in this case, e.g.:

(u32, [byte sequence)]) -> Vec<T>

Anyway, just some food for thought. Don't have to solve all this at once :)

Lastly, this made me wonder if we can have better derive error messages, e.g., if we fail to parse a field, we could in theory emit something like PE's error message, with the field name known statically, with something like "could not parse " which should in theory be a &'static str, maybe we already do something like this though, as I said I haven't thought about/touched the derive code in a while :)

@m4b m4b merged commit 01d8722 into m4b:master Oct 22, 2024
6 checks passed
@Easyoakland
Copy link
Contributor Author

there's two interesting things happening here; a hardcoded offset is used for the first instance (although I think this is actually wrong, there's an issue open in goblin recently about this); and the second instance uses it's pread offset based on the previous field/result.

I think this would be solved with an offset attribute which works like the ctx one here. The offset identifier would then be available since the derive starts with let offset = ...

There's also the classic cases of (count, items) delimited pairs, e.g., many formats have some fixed size number, usually a u32, which is used to parse out the number of successive elements. Generally the count is discarded in this case, e.g.:

You could make the entire expression to parse a single struct member customizable, but I don't know if that's going to be more readable then implementing the trait manually.

Lastly, this made me wonder if we can have better derive error messages, e.g., if we fail to parse a field, we could in theory emit something like PE's error message, with the field name known statically, with something like "could not parse " which should in theory be a &'static str, maybe we already do something like this though, as I said I haven't thought about/touched the derive code in a while :)

I don't think it currently works that way. It could, though I'm not sure if that would also cause a noticeable binary size increase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants