Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto witgen #2071

Draft
wants to merge 21 commits into
base: call_jit_from_block
Choose a base branch
from
Draft

Auto witgen #2071

wants to merge 21 commits into from

Conversation

chriseth
Copy link
Member

No description provided.

@chriseth
Copy link
Member Author

chriseth commented Nov 13, 2024

This can now generate the following code for the binary machine:

//Known inputs: main_binary::operation_id, main_binary::A, main_binary::B
let main_binary::sel[1]_u3 = 0
let main_binary::sel[1]_u2 = 0
let main_binary::sel[1]_u1 = 0
let main_binary::sel[1]_d0 = 0
let main_binary::sel[1]_d1 = 0
let main_binary::sel[1]_d2 = 0
let main_binary::sel[1]_d3 = 0
let main_binary::sel[2]_u3 = 0
let main_binary::sel[2]_u2 = 0
let main_binary::sel[2]_u1 = 0
let main_binary::sel[2]_d0 = 0
let main_binary::sel[2]_d1 = 0
let main_binary::sel[2]_d2 = 0
let main_binary::sel[2]_d3 = 0
let main_binary::operation_id_u1 = main_binary::operation_id_d0
let main_binary::A_byte_u1 = (main_binary::A_d0 & 0xff000000) / 16777216;
let main_binary::A_u1 = main_binary::A_d0 & 0xffffff;
let main_binary::B_byte_u1 = (main_binary::B_d0 & 0xff000000) / 16777216;
let main_binary::B_u1 = main_binary::B_d0 & 0xffffff;
let mut lookup_20_3: T = 0.into();
fixed_lookup_machine.process_lookup_direct((20, vec![LookupCell::Input(&(main_binary::operation_id_d0)), LookupCell::Input(&(main_binary::A_byte_u1)), LookupCell::Input(&(main_binary::B_byte_u1)), LookupCell::Output(&mut lookup_20_3)]))
let main_binary::C_byte_u1 = lookup_20_3
let main_binary::sel[0]_d0 = 1
let main_binary::operation_id_u2 = main_binary::operation_id_u1
let main_binary::A_byte_u2 = (main_binary::A_u1 & 0xff0000) / 65536;
let main_binary::A_u2 = main_binary::A_u1 & 0xffff;
let main_binary::B_byte_u2 = (main_binary::B_u1 & 0xff0000) / 65536;
let main_binary::B_u2 = main_binary::B_u1 & 0xffff;
let mut lookup_20_3: T = 0.into();
fixed_lookup_machine.process_lookup_direct((20, vec![LookupCell::Input(&(main_binary::operation_id_u1)), LookupCell::Input(&(main_binary::A_byte_u2)), LookupCell::Input(&(main_binary::B_byte_u2)), LookupCell::Output(&mut lookup_20_3)]))
let main_binary::C_byte_u2 = lookup_20_3
let main_binary::operation_id_u3 = main_binary::operation_id_u2
let main_binary::A_byte_u3 = (main_binary::A_u2 & 0xff00) / 256;
let main_binary::A_u3 = main_binary::A_u2 & 0xff;
let main_binary::B_byte_u3 = (main_binary::B_u2 & 0xff00) / 256;
let main_binary::B_u3 = main_binary::B_u2 & 0xff;
let mut lookup_20_3: T = 0.into();
fixed_lookup_machine.process_lookup_direct((20, vec![LookupCell::Input(&(main_binary::operation_id_u2)), LookupCell::Input(&(main_binary::A_byte_u3)), LookupCell::Input(&(main_binary::B_byte_u3)), LookupCell::Output(&mut lookup_20_3)]))
let main_binary::C_byte_u3 = lookup_20_3
let main_binary::A_byte_u4 = main_binary::A_u3
let main_binary::B_byte_u4 = main_binary::B_u3
let mut lookup_20_3: T = 0.into();
fixed_lookup_machine.process_lookup_direct((20, vec![LookupCell::Input(&(main_binary::operation_id_u3)), LookupCell::Input(&(main_binary::A_byte_u4)), LookupCell::Input(&(main_binary::B_byte_u4)), LookupCell::Output(&mut lookup_20_3)]))
let main_binary::C_byte_u4 = lookup_20_3
let main_binary::sel[0]_u3 = 0
let main_binary::sel[0]_u2 = 0
let main_binary::sel[0]_u1 = 0
let main_binary::sel[0]_d0 = 0
let main_binary::sel[0]_d1 = 0
let main_binary::sel[0]_d2 = 0
let main_binary::sel[0]_d3 = 0

@chriseth chriseth changed the base branch from main to call_jit_from_block November 16, 2024 18:45
@chriseth
Copy link
Member Author

chriseth commented Nov 21, 2024

Ran an extended version of binary_large_test.asm that just loops infinitely and extended the degree of the main machine to 2**20. The binary machine is jit-computed. The compilation time is included in the statistics below.

 == Witgen profile (12469602 events)
   43.7% (   12.4s): Main Machine
   25.9% (    7.3s): Secondary machine 0: main_binary (BlockMachine)
   25.0% (    7.1s): FixedLookup
    5.5% (    1.5s): witgen (outer code)
    0.0% ( 470.0ns): range constraint multiplicity witgen
  ---------------------------
    ==> Total: 28.293288704s

Used rows in binary: 8161892
Used rows in main: 4194304

Witgen takes 800 ns for one lookup to the binary machine (this includes the time spent in the fixed lookup machine)

The same .asm file run on current main:

 == Witgen profile (36955286 events)
   61.4% (   39.0s): Secondary machine 0: main_binary (BlockMachine)
   20.3% (   12.9s): Main Machine
   15.9% (   10.1s): FixedLookup
    2.5% (    1.6s): witgen (outer code)
    0.0% ( 412.0ns): range constraint multiplicity witgen
  ---------------------------
    ==> Total: 63.642381077s
    ```

}
}

fn process_lookup<'b, 'd, T: FieldElement, Q: QueryCallback<T>>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this extern "C"?

// TODO this applies a shift. Maybe we could do it much earlier?
latch_row: usize,
mutable_state: &'b mut MutableState<'a, 'b, T, Q>,
process_lookup: fn(&'b mut MutableState<'a, 'b, T, Q>, u64, Vec<LookupCell<'c, T>>) -> bool,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LookupCell to use "C" repr

@chriseth
Copy link
Member Author

Now I got it down to roughly 600. I think we can save another 100ms by improving how the known bits are stored (padded to a u32 for each new row).

@chriseth
Copy link
Member Author

Yep, now we have 540ns per block

@chriseth
Copy link
Member Author

We should change process_lookup_direct not to take a Vec, because then we need to allocate. Instead, we should take a mutable ref to a slice and create it on the stack.

@chriseth
Copy link
Member Author

Using mutable refs to slices instead of vectors for process_lookup_direct reduced the time for block witgen (including the 4 fixed lookups) to 490 ns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant