Using awint on AVR #12
It looks like you are using an older version of
Also, the default debug impls use an unsigned hexadecimal interpretation; you can view the signed value by |
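To see how the unsigned hex view relates to the signed value, here is a minimal sketch on plain `u64`/`i64` rather than awint's own types (the helper name `signed_value` is made up for illustration). A `width`-bit field whose top bit is set represents its unsigned value minus 2^width.

```rust
/// Hypothetical helper: interprets the low `width` bits of `bits` as a
/// two's complement number.
fn signed_value(bits: u64, width: u32) -> i64 {
    assert!((1..=64).contains(&width));
    let shift = 64 - width;
    // Move the field to the top (dropping any bits above `width`), then
    // arithmetic-shift it back down so the sign bit is extended.
    ((bits << shift) as i64) >> shift
}

fn main() {
    // 0xfd as an 8-bit field is 253 unsigned, but -3 signed (253 - 256).
    let v = signed_value(0xfd, 8);
    let _ = v;
}
```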
I'm curious what your application of my crate is. Do you want it for fixed-width 512-bit arithmetic? Tell me if you have any usability problems. |
Also, I just realized the documentation on |
I have published a new version of |
Yes, I'm aware,
I tried … I know how two's complement works, but I was a bit confused because I didn't know how to properly concatenate a negative number.
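Concatenating a negative number into a fixed-width field comes down to truncating its two's complement bits to that width. A small sketch on `u64` with a hypothetical `concat` helper; this is not awint's `cc!` macro, just the same bit layout done by hand.

```rust
/// Hypothetical helper: places `hi` (stored as a `hi_w`-bit two's complement
/// field) directly above the `lo_w`-bit unsigned field `lo`.
/// Values that don't fit their field are silently truncated.
fn concat(hi: i64, hi_w: u32, lo: u64, lo_w: u32) -> u64 {
    assert!(hi_w >= 1 && lo_w >= 1 && hi_w + lo_w <= 64);
    // `hi as u64` keeps the two's complement bit pattern; the mask cuts it
    // down to the field width, discarding the extra sign bits.
    let hi_bits = (hi as u64) & ((1u64 << hi_w) - 1);
    let lo_bits = lo & ((1u64 << lo_w) - 1);
    (hi_bits << lo_w) | lo_bits
}

fn main() {
    // -2 as an 8-bit field is 0xfe, so it lands as 0xfe above 0xabcd.
    let packed = concat(-2, 8, 0xabcd, 16);
    let _ = packed;
}
```

Reading the high field back as a signed number then needs sign extension from bit `hi_w - 1`.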
This is cool, but I'm using this in a no_std environment, so I can't even get unsigned hex. Also, there's no easy way to iterate over bytes. There is
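In a `no_std`, no-alloc setting, hex output can still be produced by writing into a caller-provided byte buffer instead of a `String`. A minimal sketch; the `to_hex` helper is hypothetical and not part of awint. It uses only `core`.

```rust
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Hypothetical helper: writes `bytes` as lowercase hex into `out`
/// (which needs at least `2 * bytes.len()` bytes) and returns it as a `&str`.
fn to_hex<'a>(bytes: &[u8], out: &'a mut [u8]) -> &'a str {
    assert!(out.len() >= 2 * bytes.len());
    for (i, b) in bytes.iter().enumerate() {
        out[2 * i] = HEX[(b >> 4) as usize]; // high nibble
        out[2 * i + 1] = HEX[(b & 0x0f) as usize]; // low nibble
    }
    // Every byte written above is ASCII, so this cannot fail.
    core::str::from_utf8(&out[..2 * bytes.len()]).unwrap()
}

fn main() {
    let mut buf = [0u8; 6];
    let s = to_hex(&[0xde, 0xad, 0x01], &mut buf);
    let _ = s;
}
```

For a little-endian big number, feed the bytes most significant first when printing.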
I'm using it for elliptic curve cryptography on embedded targets, which means it has to be both no_std and no-alloc. Right now, only two crates meet the criteria; one is … In terms of usability, I think if you implement the operators so that I can use … A more interesting set of utilities would be a mod ring, e.g. I want to multiply
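As a minimal sketch of the "mod ring" idea, assuming a modulus small enough for `u64` (secp256k1's 256-bit prime would need a multi-limb type), a const-generic wrapper can implement `+`, `-` and `*` so every result comes back already reduced. `ModP` is a hypothetical name, not an awint or fixed-bigint type.

```rust
use core::ops::{Add, Mul, Sub};

/// Hypothetical element of Z/PZ (P must be nonzero), always kept reduced.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ModP<const P: u64>(u64);

impl<const P: u64> ModP<P> {
    fn new(v: u64) -> Self {
        ModP(v % P)
    }
}

impl<const P: u64> Add for ModP<P> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // widen to u128 so the sum cannot overflow before reduction
        ModP(((self.0 as u128 + rhs.0 as u128) % P as u128) as u64)
    }
}

impl<const P: u64> Sub for ModP<P> {
    type Output = Self;
    fn sub(self, rhs: Self) -> Self {
        // add P first so the unsigned subtraction cannot underflow
        ModP(((self.0 as u128 + P as u128 - rhs.0 as u128) % P as u128) as u64)
    }
}

impl<const P: u64> Mul for ModP<P> {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        ModP(((self.0 as u128 * rhs.0 as u128) % P as u128) as u64)
    }
}

fn main() {
    type F = ModP<17>;
    let x = F::new(3) * F::new(6) - F::new(5);
    let _ = x;
}
```

Because the type is `Copy` and fixed-size, these operators neither allocate nor fail.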
Looks good. I think what is lacking right now in the docs (and src) is clear examples. Most examples do the same thing, e.g. concatenating a number. However, there are not enough examples for multiplication, modulo, subtraction, etc. |
My crate can't implement the std ops because it would involve allocations and fallible operations under the hood. Actually, maybe I could implement them for the The way you use |
I completed my lib using fixed-bigint-rs, but as I suspected, I ran into issues with RAM size, since my target chip (Arduino Uno) only has 2 KB of RAM. I've been trying hard to optimize the code, but there's not much more I can do. I tried getting the stack size from both … Seems like my
Here's the code:

```rust
#[inline]
fn double_assign_mod(&mut self, P: &BigNum, one: &BigNum) {
    //let X1 = self.x.clone();
    let X1 = &self.x; // does this cost any memory or compiler is smart enough?
    //let Y1 = self.y.clone();
    let Y1 = &self.y;
    let two = BigNum::from_u8(2).unwrap(); // 64-bytes, can be reduced to 1-byte
    let three = BigNum::from_u8(3).unwrap(); // 64-bytes, can be reduced to 1-byte
    // every multiplication creates another 64-bytes num, as they are not in-place
    let lam = ((three * ((*X1 * *X1) % P)) % P) * Self::invert(&(two * Y1), P) % P;
    let X3 = (lam * lam - two * X1) % P;
    let Y3 = if *X1 < X3 {
        (lam * (*P + X1 - X3) - Y1) % P
    } else {
        (lam * (*X1 - X3) - Y1) % P
    };
    self.x = X3;
    self.y = Y3;
}
```

I found the same issue in … If there's also a way to do fixed 32-byte … Thoughts? |
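For reference, `double_assign_mod` follows the usual affine doubling formulas on y^2 = x^3 + 7: lambda = 3*x1^2 / (2*y1), x3 = lambda^2 - 2*x1, y3 = lambda*(x1 - x3) - y1, all mod p. Below is a toy-sized sketch with `u64` values and the made-up prime p = 17, using Fermat inversion (a^(p-2) mod p) as a stand-in for `Self::invert`. Every subtraction is done as `a - b` or `p - (b - a)` on reduced values, so an unsigned type never underflows; if `BigNum` is unsigned, `lam * lam - two * X1` in the code above may be worth checking for the same reason. All function names here are hypothetical.

```rust
fn mul_mod(a: u64, b: u64, p: u64) -> u64 {
    ((a as u128 * b as u128) % p as u128) as u64
}

fn add_mod(a: u64, b: u64, p: u64) -> u64 {
    ((a as u128 + b as u128) % p as u128) as u64
}

/// Assumes `a` and `b` are already reduced mod `p`.
fn sub_mod(a: u64, b: u64, p: u64) -> u64 {
    if a >= b { a - b } else { p - (b - a) }
}

/// Square-and-multiply exponentiation mod `p`.
fn pow_mod(mut base: u64, mut exp: u64, p: u64) -> u64 {
    let mut acc = 1 % p;
    base %= p;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base, p);
        }
        base = mul_mod(base, base, p);
        exp >>= 1;
    }
    acc
}

/// Fermat inversion: valid only for prime `p` and `a != 0 mod p`.
fn inv_mod(a: u64, p: u64) -> u64 {
    pow_mod(a, p - 2, p)
}

/// Doubles (x, y) on y^2 = x^3 + 7 mod p. Assumes y != 0; otherwise the
/// result would be the point at infinity, which this sketch doesn't model.
fn double(x: u64, y: u64, p: u64) -> (u64, u64) {
    let num = mul_mod(3, mul_mod(x, x, p), p);
    let den = add_mod(y, y, p);
    let lam = mul_mod(num, inv_mod(den, p), p);
    let x3 = sub_mod(mul_mod(lam, lam, p), add_mod(x, x, p), p);
    let y3 = sub_mod(mul_mod(lam, sub_mod(x, x3, p), p), y, p);
    (x3, y3)
}

fn main() {
    // (1, 5) is on y^2 = x^3 + 7 over F_17 since 5^2 = 25 = 8 = 1 + 7 (mod 17).
    let (x2, y2) = double(1, 5, 17);
    let _ = (x2, y2);
}
```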
2 KB of RAM, wow, that's tight. I want to point out one performance footgun in such a case: be careful when moving around
Depending on how deep the stack is and how many function calls you have, what you could do is have some Organizational struct that every function is The main limiting factor is that my crate's
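The comment is truncated, but one way to read the "organizational struct" suggestion is a workspace that owns the large temporaries once and is lent to each function by `&mut`, so nested calls don't each reserve their own 64-byte locals on the stack. A hypothetical sketch; `U512`, `Scratch` and `double_into` are made-up names.

```rust
/// Hypothetical 512-bit value stored as 64 little-endian bytes.
type U512 = [u8; 64];

/// Hypothetical workspace holding every large temporary exactly once.
/// It is created once (e.g. in `main`) and borrowed by callees.
#[allow(dead_code)]
struct Scratch {
    t0: U512,
    t1: U512,
}

impl Scratch {
    const fn new() -> Self {
        Scratch { t0: [0; 64], t1: [0; 64] }
    }
}

/// Writes `a << 1` (mod 2^512) into `s.t0` without declaring a new 64-byte local.
fn double_into(s: &mut Scratch, a: &U512) {
    let mut carry = 0u8;
    for i in 0..64 {
        s.t0[i] = (a[i] << 1) | carry; // shifted-out top bit is dropped here...
        carry = a[i] >> 7; // ...and carried into the next byte instead
    }
}

fn main() {
    let mut s = Scratch::new();
    let mut a: U512 = [0; 64];
    a[0] = 0x81;
    double_into(&mut s, &a);
}
```

Passing `&U512` and `&mut Scratch` moves only pointers, whereas passing `U512` by value may copy all 64 bytes.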
I'm running out of time, I haven't bug tested |
Thanks a lot for the code sample, I wasn't expecting that! |
Ported everything, and this is orders of magnitude faster than … Still, I can't get memory usage down; here's … I pushed the library here. I don't think there's much more optimization that can be done on this. If you find time, please have a look at … and let me know what you think. I'll try the reciprocal idea later. If I can reduce |
also, any chance of supporting |
My crate has a |
Also, is |
One more thing, the macros generate their constants as a byte array entered into a const function, I don't know if the compiler puts that into static memory automatically if |
I tried I tried adding the extra |
I think I forgot to note a fix in a changelog, I needed the special |
No, I'll update to the latest soon.
I compile with either …

```
  Memory region   Used Size  Region Size  %age Used
           text:    41238 B        32 KB    125.85%
           data:      165 B         2 KB      8.06%
         eeprom:       0 GB         1 KB      0.00%
           fuse:       0 GB          3 B      0.00%
           lock:       0 GB         1 KB      0.00%
      signature:       0 GB         1 KB      0.00%
user_signatures:       0 GB         1 KB      0.00%
```

as flash size is 32 KB on my chip. I can compile with |
made a mistake |
That would be expected, those two are lesser than |
already applied everything there |
32K is crazy small. Unfortunately, I don't know what to do about that. Could you see what https://github.com/RazrFalcon/cargo-bloat says? Edit: wait, that doesn't support embedded |
There's a C library called micro-ecc which can do this just fine in C, and I got it to work, but I was hoping to do it in pure Rust. I think he only uses double space for multiplications; that would solve it for us as well if we can somehow limit everything to 256 bits except for multiplications, which would be 512-bit. |
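The "double space only for multiplications" idea, as described in the comment, can be sketched with 256-bit operands stored as eight little-endian `u32` limbs: only the product needs 512 bits. This is a plain schoolbook multiply under those assumptions, not micro-ecc's actual code, and `mul_wide` is a hypothetical name.

```rust
/// Schoolbook 256 x 256 -> 512-bit multiplication on little-endian u32 limbs.
fn mul_wide(a: &[u32; 8], b: &[u32; 8]) -> [u32; 16] {
    let mut out = [0u32; 16];
    for i in 0..8 {
        let mut carry: u64 = 0;
        for j in 0..8 {
            // (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1, so this never overflows.
            let t = a[i] as u64 * b[j] as u64 + out[i + j] as u64 + carry;
            out[i + j] = t as u32; // keep the low 32 bits
            carry = t >> 32; // push the high 32 bits to the next limb
        }
        // out[i + 8] has not been written by earlier rows yet.
        out[i + 8] = carry as u32;
    }
    out
}

fn main() {
    let product = mul_wide(&[u32::MAX; 8], &[u32::MAX; 8]);
    let _ = product;
}
```

The 512-bit result would then be reduced back to 256 bits right away, so only one double-width buffer needs to be live at a time.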
The reciprocal algorithm I put above uses multiplications to 528 bits, but the initial calculation requires |
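Assuming the "reciprocal" method here is Barrett-style reduction (precompute mu = floor(2^(2k) / m) once, then replace each division by a multiply and a shift), a toy version for moduli below 2^32 looks like this. The real thing would use multi-limb numbers; the 528-bit sizing from the thread is not reproduced, and `Barrett` is a hypothetical name.

```rust
/// Toy Barrett reduction for a modulus m with 2 <= m < 2^32.
struct Barrett {
    m: u64,
    k: u32,   // bit length of m
    mu: u128, // floor(2^(2k) / m), computed once
}

impl Barrett {
    fn new(m: u64) -> Self {
        assert!(m >= 2 && m < (1 << 32));
        let k = 64 - m.leading_zeros();
        let mu = (1u128 << (2 * k)) / m as u128;
        Barrett { m, k, mu }
    }

    /// Reduces x mod m; requires x < 2^(2k), e.g. a product of two reduced values.
    fn reduce(&self, x: u64) -> u64 {
        debug_assert!((x as u128) < (1u128 << (2 * self.k)));
        // Estimated quotient: never above floor(x / m), at most one below it.
        let q = ((x as u128 * self.mu) >> (2 * self.k)) as u64;
        let mut r = x - q * self.m;
        while r >= self.m {
            r -= self.m;
        }
        r
    }
}

fn main() {
    let b = Barrett::new(17);
    let r = b.reduce(15 * 16);
    let _ = r;
}
```

With this choice of mu, the estimated quotient is at most one below the true one, so the correction loop subtracts m at most once.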
The other likely culprit is that 8-bit microcontrollers have 8-bit registers but 16-bit |
What would be more appropriate is if I made a second |
Hm.. can you come up with a short example to demonstrate that? Because before starting with |
I'm not talking about correctness. What I'm talking about here is that if the ALU handles data in units of
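The comment is cut off, but assuming the point is that arithmetic on digits wider than the 8-bit registers has to be emulated with several instructions, a digit loop over `u8` keeps each step at register width. A minimal add-with-carry sketch; the function name is hypothetical.

```rust
/// Adds `b` into `a` in place using byte-sized, little-endian digits.
/// Returns the carry out of the most significant digit.
fn add_assign_u8_digits(a: &mut [u8], b: &[u8]) -> bool {
    assert_eq!(a.len(), b.len());
    let mut carry = false;
    for (x, &y) in a.iter_mut().zip(b) {
        // Two 8-bit adds: digit + digit, then + incoming carry.
        let (s1, c1) = x.overflowing_add(y);
        let (s2, c2) = s1.overflowing_add(carry as u8);
        *x = s2;
        carry = c1 | c2; // at most one of the two can overflow
    }
    carry
}

fn main() {
    let mut a = [0xffu8, 0xff, 0x00];
    let carry = add_assign_u8_digits(&mut a, &[0x01, 0x00, 0x00]);
    let _ = carry;
}
```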
I just remembered something: I once made https://github.com/AaronKutch/u64_array_bigints, which is a minimalistic big-integer library with only unsigned representations and stack-based stuff. If you want total control, you could clone it, rename it to |
Would I gain any performance over …

Good news, I got …

```rust
let mut private_key = inlawi!(0x02_u512);
let curve = Curve::secp256k1();
let public_key = curve.multiply_simple(&mut private_key);
let mut buf = [0u8; 64]; // <------
public_key.x.to_u8_slice(&mut buf);
print_hex_arr("x", &mut serial, &buf);
public_key.y.to_u8_slice(&mut buf);
print_hex_arr("y", &mut serial, &buf);
```

Previously I was initiating a 64-byte wide …

Here are some stats:

```
$ avr-size target/avr-atmega328p/release/app.elf
   text    data     bss     dec     hex filename
  12228     196       1   12425    3089 target/avr-atmega328p/release/app.elf
```

The program compiles down to 12 KB (37.5% of 32 KB) and uses 196 B at startup; on top of that, analysis for ARM shows ~900 bytes used by multiplication, together about 53% of total RAM. Next, I need to implement sha256, which takes at least 512 bytes, plus a bit more for buf, so almost 80% of RAM. If I can keep everything under 2 KB of RAM usage, this would be the only pure Rust lib that can do ECC signing on any architecture; other crates only work on >32-bit archs or need alloc, etc. If not, I'll have to revisit the reciprocal method or other ideas you've suggested. In the meantime, if you're feeling generous, you're welcome to add 256-bit |
So you jumped straight from 41 KB to 12 KB; what made the difference? Also, are you saying that my … You definitely don't want to use … But before you go to that length, I would suggest the reciprocal method, as it means only one large 528-bit buffer is needed at a time rather than 4 x 512. |
Correct. I use …

For example, here's the same program with …

```
$ telnet localhost 5678
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
00000000050eac6d7dc02c6da1595563abd6bd57fec00e23e14e0806d8af1ea300000000016a309aa25b0403f1eef75702e84bb7597aabde5c51aa3ec721cb2bC
```

whereas with …

```
$ telnet localhost 5678
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
x = c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777
Connection closed by foreign host.
```
It hangs only when I use a bigger size … I should have gone with ARM, but that would have been boring, as there are plenty of libs supporting it.
I'll have to revisit this in the future. |
One more thing, you can use |
Ok, I'll try them out.
It already is on AVR. Also, a lot of embedded projects use an older … I did a quick estimation of memory usage:
Actually, no. It's almost twice what we estimated! Shouldn't the local one at least be close to what we estimated? For example, can you confirm the memory usage of ashr_assign @ 72/40 bytes, or mul_assign @ 72/56 bytes? The rest seems to be ARM-specific memmove, which is unexpected, unless they are from |
Uhh you shouldn't be using I'm experimenting on a small example with |
OK, the next chunk of free time I get (maybe next week, maybe in a few months), I will do the bifurcation thing so that you don't need to do the … On your side, I'm thinking the best strategy is to only use
I found something: it turns out that the function that
Is subtracting 128 from the stack pointer (plus 32 or so depending on the architecture for other things), doing one
By inlining stuff manually I got the compiler to do what it should do
See how much cleaner it is. If I had more time I would reproduce with primitive arrays and file a bug report. |
OK, I'm most of the way to fixing things. I just need a way to 'cfg' for AVR (and any other arches you can think of); they aren't distributed by rustup, so I would have a hard time testing them. Is there a way to activate feature flags based on arch? |
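As far as I know, Cargo can't switch a crate's own features on per target, but Rust's built-in `cfg` keys such as `target_arch` and `target_pointer_width` can be tested directly in code (and `[target.'cfg(target_arch = "avr")'.dependencies]` covers target-specific dependencies). A sketch that picks a digit type per target; `Digit` and `DIGIT_BITS` are hypothetical names, not part of awint.

```rust
// Built-in cfg keys: no Cargo feature has to be enabled for these to work.
#[cfg(target_arch = "avr")]
pub type Digit = u8; // match AVR's 8-bit registers

#[cfg(all(not(target_arch = "avr"), target_pointer_width = "16"))]
pub type Digit = u16; // other 16-bit targets

#[cfg(all(not(target_arch = "avr"), not(target_pointer_width = "16")))]
pub type Digit = usize; // 32- and 64-bit targets

/// Number of bits in one digit on the current target.
pub const DIGIT_BITS: u32 = Digit::BITS;

fn main() {
    let bits = DIGIT_BITS;
    let _ = bits;
}
```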
I'm inlining every function call, so I think that shouldn't matter? Also, I can't seem to inline … Re the assembly, is that x64, ARM, or AVR? You're initializing a … Btw, you're saying this is the compiler's fault, yes?
Actually, I added a feature … Do you need help setting up for AVR? Here's the main.rs you can use as an example, with this Cargo.toml, but you can remove the extra stuff. You also need this target file, avr-atmega328p.json. This is my final main.rs:

```rust
#![no_std]
#![no_main]

use panic_halt as _;
use arduino_hal::prelude::*;
use core::fmt::Debug;
use ufmt::uWrite;
use noble_secp256k1::awint::{cc, inlawi, inlawi_ty, Bits, InlAwi};
use noble_secp256k1::{BigNum, Curve};

fn print_hex_arr<S>(tag: &str, serial: &mut S, arr: &[u8])
where
    S: uWrite,
    <S as uWrite>::Error: Debug,
{
    ufmt::uwrite!(serial, "{} = ", tag).unwrap();
    // the buffer is little-endian, so print the most significant byte first
    for e in arr.iter().rev() {
        ufmt::uwrite!(serial, "{:02x}", *e).unwrap();
    }
    ufmt::uwrite!(serial, "\r\n").unwrap();
}

#[arduino_hal::entry]
fn main() -> ! {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);
    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);
    let mut private_key = inlawi!(0x02_u512);
    let curve = Curve::secp256k1();
    let public_key = curve.multiply_simple(&mut private_key);
    // a 256-bit coordinate only needs 32 bytes
    let mut buf = [0; 32];
    public_key.x.to_u8_slice(&mut buf);
    print_hex_arr("x", &mut serial, &buf);
    public_key.y.to_u8_slice(&mut buf);
    print_hex_arr("y", &mut serial, &buf);
    ufmt::uwriteln!(&mut serial, "Hello from Arduino!\r").void_unwrap();
    loop {}
}
```

and make sure to import

```toml
noble-secp256k1 = { git = "https://github.com/xphoniex/noble-secp256k1-rs", default-features = false, features = ["8-bit"] }
```

and .cargo/config.toml:

```toml
[build]
target = "avr-atmega328p.json"
# must live under [build] (or a [target] table); a top-level key is ignored by Cargo
rustflags = ["-Z", "emit-stack-sizes"]

[target.avr-atmega328p]
runner = "qemu-system-avr -M uno -nographic -serial tcp::5678,server=on -bios"

[unstable]
build-std = ["core"]
```

You also need the AVR toolchain, like …

```
# term 1
$ cargo run -r
# or
$ qemu-system-avr -M uno -bios target/avr-atmega328p/release/<file>.elf -nographic -serial tcp::5678,server=on

# term 2
$ telnet localhost 5678
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
x = c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777
Hello from Arduino!
```

Once you're done, terminate … I've simplified my … I'll have to add sha256 and then hmac first, though. |
I was using
Is one of the |
Gotcha, thanks for the explanation. Is this the compiler's fault? |
I don't know where in the chain the extra alloc is introduced, but it should be gone in version 0.10.0, which I just released; here is the changelog. In a future version, #15 should be implemented. It unfortunately still requires two large storages, but should significantly improve execution time and code size. |
When I get around to attempting #18, things should improve further |
I just published v0.12.0, some things should be improved. MSRV is 1.70 now. |
here's what I have now:
output:
but here's what I'm getting from python: