./lodestar: line 7: 2710 Illegal instruction (core dumped) #7074
Comments
Same happens on validator.
|
Do you happen to have |
@q9f what's the machine's CPU arch? |
Are you running in Docker or on bare metal? What version did you upgrade from? That will help narrow the search. |
No, but I can turn it on; it won't get any further than this, though.
x64 (Intel® Celeron® Processor N5105)
Bare metal, managing node through nvm and building from source at the git tag. I upgraded from 1.20.1 to 1.21.0. |
Debug logs seem to be useless here.
Let me try to find the core dump. |
Running lodestar through node directly without args says "illegal hardware instruction"
|
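For context on the title line: bash prints `./lodestar: line 7: 2710 Illegal instruction (core dumped)` when the process started on line 7 of the script (PID 2710) is killed by SIGILL, which is signal 4 on Linux, and the shell records that command's status as 128 + 4 = 132. Below is a small sketch, under those assumptions, of how a supervising program could tell "exited with a code" apart from "killed by a signal" using the standard library's `ExitStatusExt::signal`. The helper names are hypothetical, and `true` stands in for the real command.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

/// Names for a few common fatal signals (Linux numbering).
fn signal_name(sig: i32) -> &'static str {
    match sig {
        4 => "SIGILL", // illegal instruction, as in this issue
        6 => "SIGABRT",
        11 => "SIGSEGV",
        _ => "other signal",
    }
}

/// Shells report a child killed by signal N as status 128 + N,
/// so a status above 128 likely means "killed by signal (status - 128)".
fn signal_from_shell_status(code: i32) -> Option<i32> {
    if code > 128 {
        Some(code - 128)
    } else {
        None
    }
}

/// Hypothetical helper: human-readable summary of a child's exit status.
fn describe(status: ExitStatus) -> String {
    match (status.code(), status.signal()) {
        // Normal exit; a status above 128 from a shell wrapper hints at a signal.
        (Some(code), _) => match signal_from_shell_status(code) {
            Some(sig) => format!(
                "exit code {}: likely a shell whose child was killed by {}",
                code,
                signal_name(sig)
            ),
            None => format!("exit code {}", code),
        },
        // The process itself was terminated by a signal.
        (None, Some(sig)) => format!("killed directly by {}", signal_name(sig)),
        _ => "unknown status".to_string(),
    }
}

fn main() {
    // Substitute the real binary or wrapper script here.
    let status = Command::new("true").status().expect("failed to spawn");
    println!("{}", describe(status));
}
```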
Core dump:
|
Looks like a blst wrapper issue (or blst itself?) |
Yeah, 1.21.0 introduced a new blst wrapper (a rewrite of the wrapper using napi-rs) and an updated version of blst (from a Sept 2022 version to the most recent release). It will take some deeper investigation to see where the problem lies (in the newer version of blst or in the wrapper). |
You rock for the stack trace!!! Amazing!!! I'm not sure which layer it's coming from based on that info, though. We recently rewrote the wrapper in Rust, so that adds a bit more complexity to the answer. I am putting together a branch for you that will allow a debug build of the bindings to be built locally on your box. That will fill in the function names for the first 6 lines of the backtrace and narrow down where the illegal instruction is coming from. |
Just as a sanity check, I removed everything and built a fresh lodestar on both the affected machine and my workstation. I can confirm that this cannot be reproduced on a modern Intel Core i9, so I suspect something is off with the Celeron SoC. I'm wondering if there is an easy way to debug the blst library directly, as I don't think this even stems from lodestar or the wrapper. Meanwhile, I'm downgrading to 1.20.2. I'll stay tuned for instructions. |
@q9f I pushed a branch that will run the debug build of the C … The branch name is … As a note, I think your hunch is correct and that it's in the assembly portion of blst. I do not think the assembly will show "proper stack traces", so we will get some metadata from that. Another likely candidate is the napi-rs bindings layer. We have found some rough edges in that lib because it's relatively new. It could be something as simple as a conditional build flag. |
@matthewkeil this build from branch You fixed it 😛 |
That is both wonderful and concerning because of the couple of modifications I made when testing the debug build. What OS are you running? If you were able to build, can I assume you have the Rust toolchain on that machine? |
Yes, I always maintain a Rust toolchain. The OS is Arch Linux, by the way. |
I've narrowed down the issue but am not sure why your system is reporting to not use … I am trying to figure out why so I can put up a PR in the …

```rust
use std::env;

fn main() {
    // account for cross-compilation [by examining environment variables]
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap();
    let target_env = env::var("CARGO_CFG_TARGET_ENV").unwrap();
    let target_arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap();
    let target_family = env::var("CARGO_CFG_TARGET_FAMILY").unwrap_or_default();
    let target_no_std = target_os.eq("none")
        || (target_os.eq("unknown") && target_arch.eq("wasm32"))
        || target_os.eq("uefi")
        || env::var("BLST_TEST_NO_STD").is_ok();
    if !target_no_std {
        println!("cargo:rustc-cfg=feature=\"std\""); // this is the line I manually overrode
        if target_arch.eq("wasm32") || target_os.eq("unknown") {
            println!("cargo:rustc-cfg=feature=\"no-threads\"");
        }
    }
    // ... (excerpt; the rest of the build script is omitted)
}
```
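The decision in the build-script excerpt above can be pulled out into a pure function, so it can be checked without the `CARGO_CFG_*` variables that Cargo only sets while running a build script. This is a sketch: the helper name `emitted_features` is hypothetical, while the conditions are copied from the excerpt. It shows that an ordinary Linux x86-64 target, such as the Celeron box in this issue, gets the `std` feature emitted.

```rust
/// Hypothetical helper mirroring the excerpt: which `feature = "..."` cfgs
/// the build script would emit for a given target.
fn emitted_features(target_os: &str, target_arch: &str, test_no_std: bool) -> Vec<&'static str> {
    // Same condition as `target_no_std` in the excerpt.
    let target_no_std = target_os == "none"
        || (target_os == "unknown" && target_arch == "wasm32")
        || target_os == "uefi"
        || test_no_std; // stands in for BLST_TEST_NO_STD being set

    let mut features = Vec::new();
    if !target_no_std {
        features.push("std");
        if target_arch == "wasm32" || target_os == "unknown" {
            features.push("no-threads");
        }
    }
    features
}

fn main() {
    // A typical bare-metal Linux host.
    println!("{:?}", emitted_features("linux", "x86_64", false)); // prints ["std"]
}
```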
|
Hey folks, I had a user run into this on my side and can reproduce it easily too. Running
New versions of BLST are supposed to detect CPU features dynamically at runtime; older versions had them fixed during compilation, which is why Prysm and Lighthouse used to have "modern" vs. "portable" builds (and, more importantly, Docker containers). Either option would probably resolve this issue. |
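The comment above contrasts two strategies: fixing CPU features when the binary is compiled, or detecting them on the machine it actually runs on. A minimal Rust sketch of the difference, assuming for illustration only that a fast path needs ADX and BMI2. The `Backend` enum and `select_backend` are hypothetical names, not blst's API.

```rust
/// Which code path to use (hypothetical).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Backend {
    Optimized, // would use ADX/BMI2 instructions
    Portable,  // baseline x86-64 instructions only
}

/// Pure selection logic, kept separate from detection so it can be tested.
fn select_backend(has_adx: bool, has_bmi2: bool) -> Backend {
    if has_adx && has_bmi2 {
        Backend::Optimized
    } else {
        Backend::Portable
    }
}

/// Compile-time view: decided by the compiler flags (e.g. `-C target-cpu=native`),
/// not by the CPU that later runs the binary.
fn compiled_backend() -> Backend {
    select_backend(cfg!(target_feature = "adx"), cfg!(target_feature = "bmi2"))
}

/// Runtime view: asks the CPU the program is actually running on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_backend() -> Backend {
    select_backend(
        std::is_x86_feature_detected!("adx"),
        std::is_x86_feature_detected!("bmi2"),
    )
}

/// These are x86 features, so other architectures fall back to the portable path.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn runtime_backend() -> Backend {
    Backend::Portable
}

fn main() {
    println!("compile-time choice: {:?}", compiled_backend());
    println!("runtime choice:      {:?}", runtime_backend());
}
```

If code is compiled with those target features enabled, the compiler may emit the instructions anywhere, and a CPU that lacks them faults with an illegal instruction. The runtime check makes the choice per machine, which is what a "portable" build relies on.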
It's strange that this is happening since we're using a fairly recent version of blst (post 0.3.12, only 3 months old) which seems to have the runtime detection. |
@jclapis I posted the container info to Discord. Please feel free to drop results here when you get a chance to run them on the machine that was having issues. Thanks! |
Forwarded from @jclapis on Discord: applying the same fix @q9f used solved his issue as well. Both are running Celeron processors, so there is a common thread. He also sent the info below for reference.

```console
$ cat /proc/cpuinfo
processor: 0
vendor_id: GenuineIntel
cpu family: 6
model: 156
model name: Intel(R) Celeron(R) N5105 @ 2.00GHz
$ hyperdrive s ccf
Your CPU is missing support for the following features:
- adx
- avx
- avx2
- bmi1
- bmi2
You must use the 'portable' image.
```
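hyperdrive's check above amounts to asking the CPU about a fixed list of features and printing the ones that are absent. A rough Rust equivalent using the standard library's `is_x86_feature_detected!` macro follows. The feature list is copied from the output above; the `missing` and `detect` helpers and the message wording are hypothetical, not hyperdrive's code.

```rust
/// Pure helper: given (feature, present?) pairs, return the names that are absent.
fn missing<'a>(checks: &[(&'a str, bool)]) -> Vec<&'a str> {
    checks
        .iter()
        .filter(|(_, present)| !present)
        .map(|(name, _)| *name)
        .collect()
}

/// Query the running CPU for the features listed in the report above.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect() -> Vec<(&'static str, bool)> {
    // `is_x86_feature_detected!` needs a string literal, so each check is spelled out.
    vec![
        ("adx", std::is_x86_feature_detected!("adx")),
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
        ("bmi1", std::is_x86_feature_detected!("bmi1")),
        ("bmi2", std::is_x86_feature_detected!("bmi2")),
    ]
}

/// These are x86-only features; nothing to check on other architectures.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    let checks = detect();
    let absent = missing(&checks);
    if absent.is_empty() {
        println!("No missing features among those checked.");
    } else {
        println!("Your CPU is missing support for the following features:");
        for feature in &absent {
            println!("- {feature}");
        }
        println!("Use a portable build.");
    }
}
```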
Added "portable" feature to bindings and perf tested results. Hopefully we get best of both worlds, as its faster and should be compatible. Building a test lodestar branch now with the change. |
@q9f I updated your branch. To rebuild with it:

```sh
cd /to/root/of/lodestar
rm -rf temp-deps/blst-ts
yarn clean && yarn clean:nm && yarn && yarn build
# start as you normally would
```
That works and does not fail with an illegal instruction. |
Awesome news, @q9f! Thanks for checking that! I will PR the change today and try to release it with 1.23. |
PR in blst-ts PR in lodestar |
Describe the bug
My lodestar v1.21.0/ae1f9d5 fails to run after an upgrade.
Expected behavior
No illegal instruction.
Steps to reproduce
Additional context
No response
Operating system
Linux
Lodestar version or commit hash
ae1f9d5