segmentation fault starting agd on intel mac w/VirtioFS #30

Closed
dckc opened this issue Nov 24, 2023 · 14 comments

@dckc
Member

dckc commented Nov 24, 2023

Note: the resolution is to switch Docker Desktop file sharing to gRPC FUSE and disable the virtualization framework.


What I did:

  1. clone https://github.com/agoric-labs/dapp-game-places (currently 5ef3209)
  2. agoric install and test the contract with cd contract; yarn; yarn test
  3. yarn start:docker spends a long time (552.26s) pulling an image but then...
  4. yarn docker:logs shows that it crashed right away: start_agd.sh: line 10: 134 Segmentation fault

I think @JimLarson ran into these symptoms as well.

Image version:

$ docker images | grep propo
ghcr.io/agoric/agoric-3-proposals   main      d5dcc736bdf3   9 days ago   5.84GB

MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)
Purchase Date: February 2021
Intel Core i5

console detail
danbook:contract connolly$ yarn start:docker
yarn run v1.22.19
$ docker-compose up -d
[+] Running 47/1
 ✔ agd 46 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿]      0B/0B      Pulled                                            551.0s 
[+] Building 0.0s (0/0)                                                                                               docker:desktop-linux
[+] Running 2/2
 ✔ Network dapp-game-places_default  Created                                                                                          0.1s 
 ✔ Container dapp-game-places-agd-1  Started                                                                                          0.5s 
✨  Done in 552.26s.
danbook:contract connolly$ docker-compose top
danbook:contract connolly$ yarn docker:logs
yarn run v1.22.19
$ docker-compose logs --tail 200 -f
dapp-game-places-agd-1  | ENV_SETUP starting
dapp-game-places-agd-1  | ENV_SETUP finished
dapp-game-places-agd-1  | 3:13PM ERR WARNING: The minimum-gas-prices config in app.toml is set to the empty string. This defaults to 0 in the current version, but will error in the next version (SDK v0.45). Please explicitly put the desired minimum-gas-prices in your app.toml.
dapp-game-places-agd-1  | /usr/src/upgrade-test-scripts/start_agd.sh: line 10:   134 Segmentation fault      agd start --log_level warn "$@"
✨  Done in 0.23s.
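
Because `yarn start:docker` reports the container as started even though agd crashes right away, a scripted check of the log output can surface the failure. A minimal sketch, assuming the `docker-compose logs` command from the transcript above; `has_segfault` is a hypothetical helper name, not part of the project.

```shell
# Hypothetical helper: succeed if the log text on stdin contains a segfault line.
has_segfault() {
  grep -q 'Segmentation fault'
}

# Usage, run from the directory containing docker-compose.yml (not executed here):
#   if docker-compose logs --tail 200 | has_segfault; then
#     echo "agd crashed with a segmentation fault" >&2
#   fi
```

The helper only matches the message text seen in this report; a crash that prints something else would not be detected.
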
@dckc
Member Author

dckc commented Feb 26, 2024

Ran into this again today:

$ docker images | grep main
ghcr.io/agoric/agoric-3-proposals   main      e47472b2709c   5 weeks ago    3.41GB

@dckc
Member Author

dckc commented Feb 26, 2024

Today's problem was with dapp-agoric-basics:

  1. clone https://github.com/Agoric/dapp-agoric-basics ( 903f03b54c3cb9bc23aa4c48254373e52fb27065 )
  2. yarn
  3. yarn start:docker

@mhofman
Member

mhofman commented Feb 26, 2024

This is likely not a problem of a3p but of agd. Does this reproduce consistently?

It would be nice to have the full logs of agd start, and maybe a core dump.

@JimLarson

Previously, there were no logs, core dump, stack trace message, etc. Or Docker ate them.

We've had other reports of segfaults of agd in the field, not using Docker or Intel Mac, but they were sporadic and gave a stack trace pointing to GC in the runtime.

I suspect that our dual-runtime process setup might play a role in this, since Node and Go individually are quite stable and have probably been used heavily on Docker/Intel Mac. This is only a circumstantial suspicion. If we had split brain working, we could exonerate or implicate the dual-runtime.

Since every part of this issue works fine on its own, or even in all pairs (Docker, Intel Mac, agd), the issue is likely completely bananas.

@JimLarson

Trying to repro, but yarn fails with:

error [email protected]: The engine "node" is incompatible with this module. Expected version "^18.0.0 || >=20.0.0". Got "16.19.0"
error Found incompatible module.

Have we made the jump to node version 18?

@turadg
Member

turadg commented Feb 26, 2024

> Have we made the jump to node version 18?

SDK hasn't: Agoric/agoric-sdk#8365

Neither has https://github.com/Agoric/dapp-agoric-basics, but its dependencies have (vitest). Node 16 has been EOL for 5 months.

@JimLarson

More specifically, my question is: should I upgrade my local machine to node 18, or do I need to install nvm to switch back and forth quickly? I'd much prefer simple monotonic upgrades.

@turadg
Member

turadg commented Feb 26, 2024

@JimLarson your local machine can run Active LTS, which is 20.
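
The engine constraint quoted in the yarn error earlier in this thread (`^18.0.0 || >=20.0.0`) can be checked before running `yarn`. A minimal sketch, assuming only the major version matters for that range; `node_satisfies_engine` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a Node version string satisfies
# "^18.0.0 || >=20.0.0", i.e. major version 18, or major version 20 and later.
node_satisfies_engine() {
  version="${1#v}"        # drop the leading "v" printed by `node --version`
  major="${version%%.*}"  # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 2 ;;  # not a plain number
  esac
  [ "$major" -eq 18 ] || [ "$major" -ge 20 ]
}

# Usage (not executed here):
#   node_satisfies_engine "$(node --version)" || echo "unsupported Node version" >&2
```

Under this range Node 16 and Node 19 are both rejected, which matches the "Got 16.19.0" failure reported above.
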

@JimLarson

JimLarson commented Feb 26, 2024

Upgraded to node 20 and I can repro the failure.

@JimLarson

Segfault when launching ag-chain-cosmos.

@JimLarson

Can reproduce in the Docker container by changing to the home directory and running:

# agd start
10:28PM ERR WARNING: The minimum-gas-prices config in app.toml is set to the empty string. This defaults to 0 in the current version, but will error in the next version (SDK v0.45). Please explicitly put the desired minimum-gas-prices in your app.toml.
10:28PM INF agd delegating to JS executable args=["ag-chain-cosmos","--home","/root/.agoric","start"] binary=/usr/src/agoric-sdk/packages/cosmic-swingset/src/entrypoint.js
Segmentation fault

Error appears to be in node startup:

# which ag-chain-cosmos
/usr/local/bin/ag-chain-cosmos
# file `which ag-chain-cosmos`
/usr/local/bin/ag-chain-cosmos: symbolic link to /usr/src/agoric-sdk/packages/cosmic-swingset/bin/ag-chain-cosmos
# file /usr/src/agoric-sdk/packages/cosmic-swingset/bin/ag-chain-cosmos
/usr/src/agoric-sdk/packages/cosmic-swingset/bin/ag-chain-cosmos: symbolic link to ../src/entrypoint.js
# ag-chain-cosmos --home /root/.agoric start
Segmentation fault

Compare with a non-Docker environment (scenario2-run-chain in packages/cosmic-swingset), where the same entrypoint starts normally:

$ ./src/entrypoint.js --home=t1/n0 start
3:51PM INF starting node with ABCI Tendermint in-process
3:51PM INF service start impl=multiAppConn module=proxy msg={}
3:51PM INF service start connection=query impl=committingClient module=abci-client msg={}
...
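
The symlink chain traced above with `which` and `file` can also be resolved in one step. A minimal sketch, assuming `readlink -f` is available (it is part of GNU coreutils, so this should hold inside most Linux images); `resolve_command` is a hypothetical helper name.

```shell
# Hypothetical helper: print the final target of a command found on PATH,
# following every symlink along the way.
resolve_command() {
  cmd_path=$(command -v "$1") || return 1
  readlink -f "$cmd_path"
}

# Usage inside the container (not executed here):
#   resolve_command ag-chain-cosmos
```

Going by the `file` output above, this should end at `src/entrypoint.js` under `packages/cosmic-swingset`, which is the script that node then fails to start.
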

@JimLarson

Not sure if this is an acceptable resolution, but following a similar issue I disabled "Use Virtualization framework" in Docker Desktop > Settings > General. This also required me to change the file sharing implementation to "gRPC FUSE". Seems to work now.
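
To see which of these settings are in effect without opening the Docker Desktop UI, the settings file can be inspected. A rough sketch under loud assumptions: the file path and the key names `useVirtualizationFramework`, `useVirtualizationFrameworkVirtioFS`, and `useGrpcfuse` are what some Docker Desktop for Mac releases have used, but none of them is confirmed in this thread and they may differ by version. `setting_value` is a hypothetical helper.

```shell
# Hypothetical helper: print the value of a boolean key in a JSON settings
# file, or "unset" if the key is absent. This is a rough text match with sed,
# not a real JSON parser.
setting_value() {
  file=$1
  key=$2
  value=$(sed -nE "s/.*\"$key\"[[:space:]]*:[[:space:]]*(true|false).*/\1/p" "$file" | head -n 1)
  echo "${value:-unset}"
}

# ASSUMPTION: settings path and key names; verify against your Docker Desktop version.
SETTINGS="$HOME/Library/Group Containers/group.com.docker/settings.json"
if [ -f "$SETTINGS" ]; then
  for key in useVirtualizationFramework useVirtualizationFrameworkVirtioFS useGrpcfuse; do
    echo "$key: $(setting_value "$SETTINGS" "$key")"
  done
fi
```

If the file or keys are not found, the Settings UI described above remains the authoritative place to check.
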

@mhofman
Member

mhofman commented Feb 27, 2024

That sounds like a quirk of Docker on macOS, which would make sense given this doesn't seem to happen on Linux (or WSL) machines.

@JimLarson

Closing, with the instructions above as the workaround. Hopefully future releases of Docker or Node (or macOS?) will fix the issue.

@dckc changed the title from "segmentation fault starting agd on intel mac" to "segmentation fault starting agd on intel mac w/VirtioFS" on Feb 29, 2024