ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task. #15

Open
ekkooee7 opened this issue Apr 2, 2024 · 4 comments

ekkooee7 commented Apr 2, 2024

Hi, I hit this error when running python driver.py.

Hello World... From global
(pid=36500)
(imitationRunner pid=37879) Hello World... From global
(imitationRunner pid=37879) starting episode 0 on metaAgent 0
(imitationRunner pid=37879) running imitation job
2024-04-02 16:24:19,702 WARNING worker.py:1986 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff57261355039a445aab5c889701000000 Worker ID: 5319944c466cd717513b05721f5bb35ee9d0bc67636ca45d75ec4b26 Node ID: 9142cb0d3cde6bac61a2c9ea58188ae8f649a46cd4f8ab495df8f181 Worker IP address: 10.26.224.144 Worker port: 46573 Worker PID: 37880 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(imitationRunner pid=37880) cannot allocate memory for thread-local data: ABORT
Traceback (most recent call last):
File "/home/waz/workspace/PRIMAL2/driver.py", line 338, in <module>
main()
File "/home/waz/workspace/PRIMAL2/driver.py", line 170, in main
jobResults, metrics, info = ray.get(done_id)[0]
File "/home/waz/anaconda3/envs/py36/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/waz/anaconda3/envs/py36/lib/python3.6/site-packages/ray/_private/worker.py", line 2523, in get
raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: imitationRunner
actor_id: 57261355039a445aab5c889701000000
pid: 37880
namespace: 36ab3fad-5802-47dd-a1b7-63dece3b6d68
ip: 10.26.224.144
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
2024-04-02 16:24:19,788 WARNING worker.py:1986 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff747a4f29667195d12b49c67b01000000 Worker ID: 7b73a6ec53dcefe2ebdf2886269b2f5c58b0a07f4dba5383bc0bdb60 Node ID: 9142cb0d3cde6bac61a2c9ea58188ae8f649a46cd4f8ab495df8f181 Worker IP address: 10.26.224.144 Worker port: 34091 Worker PID: 37879 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(imitationRunner pid=37879) cannot allocate memory for thread-local data: ABORT

I changed the number of agents and threads, and I made sure the machine has enough compute resources (a server with 2 Xeon Silver CPUs plus 2×4090 and 6×4080 GPUs). Do you have any idea what causes this?
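The log above lists the OOM killer as one potential root cause, and the worker itself prints "cannot allocate memory for thread-local data", which looks like a host-side allocation failure rather than a GPU one. A minimal sketch for checking host limits before launching the driver follows. It assumes a Linux machine, uses only the Python standard library, and the helper names (describe_limit, read_meminfo, collect_report) are made up for illustration; it only reports values and does not diagnose the crash by itself.

```python
import os
import resource


def describe_limit(name):
    """Return (soft, hard) for a resource limit, using 'unlimited' for RLIM_INFINITY."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return fmt(soft), fmt(hard)


def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: kB}; returns {} if the file is missing."""
    info = {}
    if not os.path.exists(path):
        return info
    with open(path) as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts and parts[0].isdigit():
                info[key.strip()] = int(parts[0])
    return info


def collect_report():
    """Gather CPU count, per-user process/thread limits, and host memory figures."""
    report = {"cpu_count": os.cpu_count()}
    # RLIMIT_NPROC caps processes/threads per user; RLIMIT_AS caps address space.
    for name in ("RLIMIT_NPROC", "RLIMIT_AS", "RLIMIT_STACK"):
        if hasattr(resource, name):
            report[name] = describe_limit(name)
    mem = read_meminfo()
    for key in ("MemTotal", "MemAvailable", "SwapFree"):
        if key in mem:
            report[key + "_kB"] = mem[key]
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

If MemAvailable is small, or the RLIMIT_NPROC soft limit is low relative to the number of runners and threads being started, that would be worth ruling out before looking at the GPUs.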

shanyaolingling commented

Hello, have you solved this problem? I am running into the same issue:
2024-06-07 18:34:27,860 WARNING worker.py:2074 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffacd284e1f8c50cabcb93ada401000000 Worker ID: 1cb486b8f2dfadfd60473610d32c9df8cd0db6defbe70ea46c362cd5 Node ID: f469e2532f4d5ffb080939624763dd3e02888f81bf30331155b64d0a Worker IP address: 10.4.52.11 Worker port: 39717 Worker PID: 9106 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Traceback (most recent call last):
File "/home/dingyanling/downloads/PRIMAL2-main/driver.py", line 235, in <module>
main()
File "/home/dingyanling/downloads/PRIMAL2-main/driver.py", line 176, in main
jobResults, metrics, info = ray.get(done_id)[0]
File "/home/dingyanling/downloads/anaconda3/envs/mapf/lib/python3.9/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "/home/dingyanling/downloads/anaconda3/envs/mapf/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/home/dingyanling/downloads/anaconda3/envs/mapf/lib/python3.9/site-packages/ray/_private/worker.py", line 2565, in get
raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: imitationRunner
actor_id: acd284e1f8c50cabcb93ada401000000
pid: 9106
namespace: 3682ea21-ae56-46e2-8c7a-f56835c9a85c
ip: 10.4.52.11
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
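
The warning says to check the logs for the dead worker. A minimal sketch for locating them by PID follows. It assumes Ray's default log directory (/tmp/ray/session_latest/logs) and the worker-<worker_id>-<job_id>-<pid>.out/.err file naming described in the Ray logging docs; both may differ by Ray version or configuration. The helper names find_worker_logs and tail are hypothetical.

```python
import glob
import os


def find_worker_logs(pid, log_dir="/tmp/ray/session_latest/logs"):
    """Return sorted paths of worker .out/.err files whose name ends with the PID."""
    matches = []
    for ext in ("out", "err"):
        # Assumed naming: worker-<worker_id>-<job_id>-<pid>.<ext>
        pattern = os.path.join(log_dir, f"worker-*-{pid}.{ext}")
        matches.extend(glob.glob(pattern))
    return sorted(matches)


def tail(path, max_lines=40):
    """Return the last max_lines lines of a text file."""
    with open(path, errors="replace") as fh:
        return fh.readlines()[-max_lines:]


if __name__ == "__main__":
    # 9106 is the dead worker's PID from the traceback above.
    for path in find_worker_logs(9106):
        print(f"==> {path}")
        print("".join(tail(path)), end="")
```

The .err file of the dead worker is usually the first place to look for a native crash message or an allocation failure that never reaches the driver's traceback.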

zhx0506 commented Jun 22, 2024

Hello, I have run into the same problem. Has it been solved?

ekkooee7 (Author) commented Jun 22, 2024 via email

Not yet, I don't know how to fix it 😢

zhx0506 commented Jun 22, 2024

Friend, could we connect on WeChat to discuss and learn together? My WeChat: zhxly1018
