ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task. #15
Comments
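Hello, did you manage to solve this problem? I am running into the same issue.
Hello, I ran into the same problem too. Has it been solved?
Not yet, I don't know how to fix it 😢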
Hi, could we connect on WeChat to discuss and learn together? My WeChat ID: zhxly1018
Hi, I ran into this problem when running python driver.py.
Hello World... From global
(pid=36500)
(imitationRunner pid=37879) Hello World... From global
(imitationRunner pid=37879) starting episode 0 on metaAgent 0
(imitationRunner pid=37879) running imitation job
2024-04-02 16:24:19,702 WARNING worker.py:1986 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff57261355039a445aab5c889701000000 Worker ID: 5319944c466cd717513b05721f5bb35ee9d0bc67636ca45d75ec4b26 Node ID: 9142cb0d3cde6bac61a2c9ea58188ae8f649a46cd4f8ab495df8f181 Worker IP address: 10.26.224.144 Worker port: 46573 Worker PID: 37880 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(imitationRunner pid=37880) cannot allocate memory for thread-local data: ABORT
Traceback (most recent call last):
  File "/home/waz/workspace/PRIMAL2/driver.py", line 338, in <module>
    main()
  File "/home/waz/workspace/PRIMAL2/driver.py", line 170, in main
    jobResults, metrics, info = ray.get(done_id)[0]
  File "/home/waz/anaconda3/envs/py36/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/waz/anaconda3/envs/py36/lib/python3.6/site-packages/ray/_private/worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
  class_name: imitationRunner
  actor_id: 57261355039a445aab5c889701000000
  pid: 37880
  namespace: 36ab3fad-5802-47dd-a1b7-63dece3b6d68
  ip: 10.26.224.144
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
2024-04-02 16:24:19,788 WARNING worker.py:1986 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff747a4f29667195d12b49c67b01000000 Worker ID: 7b73a6ec53dcefe2ebdf2886269b2f5c58b0a07f4dba5383bc0bdb60 Node ID: 9142cb0d3cde6bac61a2c9ea58188ae8f649a46cd4f8ab495df8f181 Worker IP address: 10.26.224.144 Worker port: 34091 Worker PID: 37879 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(imitationRunner pid=37879) cannot allocate memory for thread-local data: ABORT
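The warnings in the log suggest checking the logs of the dead workers (PIDs 37879 and 37880). A minimal sketch of how one might locate those files is shown here. It assumes Ray's default log directory /tmp/ray/session_latest/logs and that per-worker log files carry the worker PID in their file name; both are assumptions about the setup, and the helper name find_worker_logs is made up for illustration.

```python
import re
from pathlib import Path

# Assumption: Ray's default log location. Adjust if a custom temp dir was configured.
DEFAULT_LOG_DIR = Path("/tmp/ray/session_latest/logs")


def find_worker_logs(pid, log_dir=DEFAULT_LOG_DIR):
    """Return files in log_dir whose name contains the PID as a separate token.

    Hypothetical helper: it only inspects file names, not file contents.
    """
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    token = str(pid)
    matches = []
    for path in log_dir.iterdir():
        # Split the name on non-alphanumeric characters so that "37880"
        # does not match inside a longer hex worker ID.
        parts = re.split(r"[^0-9A-Za-z]+", path.name)
        if path.is_file() and token in parts:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    for pid in (37879, 37880):  # PIDs of the dead workers from the log
        for path in find_worker_logs(pid):
            print(path)
```

If files turn up, the .err file of each worker is usually the first place to look for the message that preceded the crash.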
I changed the number of agents and threads, and I made sure my compute resources are sufficient (a server with 2 Xeon Silver CPUs and 2x 4090 + 6x 4080 GPUs). Do you have any idea what causes this problem?