forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove useless function declaration #5
Comments
planetA
pushed a commit
that referenced
this issue
Feb 10, 2023
During EEH error injection testing, a deadlock was encountered in the tg3 driver when tg3_io_error_detected() was attempting to cancel outstanding reset tasks: crash> foreach UN bt ... PID: 159 TASK: c0000000067c6000 CPU: 8 COMMAND: "eehd" ... #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18 #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3] #7 [c00000000681faf0] eeh_report_error at c00000000004e25c ... PID: 290 TASK: c000000036e5f800 CPU: 6 COMMAND: "kworker/6:1" ... #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8 #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3] #6 [c00000003721fc60] process_one_work at c00000000019e5c4 ... PID: 296 TASK: c000000037a65800 CPU: 21 COMMAND: "kworker/21:1" ... #4 [c000000037247bc0] rtnl_lock at c000000000c940d8 #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3] #6 [c000000037247c60] process_one_work at c00000000019e5c4 ... PID: 655 TASK: c000000036f49000 CPU: 16 COMMAND: "kworker/16:2" ...:1 #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8 #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3] #6 [c0000000373ebc60] process_one_work at c00000000019e5c4 ... Code inspection shows that both tg3_io_error_detected() and tg3_reset_task() attempt to acquire the RTNL lock at the beginning of their code blocks. If tg3_reset_task() should happen to execute between the times when tg3_io_error_deteced() acquires the RTNL lock and tg3_reset_task_cancel() is called, a deadlock will occur. Moving tg3_reset_task_cancel() call earlier within the code block, prior to acquiring RTNL, prevents this from happening, but also exposes another deadlock issue where tg3_reset_task() may execute AFTER tg3_io_error_detected() has executed: crash> foreach UN bt PID: 159 TASK: c0000000067d2000 CPU: 9 COMMAND: "eehd" ... #4 [c000000006867a60] rtnl_lock at c000000000c940d8 #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3] #6 [c000000006867b00] eeh_report_reset at c00000000004de88 ... PID: 363 TASK: c000000037564000 CPU: 6 COMMAND: "kworker/6:1" ... #3 [c000000036c1bb70] msleep at c000000000259e6c #4 [c000000036c1bba0] napi_disable at c000000000c6b848 #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3] #6 [c000000036c1bc60] process_one_work at c00000000019e5c4 ... This issue can be avoided by aborting tg3_reset_task() if EEH error recovery is already in progress. Fixes: db84bf4 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize") Signed-off-by: David Christensen <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
https://github.com/torvalds/linux/blob/d652ea30ba32db12fe8365182fad5ba2e7c22822/include/rdma/ib_verbs.h#L4280
The text was updated successfully, but these errors were encountered: