Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add lock for enif_send synchronization #19

Merged
merged 1 commit into from
Apr 9, 2024

Conversation

savonarola
Copy link
Collaborator

@savonarola savonarola commented Apr 9, 2024

@savonarola savonarola mentioned this pull request Apr 9, 2024
@savonarola savonarola marked this pull request as ready for review April 9, 2024 06:28
@savonarola savonarola merged commit c0fcb4d into emqx:dev Apr 9, 2024
7 checks passed
@savonarola savonarola deleted the 0409-add-msg-lock branch April 9, 2024 06:46
Comment on lines 85 to 90
if(!cancelled) {
enif_mutex_lock(future->msg_lock);
msg = T2(future->msg_env, future->msg_ref, ATOM_ready);
enif_send(NULL, &(future->pid), future->msg_env, msg);
enif_mutex_unlock(future->msg_lock);
}
Copy link

@jhogberg jhogberg Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is buggy, see erlang/otp#8358 (comment) and the following comments. Both the part about not passing NULL blindly as the environment for enif_send, and creating a new message environment for each call to enif_send (make sure to copy msg_ref into that environment).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhogberg really appreciated for your comments!

We are stuck here, for what to set for the caller_env in enif_send.
It is clear that we could use NULL from callback in a 3rd party managed thread.

However as you mentioned in erlang/otp#8358 (comment), it is tricky to handle the sync callback from the 3rd party which is executed by beam managed scheduler thread.

In the PR https://github.com/emqx/couchdb-erlfdb/pull/18/files#diff-ceadfa165520612e5b6b1a255f7680bcaa9e2a67bf47677f6884fd1db3cb03f5L122 it tried to fix the issue of reusing the env of previous nif function call, as the env becomes invalid after that NIF function call. The old behaviour is to save the caller's env and reuse it in the caller. So we change it to NULL to pretend the enif_send is called from a none beam managed and it seems to work in our CI env.

as I see you propose following:

thread-specific data is probably the most straightforward way

Do you mean we could use "process independent environments " ? We are a bit unsure about that because it was not stated in the doc about call_env

caller_env
The environment of the calling thread ([process bound](https://www.erlang.org/doc/man/erl_nif#proc_bound_env) or [callback](https://www.erlang.org/doc/man/erl_nif#callback_env) environment) or NULL if calling from a custom thread not spawned by ERTS.

BTW I look into the enif_send implementation and I couldn't see set caller_env to NULL in scheduler thread could cause issue as when caller_env is set to NULL, the c_p is set to NULL then the side effect of it is just send the from to undefined. I am double checking, will be appreciated if you could give me some hints.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean using enif_tsd_key_create when the NIF is loaded and then enif_tsd_set(key_goes_here, nif_environment_goes_here) on every entry into a NIF function/callback, and then passing the result of enif_tsd_get(key_goes_here)as the caller environment of enif_send.

That will ensure that the process-bound/callback environment is passed when running on a BEAM-managed thread, and NULL otherwise. :-)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the process-bound env safe to pass here? what if the process is dead or moved to other scheduler thread?

Copy link

@jhogberg jhogberg Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process-bound or callback environment is guaranteed to be safe to use until its corresponding NIF or NIF callback returns. Thus, if you always set the environment through enif_tsd_set on entry to NIFs/NIF callbacks, you can use enif_tsd_get to get a safe environment in case erlfdb_future_cb is called synchronously.

When erlfdb_future_cb is called asynchronously, enif_tsd_get will return NULL since you've never set it on the third-party threads, which is fine since enif_send requires NULL in that case.

cancelled = future->cancelled;
enif_mutex_unlock(future->lock);
enif_mutex_unlock(future->cancel_lock);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to prevent duplicated message sending below, we should take the locker longer.

cancelled = future->cancelled;
enif_mutex_unlock(future->lock);
enif_mutex_unlock(future->cancel_lock);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to prevent duplicated message sending below, we should take the locker longer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at a newer version 😅: #20

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants