Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple storage: Finalize busy object state also for ObjFreeObj() #3954

Merged

Conversation

nigoroll
Copy link
Member

@nigoroll nigoroll commented Jul 9, 2023

If oc->boc is not NULL for an ObjFreeObj() call, the stevedore needs to finish any busy object transaction.

Please see bug ticket for a more detailed explanation.

Fixes #3953

@dridi
Copy link
Member

dridi commented Jul 10, 2023

How hard would it be to create a test case?

dridi added a commit that referenced this pull request Jul 10, 2023
dridi added a commit that referenced this pull request Jul 10, 2023
@nigoroll
Copy link
Member Author

How hard would it be to create a test case?

We would need another storage engine which occupies boc->stevedore_priv, and then make b20.vtc run into it. This could be done with the debug stevedore, but we'd need to wrap/copy some code from malloc and I was not sure if it's worth it.

BTW, it's nice that you picked up the TAKE_OBJ_NOTNULL() idea. :)

@nigoroll nigoroll force-pushed the 3953_stevedore_priv_ObjFreeObj branch from ea44b0b to 54e9121 Compare July 11, 2023 21:46
@nigoroll
Copy link
Member Author

bugwash participants would like to get @mbgrydeland\ 's opinion, but I actually think we should be fine:

This is quite contained in varnish-cache, and any stevedore would, if it used stevedore_priv like the in-tree code, hit the same thing. Also, the fix for stevedores does not require the fix in v-c

@nigoroll nigoroll force-pushed the 3953_stevedore_priv_ObjFreeObj branch from 54e9121 to 0308f7d Compare July 17, 2023 14:34
@mbgrydeland
Copy link
Contributor

mbgrydeland commented Jul 18, 2023

Isn't this patch introducing a race? sml_objfree() may be called from any thread executing LRU, while the sml_bocdone() is limited to the single thread being last at dropping the boc reference. These may then be racing on clearing the boc->stevedore_priv.

Also isn't the whole patch kind of a noop? The freeing/clearing would happen on the boc dereference eventually, so is there a specific need to have it dont early? And if so, that clearing should be happening in sml_finish(), or some other fetch specific Obj function.

@mbgrydeland
Copy link
Contributor

Reading the ticket description, I guess my noop-argument is false. But my fear of an introduced race persists.

@mbgrydeland
Copy link
Contributor

I think a cleaner solution would be to introduce a ObjBocReset() function that is called from where core decides to reuse the boc.

nigoroll added a commit to nigoroll/varnish-cache that referenced this pull request Jul 19, 2023
@nigoroll
Copy link
Member Author

nigoroll commented Jul 19, 2023

re @mbgrydeland

Isn't this patch introducing a race? sml_objfree() may be called from any thread executing LRU, while the sml_bocdone() is limited to the single thread being last at dropping the boc reference. These may then be racing on clearing the boc->stevedore_priv.

No. The single call site of ObjFreeObj() with oc->boc != NULL is:

ObjFreeObj(bo->wrk, oc);

By convention *1, objects are not added to an LRU before ObjBocDone(). So for one, we should never race LRU.

Secondly, at this point in our current code, a non-private objcore has at least two references, one owned by the boc and one held by the fetch code, and LRU only considers objcores with exactly one reference. To give you more confidence, the test suite also passes with this additional commit, which, however, I would not want to add to the code because it feels brittle to me.

Also isn't the whole patch kind of a noop? The freeing/clearing would happen on the boc dereference eventually, so is there a specific need to have it dont early?

No and yes. Because we might (and, by default *2, do) change the storage engine for the synthetic response generated in vcl_backend_error {}, we need to finalize the stevedore_priv owned by the previous storage engine (also explained in #3953). This is exactly what happened in the original panic, where the beresp.storage was of type fellow, but then changed to Transient of type malloc in vcl_backend_error {}.

So it is only kind of a noop for the special case where the storage engine stays the same across all of the backend side processing (that is, the storage engine selected for regular responses is the same as Transient), but even then I would still consider #3953 a bug, because the object (owned by the stevedore) would be freed while the stevedore_priv (also owned by the stevedore) stayed intact. Long story short, this only accidentally worked because malloc has no relationship between stevedore_priv and the object.

I think a cleaner solution would be to introduce a ObjBocReset() function that is called from where core decides to reuse the boc.

This is where I coming from. I wrote this patch with the proposed Stevedore API function ObjBocRelease(), but found it too involved. I would be fine if there was consensus to extend the API in this way, but clarifying the two use cases of ObjFreeObj() (with and without oc->boc set) seemed to achieve the same purpose at lower cost.

*1) I am referring here to the fact that many details of the stevedore API are not documented explicitly, but rather by example of the reference implementation in Varnish-Cache.

*2) Transient storage is, by default, implicitly selected through the following events: For a typical return from vcl_backend_error{}, the object will be uncacheable, shortlived, or both. Consequently, the code path vbf_stp_error() -> vbf_beresp2obj() -> vbf_allocobj() will unconditionally select stv_transient.

edit: The additional refcnt assertion is only valid for non-private objects

@nigoroll
Copy link
Member Author

Oh, and I forgot: Thank you for your review, @mbgrydeland !

nigoroll added a commit to nigoroll/varnish-cache that referenced this pull request Jul 27, 2023
@nigoroll nigoroll force-pushed the 3953_stevedore_priv_ObjFreeObj branch from 0308f7d to 2568eb3 Compare July 27, 2023 15:59
nigoroll added a commit to nigoroll/varnish-cache that referenced this pull request Aug 1, 2023
@nigoroll nigoroll force-pushed the 3953_stevedore_priv_ObjFreeObj branch from 2568eb3 to 8773a28 Compare August 1, 2023 16:35
nigoroll added a commit to nigoroll/varnish-cache that referenced this pull request Aug 8, 2023
@nigoroll nigoroll force-pushed the 3953_stevedore_priv_ObjFreeObj branch from 8773a28 to f5877ec Compare August 8, 2023 11:22
nigoroll added a commit to nigoroll/varnish-cache that referenced this pull request Aug 21, 2023
@nigoroll nigoroll force-pushed the 3953_stevedore_priv_ObjFreeObj branch from f5877ec to 3d90332 Compare August 21, 2023 12:29
@mbgrydeland
Copy link
Contributor

Thanks @nigoroll , your detailed explanation has peaced my mind. I have no issues with this patch.

If oc->boc is not NULL for an ObjFreeObj() call, the stevedore
needs to finish any busy object transaction.

Fixes varnishcache#3953
@nigoroll nigoroll force-pushed the 3953_stevedore_priv_ObjFreeObj branch from 3d90332 to 7ef1f94 Compare August 23, 2023 09:47
@nigoroll nigoroll merged commit 7ef1f94 into varnishcache:master Aug 23, 2023
1 check passed
@nigoroll nigoroll deleted the 3953_stevedore_priv_ObjFreeObj branch August 23, 2023 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stevedore busy object finalization missing for "object swap" in vbf_stp_error
3 participants