[not for merge] Parsl serializer plugins development #2718

Draft: wants to merge 87 commits into master
Conversation

@benclifford (Collaborator) commented May 19, 2023

Description

I'm using this as an excuse to explore a serialiser plugin API for external serialisers - that sort of behaviour is half-implemented in parsl already, in so much as new serialiser subclasses can be implemented, but it is not well exposed to users at all. For example, identifiers are forced to be three bytes and are not amenable to shared namespaces, and the order in which serializers are registered (when defined across multiple files) depends on the order in which imports are executed, which in Python can be very subtle.

The tests all pass for this.

Memoization needed some changes, as it uses parsl.serialize to serialize values into memo identifiers - which is not a suitable use for proxystore's proxy objects. This PR switches to using one of the usual serializers for that, which is probably a better thing to do in the presence of pluggable serializers; there is still a pluggable API for adding new id_for_memo implementations...
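A minimal sketch of that idea, using hypothetical names rather than parsl's actual memoization internals: memo keys are derived from a fixed, boring serializer (plain pickle here), so changing the pluggable task serializers never changes memoization results.

```python
import hashlib
import pickle
from functools import singledispatch


@singledispatch
def id_for_memo(obj):
    # Default: identify an object by a plain-pickle byte serialization,
    # independent of whichever pluggable serializers are configured for tasks.
    return pickle.dumps(obj)


@id_for_memo.register(list)
def _(lst):
    # Identify a list by the identities of its members, so container contents
    # are handled recursively rather than pickled wholesale.
    return pickle.dumps([id_for_memo(x) for x in lst])


def memo_key(func_name: str, args: tuple, kwargs: dict) -> str:
    # Hash the deterministic byte identities to form the memo table key.
    h = hashlib.sha256()
    h.update(func_name.encode())
    h.update(id_for_memo(list(args)))
    h.update(id_for_memo(sorted(kwargs.items())))
    return h.hexdigest()
```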

benclifford and others added 25 commits May 19, 2023 11:22
…uired to return consistent results ii) changing serialiser order/configuration between runs would mean memoisation gets different values between runs for the same input structure
…d_for_memo.py that is still failing, for reasons i do not understand
This mirrors a removal in Globus Compute, which originally contributed this
code as a fork of the funcX serializer:

PR globus/globus-compute#1153
commit (in funcX) 955e666fe6279571bf6024c239fc789595d8e206
Prior to this PR, the serialization header was parsed twice during
deserialization, and in two different ways, leading to an awkward header
format: the header had to be exactly three bytes long for one (length-based)
parser, and the third byte had to be a \n for the other (newline-based)
parser.

This PR removes the length based parser, instead allowing an arbitrary
length \n-terminated header.
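A sketch of the two parsing styles, with made-up identifier bytes (the real registry identifiers and payload layout live in parsl.serialize and may differ in detail):

```python
# Each serialized payload carries a header identifying its deserializer,
# followed by the serialized body.
PAYLOAD_OLD = b"01\n" + b"pickled-bytes"                        # 3-byte header
PAYLOAD_NEW = b"org.example.myserializer\n" + b"pickled-bytes"  # arbitrary-length header


def parse_header_length_based(payload: bytes):
    # Old style: the header is assumed to be exactly three bytes, the last of
    # which must be a newline.
    header, body = payload[:3], payload[3:]
    assert header.endswith(b"\n")
    return header[:2], body


def parse_header_newline_based(payload: bytes):
    # New style: the identifier is whatever precedes the first newline.
    identifier, _, body = payload.partition(b"\n")
    return identifier, body


assert parse_header_length_based(PAYLOAD_OLD) == (b"01", b"pickled-bytes")
assert parse_header_newline_based(PAYLOAD_NEW) == (b"org.example.myserializer", b"pickled-bytes")
```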

Backwards/forwards compatibility:
The parsl serialization wire format is not a user exposed interface, as parsl
requires the same version of parsl to be installed at all locations in a
parsl deployment.

This PR does not change any on-the-wire byte sequences when used with the two
existing de*serializers, but opens the opportunity for future implementations
to use more than two bytes for identifiers.

Performance:
I made a basic benchmark of serializing and deserializing a few thousand
integers (deliberately using a simple object like `int` so that the object
processing would be quite small, allowing changes in header time more chance
to show up). My main concern with this PR was not making things
noticeably worse, rather than improving performance.
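For concreteness, a round trip of that flavour could look like the following minimal sketch, which assumes parsl.serialize's serialize/deserialize functions; the benchmark actually used for the numbers below may have differed:

```python
import timeit

from parsl.serialize import serialize, deserialize

N = 10_000
total = timeit.timeit(lambda: deserialize(serialize(7)), number=N)
print(f"serialize->deserialize round trip: {total / N * 1e9:.0f} ns")
```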

After this PR, a serialization->deserialization round trip is about 200ns
faster (1165ns before vs 965ns after), so I am happy that this PR does not
damage performance.

Future:
This is part of ongoing work to introduce user pluggable de*serializers.
… benc-prototype-proxystore

 Conflicts:
	parsl/serialize/facade.py
benclifford added a commit that referenced this pull request Jul 5, 2023
TODO: (broken) make a test that runs using a custom serializer to an htex worker... to check that remote processes can load a custom serializer.

prior to this PR, serializer registration happened in order of class definition, which in the case of a single concretes file is the order of the serializers inside that file.

this does not work well when trying to define other serializers in other files, because the order in which files are imported can change subtly and as a result of distant effects.

so doing serialiser registration in a subclass hook is not a very nice thing to do here...

this PR should move to a model which defines the default de*serializers explicitly, and allows the user to modify that list explicitly too, to add in new de*serializers.
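A minimal sketch of that explicit-list model, with hypothetical class and function names rather than parsl's current API:

```python
import pickle

import dill


class PickleSerializer:
    identifier = b"01"  # made-up identifier bytes
    def serialize(self, obj) -> bytes:
        return pickle.dumps(obj)


class DillSerializer:
    identifier = b"02"
    def serialize(self, obj) -> bytes:
        return dill.dumps(obj)


# Defaults are constructed explicitly, in an explicit order, rather than being
# registered as a side effect of class definition / import order.
methods_for_data = [PickleSerializer(), DillSerializer()]


def register_serializer(serializer, index: int = 0) -> None:
    """Let a workflow insert a plugged-in serializer, typically at the front
    so it is tried before the defaults."""
    methods_for_data.insert(index, serializer)
```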

limitations:

this is part of work needed to allow serializer plugins, but not all of it, as it does not define how a remote worker will discover plugged-in serializers - for that, perhaps, the header should become something importable, instead of a registry-based ID byte string?
contrasting with that: proxystore doesn't need its own deserializer! it can use the existing pickle/picklecode deserialisers, because what it generates is a pickled object... and that's a legitimate use case... which suggests that serializers don't need to be indexed by "serializer" ID at all on the serializing side, but only on the deserializing side...

this API also will not easily allow serializer methods to be defined for values returned from workers to the submit side: the expected place of registration, near the start of the user workflow, does not have a corresponding place on the worker side.

this PR separates out serialization behaviour (object -> bytestream) from deserialization
behaviour ((identifier, bytestream) -> object), because a serializer might not (and in this
prototype, *cannot*) implement custom deserialization, due to non-registration on the
remote side, and can instead only generate dill/pickle-targeted bytestreams
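A sketch of that split, with hypothetical names (the real facade lives in parsl/serialize/facade.py): a plugged-in serializer controls object -> bytestream on the submit side, but tags its output with an existing deserializer's identifier, so the remote side needs no plugin at all.

```python
import pickle

PICKLE_ID = b"01"  # stands in for the identifier of the stock pickle deserializer


class ProxyingSerializer:
    """Example plugin: wraps objects before pickling them. Deserialization is
    plain pickle, so only the submitting process needs this class."""
    def serialize(self, obj) -> bytes:
        wrapped = ("wrapped", obj)  # stand-in for e.g. a proxystore proxy
        return PICKLE_ID + b"\n" + pickle.dumps(wrapped)


def deserialize(payload: bytes):
    # Deserializer lookup happens only by identifier; the serializing-side
    # plugin never needs to be importable here.
    identifier, _, body = payload.partition(b"\n")
    if identifier == PICKLE_ID:
        return pickle.loads(body)
    raise ValueError(f"no deserializer registered for {identifier!r}")


assert deserialize(ProxyingSerializer().serialize([1, 2, 3])) == ("wrapped", [1, 2, 3])
```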


the motivating use case for this basic serializer plugin API is supporting proxystore, prototyped in
#2718
https://labs.globus.org/projects/proxystore.html

this PR could also contain an optional proxystore plugin that would need to be activated by a user

loading remote serialisers via importlib is a lot of stuff... perhaps i should... just pickle the deserializer and send it (and let pickle deal with that?)
- i guess there are two paths here: one is loading remote deserialisers, and the other is focusing on the proxystore use case, which, differently, wants access to the pickle deserializer remotely - and so that drives clearer separation of serializers vs deserialisers... that 2nd case is probably what i should concentrate on...

right now this does module loading in order to work with the class based serializer design inherited from before... but maybe that's not how it should end up...

perhaps just serializer callables and deserializer callables? or based around modules?

it seems nice to be able to configure the serializer (aka make a class or a partially applied callable) for configuring proxystore... but unclear how that would work for remote end configuration.
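As a sketch of the partially-applied-callable option (hypothetical names; a real plugin would offload to proxystore rather than the placeholder branch below):

```python
import functools
import pickle


def threshold_serializer(obj, *, min_size: int) -> bytes:
    """Serialize with pickle, but flag objects whose pickled form exceeds
    min_size; a real plugin would proxy/offload those instead."""
    data = pickle.dumps(obj)
    if len(data) >= min_size:
        pass  # placeholder: offload to an object store and return a reference
    return data


# Configuration happens by partial application on the submit side...
my_serializer = functools.partial(threshold_serializer, min_size=1_000_000)

# ...and the resulting callable is what would be handed to a (hypothetical)
# register_serializer() call; how the remote end gets configured is the open
# question discussed above.
payload = my_serializer({"x": 1})
```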

TODO: as an example plug in, also show 'serpent' as a serialiser/deserialiser pair
this PR also contains an optional proxystore plugin that would need to be activated by a user,
and an attempt at an in-pickle (recursive style) proxystore plugin that is policy-aware and can
proxy objects deeper inside lists... to deal with proxying only the interesting objects deep inside
object graphs - which is what actually happens with regular argument lists when they're non-trivial.
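A rough sketch of that recursive, policy-aware idea (entirely hypothetical; a real implementation would create proxystore proxies rather than the placeholder tuple used here):

```python
from typing import Any, Callable


def make_proxy(obj: Any) -> Any:
    # Placeholder for something like a proxystore proxy of obj.
    return ("proxy-placeholder", id(obj))


def proxy_interesting(obj: Any, policy: Callable[[Any], bool]) -> Any:
    """Walk an argument structure and replace only objects the policy selects
    with proxies, leaving the surrounding lists/dicts intact for normal pickling."""
    if policy(obj):
        return make_proxy(obj)
    if isinstance(obj, list):
        return [proxy_interesting(x, policy) for x in obj]
    if isinstance(obj, dict):
        return {k: proxy_interesting(v, policy) for k, v in obj.items()}
    return obj


# e.g. only proxy large byte buffers buried deep inside an argument list:
args = [1, "small", [b"x" * 10_000_000, {"k": b"y" * 5}]]
proxied = proxy_interesting(args, policy=lambda o: isinstance(o, bytes) and len(o) > 1024)
```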

…ystore' into benc-prototype-proxystore

Conflicts:
	parsl/serialize/facade.py
…ystore

Conflicts:
	parsl/dataflow/memoization.py
	parsl/serialize/facade.py