Refactor worker selection to facilitate adding multiple worker runtimes #6387
Conversation
(and volume creation), rather than the worker.Client. the eventual goal is to convert worker.Client into a client for a single worker, rather than all the workers.

the current flow for e.g. a `get` step is:

    exec.GetStep.Run(...)
      -> worker.Client.RunGetStep()
        -> worker.Pool.FindOrChooseWorkerForContainer() <- worker.Worker
        -> worker.Worker.Fetch(...)
          -> Garden specific stuff

the problem with this is that `worker.Worker` is Garden specific. `worker.Client` is the ideal entrypoint to supporting multiple runtimes (e.g. Ciro's K8s runtime PoC modified `worker.Client` to be k8s specific instead), but since a `worker.Client` is the interface to interact with ALL workers, that meant breaking Garden-based workers.

if instead, the flow went like:

    exec.GetStep.Run(...)
      -> worker.Pool.FindOrChooseWorkerClientForContainer() <- worker.Client
      -> worker.Client.RunGetStep()
        -> worker.Worker.Fetch(...)
          -> Garden specific stuff

...then it's easy to swap out the `worker.Client` with a k8s (or nomad, or whatever) implementation.

this also makes it possible to run multiple things on the same worker explicitly. e.g. for my services RFC, it'd be nice to say:

    exec.TaskStep.Run(...)
      -> worker.Pool.FindOrChooseWorkerClientForContainer() <- worker.Client
      -> worker.Client.RunService()
      -> worker.Client.RunTaskStep() // same worker

Signed-off-by: Aidan Oldershaw <[email protected]>
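A minimal Go sketch of the interface shape this is driving toward (the method names come from the flows above; the signatures and placeholder types are simplified assumptions, not the real Concourse ones):

```go
package worker

import "context"

// Pool hands out a Client scoped to a single selected worker, rather than
// steps talking to one Client that fronts every worker.
type Pool interface {
	FindOrChooseWorkerClientForContainer(ctx context.Context, spec ContainerSpec) (Client, error)
}

// Client becomes the per-worker, runtime-specific entrypoint; a Garden-backed
// implementation and a K8s-backed implementation can satisfy it side by side.
type Client interface {
	RunGetStep(ctx context.Context, spec ContainerSpec) (GetResult, error)
	RunTaskStep(ctx context.Context, spec ContainerSpec) (TaskResult, error)
}

// Placeholder types so the sketch stands alone.
type ContainerSpec struct{}
type GetResult struct{}
type TaskResult struct{}
```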
since worker.Client will be scoped to a single worker, and finding volumes requires a pool of workers, streaming artifacts shouldn't be tied to worker.Client. rather than putting it on worker.Pool, create a new component that depends on worker.Pool - this is because the API doesn't need artifact streaming (via the artifact interface, at least), and we'd need to provide a dummy compression implementation like before, which doesn't make much sense - better to decouple the responsibilities

Signed-off-by: Aidan Oldershaw <[email protected]>
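A rough sketch of that decoupling (all names here are hypothetical stand-ins for the new component described above, not the actual Concourse types):

```go
package worker

import (
	"context"
	"io"
)

// Hypothetical stand-ins for the sketch.
type Volume interface {
	StreamOut(ctx context.Context, path string) (io.ReadCloser, error)
}

type VolumeFinder interface {
	FindVolume(ctx context.Context, handle string) (Volume, bool, error)
}

type Compression interface {
	Encoding() string
}

// ArtifactStreamer locates an artifact's volume across the pool of workers
// and streams it out. It depends on the pool (via VolumeFinder) but lives
// outside worker.Client, so the API can construct its pool without having
// to supply a dummy Compression implementation.
type ArtifactStreamer struct {
	volumes     VolumeFinder
	compression Compression
}
```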
the desired result is to remove worker.Client's need to have a reference to the worker.Pool (since worker.Pool will be constructing worker.Clients in later commits). while we could pass a parent reference to the worker.Client upon construction, that isn't so obvious - requiring a tight interface in the method signatures is more clear IMO

Signed-off-by: Aidan Oldershaw <[email protected]>
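For illustration, the difference is between holding a back-reference and demanding a narrow capability where it's used (hypothetical signatures, just to show the pattern):

```go
package worker

import "context"

// Stand-in types for the sketch.
type Volume interface{}

type VolumeFinder interface {
	FindVolume(ctx context.Context, handle string) (Volume, bool, error)
}

// Before: the client is constructed with a reference back to the whole
// pool, even though it only ever needs volume lookup.
type clientWithParent struct {
	pool interface{} // the full worker.Pool
}

// After: the method signature demands exactly the capability it needs,
// making the dependency obvious to callers and trivial to fake in tests.
func runGetStep(ctx context.Context, volumes VolumeFinder) error {
	// ... look up input volumes via volumes.FindVolume(...) ...
	return nil
}
```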
use lagerctx instead

Signed-off-by: Aidan Oldershaw <[email protected]>
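For reference, lagerctx (from `code.cloudfoundry.org/lager/lagerctx`) carries a `lager.Logger` in the `context.Context` instead of threading a logger parameter through every call; roughly:

```go
package main

import (
	"context"

	"code.cloudfoundry.org/lager"
	"code.cloudfoundry.org/lager/lagerctx"
)

func main() {
	logger := lager.NewLogger("worker")
	// Attach the logger to the context once, near the top of the call stack.
	ctx := lagerctx.NewContext(context.Background(), logger)
	doWork(ctx)
}

func doWork(ctx context.Context) {
	// Retrieve it wherever it's needed, instead of accepting a logger argument.
	logger := lagerctx.FromContext(ctx)
	logger.Info("doing-work")
}
```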
...by removing a method that needn't be on it. FindOrChooseWorker sounds like it'd be used elsewhere, but it's just used internally to choose a random worker for a volume - so, I gave it a more honest name and made it private

Signed-off-by: Aidan Oldershaw <[email protected]>
...rather than in the worker.Client. there was some awkwardness in the ContainerSpec. task/put inputs and (at times) images would be passed in as runtime.Artifacts for the sole purpose of converting them to Sources (which are also stored on the ContainerSpec). there was a TODO in the worker.Client to this effect:

    // don't modify spec inside here, Specs don't change after you write them

I considered just adding this responsibility to the worker.Pool (since it already knows where to find volumes), but that feels like giving too much responsibility to the Pool (and would require adding back the dummy `compression` to the API's worker.Pool)

Signed-off-by: Aidan Oldershaw <[email protected]>
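A sketch of the kind of interface this describes; the exact method set in the PR may differ, and the types here are simplified stand-ins:

```go
package worker

import "context"

// Simplified stand-ins for runtime.Artifact and worker.ArtifactSource.
type Artifact interface {
	ID() string
}

type ArtifactSource interface {
	ExistsOn(ctx context.Context, workerName string) (bool, error)
}

// ArtifactSourcer converts the artifacts a step received as inputs into
// sources a worker can mount or stream from, up front. That way the
// ContainerSpec doesn't carry both representations, and nothing has to
// mutate the spec while a step is running.
type ArtifactSourcer interface {
	SourceInputs(ctx context.Context, inputs map[string]Artifact) (map[string]ArtifactSource, error)
}
```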
rather than in the worker.Client. the worker.Pool now gives a worker.Client from the SelectWorker method, which makes it possible to support different runtimes in parallel in the future (though, the worker.Provider will need to handle this and return the appropriate worker.Client instead of a worker.Worker).

selecting a worker for task steps has been moved to the TaskDelegate since there's some funky logic around limit-active-tasks. I also noticed/fixed a bug in this logic wherein the limit-active-tasks DB lock would not be released under some failure cases.

side note, but this code really feels like DI/mock hell - it'd be nice to clean this up/reduce the indirection, but I'll leave that for another PR.

Signed-off-by: Aidan Oldershaw <[email protected]>
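The lock bug is the usual acquire-without-defer shape; a sketch of the fix under assumed helper names (not the exact Concourse code):

```go
package worker

// Lock mimics the DB lock used for limit-active-tasks.
type Lock interface {
	Release() error
}

// withActiveTaskLock runs fn while holding the lock. Releasing in a defer
// guarantees the lock is dropped on every path; the bug was that early
// returns on failure paths skipped the release entirely.
func withActiveTaskLock(acquire func() (Lock, bool, error), fn func() error) error {
	lock, acquired, err := acquire()
	if err != nil || !acquired {
		return err
	}
	defer lock.Release()
	return fn()
}
```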
I'm so excited about this! Thanks for all the extra work you did :)
There's a lot to go through so I hope I didn't miss anything. The Artifact Streamer and Sourcer seem like good extractions for the volume fetching issue. I did notice they didn't have any tests though. Think you could add some for them?
@taylorsilva sorry, took me a while to find the time to address your comments! I've added some tests for them. I kinda want to replace the test suites with slightly higher level tests and remove the convoluted web of dependencies within the package (since many of these interfaces aren't really interfaces - we just use interfaces so we can make testing slightly easier). Really, the garden worker implementation (i.e. most of ...) ...but anyway, that's for another PR, this one's already big enough!
one nit about the one test's structure, but I'm gonna approve this anyway. Up to you if you wanna change it or not. Thanks for adding the test coverage :)
What does this PR accomplish?
Bug Fix | Feature | Documentation | Refactor
Make it easier to support multiple runtimes in the future by removing the responsibility of selecting a worker from the `worker.Client`. This also makes it possible to schedule multiple containers on the same worker, which would be useful for Services.

Changes proposed by this PR:
As was shown in #5223, the most natural way to support new runtimes is by constructing a new implementation of `worker.Client` (defining how to `RunTaskStep`, `RunGetStep`, etc.), rather than trying to find a more fine-grained common interface between worker runtimes up-front.

It's worth noting that there already exists a `worker.Worker` interface which seems to address some concerns of multiple runtimes - however, to me, `worker.Worker` doesn't really have a clear identity as an `interface`, as many of the interface's methods are just delegation to other things (DB, `VolumeClient`), and aren't runtime specific (though `VolumeClient` possibly will be). There are a few methods (`FindContainerByHandle`, `FindOrCreateContainer`, `Fetch`?) that are runtime specific and should probably be moved to `worker.Client` eventually, but that's out of scope for this PR.

The prior responsibilities of `worker.Client` included selecting a `worker.Worker` from the `worker.Pool`, and then running some (Garden specific) logic, including calling some methods on the `worker.Worker` to create containers. e.g. the flow for a `get` step is:

    exec.GetStep.Run(...)
      -> worker.Client.RunGetStep()
        -> worker.Pool.FindOrChooseWorkerForContainer() <- worker.Worker
        -> worker.Worker.Fetch(...)
          -> Garden specific stuff

Now, `worker.Client` just represents a single worker, and `worker.Pool` gives out `worker.Client`s. Each step first selects a `worker.Client` via the pool, and then runs the appropriate step. The new flow is:

    exec.GetStep.Run(...)
      -> worker.Pool.FindOrChooseWorkerClientForContainer() <- worker.Client
      -> worker.Client.RunGetStep()
        -> worker.Worker.Fetch(...)
          -> Garden specific stuff

This can also enable running services for `task` steps, e.g.

    exec.TaskStep.Run(...)
      -> worker.Pool.FindOrChooseWorkerClientForContainer() <- worker.Client
      -> worker.Client.RunService()
      -> worker.Client.RunTaskStep() // same worker
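On the step side, the new flow could look roughly like this (simplified, assumed signatures, not the real `exec` package types):

```go
package exec

import "context"

// Simplified stand-ins for the worker package.
type WorkerClient interface {
	RunGetStep(ctx context.Context) error
}

type WorkerPool interface {
	FindOrChooseWorkerClientForContainer(ctx context.Context) (WorkerClient, error)
}

// GetStep first selects a client scoped to one worker, then runs the step
// against it; supporting a new runtime means supplying a different client.
type GetStep struct {
	pool WorkerPool
}

func (s *GetStep) Run(ctx context.Context) error {
	client, err := s.pool.FindOrChooseWorkerClientForContainer(ctx)
	if err != nil {
		return err
	}
	return client.RunGetStep(ctx)
}
```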
Additionally, this adds an `ArtifactSourcer` interface for converting `runtime.Artifact`s to `worker.ArtifactSource`s. It doesn't have a super strong identity IMO, but it means we don't need to hold duplicated info on the `worker.ContainerSpec` and hackily modify the spec when running steps.