From 8d66d050b643dc24100672b744046234edccf672 Mon Sep 17 00:00:00 2001 From: David Espejo <82604841+davidmirror-ops@users.noreply.github.com> Date: Sat, 7 Oct 2023 07:41:34 -0500 Subject: [PATCH] Updates to deployment guides (#3994) * Updates to deployment guides Signed-off-by: davidmirror-ops * Update multicluster docs round 2 Signed-off-by: davidmirror-ops * Updates instructions from last run Signed-off-by: davidmirror-ops * Add instructions to add clusters Signed-off-by: davidmirror-ops * Fix typos Signed-off-by: davidmirror-ops * Fix JSON indentation in example Signed-off-by: davidmirror-ops * Fix JSON indentation in example 2nd try Signed-off-by: davidmirror-ops * Fix JSON missing blank line Signed-off-by: davidmirror-ops * Fix JSON missing blank line 3rd try Signed-off-by: davidmirror-ops * Fix JSON missing blank line 4th try Signed-off-by: davidmirror-ops * Fix JSON syntax Signed-off-by: davidmirror-ops * Fix JSON syntax 6th try Signed-off-by: davidmirror-ops * Remove JSON block Signed-off-by: davidmirror-ops * Fix error in line 57 Signed-off-by: davidmirror-ops * Fix spelling Signed-off-by: davidmirror-ops * Apply feedback from review Signed-off-by: davidmirror-ops * Fix hyperlink Signed-off-by: davidmirror-ops * Fix blank space Signed-off-by: davidmirror-ops * Incorporate review Signed-off-by: davidmirror-ops * Incorporate 2nd round of review Signed-off-by: davidmirror-ops * Instructions using 2 IAM Roles Signed-off-by: davidmirror-ops * Incorporate 3rd round of feedback Signed-off-by: davidmirror-ops * Add instructions to enable controlplane wf execution Signed-off-by: davidmirror-ops * Incorporate 4th round of reviews Signed-off-by: davidmirror-ops --------- Signed-off-by: davidmirror-ops --- charts/flyte-binary/eks-production.yaml | 2 +- .../deployment/cloud_production.rst | 52 +- rsts/deployment/deployment/cloud_simple.rst | 8 + rsts/deployment/deployment/index.rst | 35 +- rsts/deployment/deployment/multicluster.rst | 660 ++++++++++++++---- rsts/deployment/deployment/sandbox.rst | 16 +- 6 files changed, 574 insertions(+), 199 deletions(-) diff --git a/charts/flyte-binary/eks-production.yaml b/charts/flyte-binary/eks-production.yaml index 727b9b10fa..2db827b804 100644 --- a/charts/flyte-binary/eks-production.yaml +++ b/charts/flyte-binary/eks-production.yaml @@ -132,7 +132,7 @@ ingress: nginx.ingress.kubernetes.io/app-root: /console grpcAnnotations: nginx.ingress.kubernetes.io/backend-protocol: GRPC - host: development.uniondemo.run + host: # change for the URL you'll use to connect to Flyte rbac: extraRules: - apiGroups: diff --git a/rsts/deployment/deployment/cloud_production.rst b/rsts/deployment/deployment/cloud_production.rst index 90997556c9..1736f1eb4c 100644 --- a/rsts/deployment/deployment/cloud_production.rst +++ b/rsts/deployment/deployment/cloud_production.rst @@ -23,23 +23,48 @@ guide already contains the ingress rules, but they are not enabled by default. To turn on ingress, update your ``values.yaml`` file to include the following block. -.. tabbed:: AWS - ``flyte-binary`` +.. tabs:: + + .. group-tab:: ``flyte-binary`` on EKS using NGINX - .. literalinclude:: ../../../charts/flyte-binary/eks-production.yaml - :caption: charts/flyte-binary/eks-production.yaml - :language: yaml - :lines: 123-131 + .. literalinclude:: ../../../charts/flyte-binary/eks-production.yaml + :caption: charts/flyte-binary/eks-production.yaml + :language: yaml + :lines: 127-135 + + .. group-tab:: ``flyte-binary``/ on EKS using ALB + + .. 
code-block:: yaml + + ingress: + create: true + commonAnnotations: + alb.ingress.kubernetes.io/certificate-arn: '' + alb.ingress.kubernetes.io/group.name: flyte + alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]' + alb.ingress.kubernetes.io/scheme: internet-facing + alb.ingress.kubernetes.io/ssl-redirect: '443' + alb.ingress.kubernetes.io/target-type: ip + kubernetes.io/ingress.class: alb + httpAnnotations: + alb.ingress.kubernetes.io/actions.app-root: '{"Type": "redirect", "RedirectConfig": {"Path": "/console", "StatusCode": "HTTP_302"}}' + grpcAnnotations: + alb.ingress.kubernetes.io/backend-protocol-version: GRPC + host: #use a DNS CNAME pointing to your ALB + + .. group-tab:: ``flyte-core`` on GCP using NGINX + + .. literalinclude:: ../../../charts/flyte-core/values-gcp.yaml + :caption: charts/flyte-core/values-gcp.yaml + :language: yaml + :lines: 156-164 -.. note:: - - This currently assumes that you have nginx ingress. We'll be updating these - in the near future to use the ALB ingress controller instead. *************** Authentication *************** -Authentication comes with Flyte in the form of OAuth 2. Please see the +Authentication comes with Flyte in the form of OAuth 2.0. Please see the `authentication guide `__ for instructions. .. note:: @@ -60,10 +85,3 @@ compatibility being maintained, for the most part. If you're using the :ref:`multi-cluster ` deployment model for Flyte, components should be upgraded together. - -.. note:: - - Expect to see minor version releases roughly 4-6 times a year - we aim to - release monthly, or whenever there is a large enough set of features to - warrant a release. Expect to see patch releases at more regular intervals, - especially for flytekit, the Python SDK. diff --git a/rsts/deployment/deployment/cloud_simple.rst b/rsts/deployment/deployment/cloud_simple.rst index b675df00b9..b280546708 100644 --- a/rsts/deployment/deployment/cloud_simple.rst +++ b/rsts/deployment/deployment/cloud_simple.rst @@ -115,6 +115,14 @@ hello world example: cd flytesnacks/cookbook pyflyte run --remote core/flyte_basics/hello_world.py my_wf +*********************************** +Flyte in on-premises infrastructure +*********************************** + +Sometimes, it's also helpful to be able to set up a Flyte environment in an on-premises Kubernetes environment or even on a laptop for testing and development purposes. +Check out `this community-maintained tutorial `__ to learn how to setup the required dependencies and deploy the `flyte-binary` chart to a local Kubernetes cluster. + + ************* What's Next? ************* diff --git a/rsts/deployment/deployment/index.rst b/rsts/deployment/deployment/index.rst index ac0765412a..eb06d0a6c0 100644 --- a/rsts/deployment/deployment/index.rst +++ b/rsts/deployment/deployment/index.rst @@ -49,29 +49,6 @@ deployment comes with a containerized `Minio `__, which offers - **GCP**: `GCS `__ - **Azure**: `Azure Blob Storage `__ - -Cluster Configuration -===================== - -Flyte configures K8s clusters to work with it. For example, as your Flyte userbase evolves, adding new projects is as -simple as registering them through the command line: - -.. prompt:: bash $ - - flytectl create project \ - --id my-flyte-project \ - --name "My Flyte Project" \ - --description "My first project onboarding onto Flyte" - -Once you invoke this command, this project should immediately show up in the Flyte console after refreshing. 
-
-Flyte runs at a configurable cadence that ensures that all Kubernetes resources necessary for the new project are
-created and new workflows can successfully be registered and executed within it.
-
-.. note::
-
-   For more information, see :std:ref:`flytectl `.
-
 ************************
  Flyte Deployment Paths
 ************************
@@ -108,7 +85,10 @@ There are three different paths for deploying a Flyte cluster:
      This option is appropriate if all your compute can `fit on one EKS cluster `__ .
      As of this writing, a single Flyte cluster can handle more than 13,000 nodes.

-   Whatever path you choose, note that ``FlytePropeller`` itself can be sharded as well, though typically it's not required.
+   Regardless of whether you use a single Kubernetes cluster or several, note that ``FlytePropeller``, the main data plane component, can also be scaled out through sharding if scale demands require it.
+   See `Automatic scale-out `__ to learn more about the sharding mechanism.

 Helm
 ====
@@ -156,10 +136,13 @@ Deployment Tips and Tricks

Due to the many choices and constraints that you may face in your organization, the specific steps for deploying Flyte
can vary significantly. For example, which cloud platform to use is typically a big fork in the road for many, and there
are many choices to make in terms of Ingress controllers, auth providers, and versions of different dependent libraries that
may interact with other parts of your stack.

Considering the above, we recommend checking out the `"Flyte The Hard Way" `__ set of community-maintained tutorials, which can guide you through preparing the infrastructure and
deploying Flyte.

In addition to searching and posting on the `#flyte-deployment Slack channel `__,
we have a `Github Discussion `__ section dedicated to deploying Flyte. Feel free to submit any hints you've found helpful as a
discussion, ask questions, or simply document what worked or what didn't work for you.
diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst
index 69c34989ae..e8c6b84f13 100644
--- a/rsts/deployment/deployment/multicluster.rst
+++ b/rsts/deployment/deployment/multicluster.rst
@@ -1,18 +1,18 @@
 .. _deployment-deployment-multicluster:

-##################################
-Multiple K8s Cluster Deployment
-##################################
+######################################
+Multiple Kubernetes Cluster Deployment
+######################################

.. tags:: Kubernetes, Infrastructure, Advanced

.. note::

   The multicluster deployment described in this section assumes you have deployed
   the ``flyte-core`` Helm chart, which runs the individual Flyte components separately.
   This is needed because in a multicluster setup, the execution engine is
   deployed to multiple K8s clusters; it won't work with the ``flyte-binary``
   Helm chart, since it deploys all Flyte services as one single binary.
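If you're not sure which chart an existing installation uses, a quick check (a sketch, assuming Flyte is installed in the ``flyte`` namespace under its default release name) is to list the Helm releases there and inspect the ``CHART`` column:

.. prompt:: bash $

   helm list -n flyte

A ``flyte-core-...`` entry in the ``CHART`` column means you can proceed with this guide; a ``flyte-binary-...`` entry means you would need to migrate to ``flyte-core`` before continuing.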
Scaling Beyond Kubernetes ------------------------- @@ -24,30 +24,162 @@ Scaling Beyond Kubernetes execution. The data plane fulfills these workflows by launching pods in Kubernetes. -At very large companies, total compute needs could exceed the limits of a single -Kubernetes cluster. -To address this, you can deploy the data plane to multiple Kubernetes clusters. -The control plane (FlyteAdmin) can be configured to load-balance workflows across -these individual data planes, protecting you from failure in a single Kubernetes -cluster increasing scalability. +.. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/flyte-multicluster-arch-v2.png -To achieve this, first, you have to create additional Kubernetes clusters. -For now, let's assume you have three Kubernetes clusters and that you can access -them all with ``kubectl``. +The case for multiple Kubernetes clusters may arise due to security constraints, +cost effectiveness or a need to scale out computing resources. + +To address this, you can deploy Flyte's data plane to multiple Kubernetes clusters. +The control plane (FlyteAdmin) can be configured to submit workflows to +these individual data planes. Additionally, Flyte provides the mechanisms for +administrators to retain control on the workflow placement logic while enabling +users to reap the benefits using simple abstractions like ``projects`` and ``domains``. + +Prerequisites +************* + +To make sure that your multicluster deployment is able to scale and process +requests successfully, the following environment-specific requirements should be met: + +.. tabbed:: AWS + + 1. An IAM Policy that defines the permissions needed for Flyte. A minimum set of permissions include: + + .. code-block:: json + + "Action": [ + "s3:DeleteObject*", + "s3:GetObject*", + "s3:ListBucket", + "s3:PutObject*" + ], + "Resource": [ + "arn:aws:s3:::*", + "arn:aws:s3:::*/*" + ], + + + 2. Two IAM Roles configured: one for the control plane components, and another for the data plane where the worker Pods and ``flytepropeller`` run. + + .. note:: + + Using the guidance from this document, make sure to follow your organization's policies to configure IAM resources. -Let's call these clusters ``cluster1``, ``cluster2``, and ``cluster3``. + 3. An OIDC Provider associated with each of your EKS clusters. You can use the following command to create and connect the Provider: -Next, deploy *only* the data planes to these clusters. To do this, remove the -data plane components from the ``flyte`` overlay, and create a new overlay -containing *only* the data plane resources. + .. prompt:: bash + + eksctl utils associate-iam-oidc-provider --cluster --approve + + 4. An IAM Trust Relationship that associates each EKS cluster type (control plane or data plane) with the Service Account(s) and namespaces + where the different elements of the system will run. + + Follow the steps in this section to complete the requirements indicated above: + + **Control plane role** + + 1. Use the following command to simplify the process of both creating a role and configuring an initial Trust Relationship: + + .. prompt:: bash + + eksctl create iamserviceaccount --cluster= --name=flyteadmin --role-only --role-name=flyte-controlplane-role --attach-policy-arn --approve --region --namespace flyte + + 2. Go to the **IAM** section in your **AWS Management Console** and select the role that was just created + 3. Go to the **Trust Relationships** tab and **Edit the Trust Policy** + 4. 
Add the ``datacatalog`` Service Account to the ``sub`` section

   .. note::

      When caching is enabled, the ``datacatalog`` service stores hashes of workflow inputs alongside outputs on blob storage. Learn more `here `__.

   Example configuration:

   .. code-block:: json

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
              "StringEquals": {
                "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com",
                "oidc.eks..amazonaws.com/id/:sub": [
                  "system:serviceaccount:flyte:flyteadmin",
                  "system:serviceaccount:flyte:datacatalog"
                ]
              }
            }
          }
        ]
      }

   **Data plane role**

   1. Create the role and Trust Relationship:

      .. prompt:: bash

         eksctl create iamserviceaccount --cluster= --name=flytepropeller --role-only --role-name=flyte-dataplane-role --attach-policy-arn --approve --region --namespace flyte

   2. Edit the **Trust Relationship** of the data plane role.

   .. note::

      By default, every Pod created for task execution uses the ``default`` Service Account in its respective namespace. Your cluster will have as many
      namespaces as there are ``project``-``domain`` combinations, so it's useful to add a ``StringLike`` condition with a wildcard for the namespace name in the Trust Policy.

   3. Add the ``default`` Service Account:

   Example configuration for one data plane cluster:

   .. code-block:: json

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
              "StringLike": {
                "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com",
                "oidc.eks..amazonaws.com/id/:sub": [
                  "system:serviceaccount:flyte:flytepropeller",
                  "system:serviceaccount:*:default"
                ]
              }
            }
          }
        ]
      }

   .. note::

      To further refine the Trust Relationship, consider using a ``StringEquals`` condition and adding the ``default`` Service Account only for the ``project``-``domain``
      namespaces where Flyte tasks will run, instead of using a wildcard.

.. _dataplane-deployment:

Data Plane Deployment
*********************

This guide assumes that you have two Kubernetes clusters and that you can access
both with ``kubectl``.

Let's call these clusters ``dataplane1`` and ``dataplane2``. In this section, you'll prepare
the first cluster only.

1. Add the ``flyteorg`` Helm repo:

.. prompt:: bash

   helm repo add flyteorg https://flyteorg.github.io/flyte
   helm repo update
   # Get flyte-core helm chart
   helm fetch --untar --untardir . flyteorg/flyte-core
   cd flyte-core

2. Open the ``values-dataplane.yaml`` file and add the following contents:

   .. code-block:: yaml

      configmap:
        admin:
          admin:
            endpoint: :443 #indicate the URL you're using to connect to Flyte
            insecure: false #enables secure communication over SSL. Requires a signed certificate
        catalog:
          catalog-cache:
            endpoint: :443
            insecure: false

.. note::

   This step is needed so the ``flytepropeller`` instance in the data plane cluster is able to send notifications
   back to the ``flyteadmin`` service in the control plane. The ``catalog`` service runs in the control plane and is used when caching is enabled.

3. Install the Flyte data plane Helm chart:

.. note::

   Use the same ``values-eks.yaml`` or ``values-gcp.yaml`` file you used to deploy the control plane.

.. tabbed:: AWS

   .. code-block::

-      helm upgrade flyte -n flyte flyteorg/flyte-core values.yaml \
-        -f values-eks.yaml \
-        -f values-dataplane.yaml \
-        --create-namespace flyte --install
+      helm install flyte-core-data flyteorg/flyte-core -n flyte \
+        --values values-eks.yaml --values values-dataplane.yaml \
+        --create-namespace

.. tabbed:: GCP

   .. code-block::

-      helm upgrade flyte -n flyte flyteorg/flyte-core values.yaml \
-        -f values-gcp.yaml \
-        -f values-dataplane.yaml \
-        --create-namespace flyte --install
+      helm install flyte-core-data -n flyte flyteorg/flyte-core \
+        --values values-gcp.yaml \
+        --values values-dataplane.yaml \
+        --create-namespace

.. _control-plane-deployment:

Control Plane configuration
***************************

For ``flyteadmin`` to access and create Kubernetes resources in one or more
Flyte data plane clusters, it needs credentials to each cluster.
Flyte makes use of Kubernetes Service Accounts to enable every control plane cluster to perform
authenticated requests to the data plane Kubernetes API Server.
The default behaviour is that the Helm chart creates a `ServiceAccount `_
in each data plane cluster.
In order to verify requests, the Kubernetes API Server expects a `signed bearer token `__
attached to the Service Account. As of Kubernetes 1.24 and above, the bearer token has to be generated manually.

1. Use the following manifest to create a long-lived bearer token for the ``flyteadmin`` Service Account in your data plane cluster:

   .. prompt:: bash

      kubectl apply -f - <<EOF
      apiVersion: v1
      kind: Secret
      metadata:
        name: dataplane1-token
        namespace: flyte
        annotations:
          kubernetes.io/service-account.name: flyteadmin
      type: kubernetes.io/service-account-token
      EOF

2. Create a new file named ``secrets.yaml`` that looks like the following example:

.. code-block:: yaml
   :caption: secrets.yaml

   apiVersion: v1
   kind: Secret
   metadata:
     name: cluster-credentials
     namespace: flyte
   type: Opaque
   data:

.. note::

   The credentials have two parts (``CA cert`` and ``bearer token``).

3. Copy the bearer token of the first data plane cluster's secret to your clipboard using the following command:

.. prompt:: bash $

   kubectl get secret -n flyte dataplane1-token \
     -o jsonpath='{.data.token}' | pbcopy

4. Go to ``secrets.yaml`` and add a new entry under ``data`` with the data plane cluster token:

.. 
code-block:: yaml + :caption: secrets.yaml - kubectl get secret -n flyte {secret-name} \ - -o jsonpath='{.data.ca\.crt}' | base64 -D | pbcopy + apiVersion: v1 + kind: Secret + metadata: + name: cluster-credentials + namespace: flyte + type: Opaque + data: + dataplane_1_token: -You can copy the bearer token to your clipboard using the following command: +5. Obtain the corresponding certificate: .. prompt:: bash $ - kubectl get secret -n flyte {secret-name} \ - -o jsonpath='{.data.token}' | base64 -D | pbcopy + kubectl get secret -n flyte dataplane1-token \ + -o jsonpath='{.data.ca\.crt}' | pbcopy -Now these credentials need to be included in the control plane. Create a new -file named ``secrets.yaml`` that looks like: +6. Add another entry on your ``secrets.yaml`` file for the certificate: .. code-block:: yaml :caption: secrets.yaml @@ -133,20 +316,16 @@ file named ``secrets.yaml`` that looks like: namespace: flyte type: Opaque data: - cluster_1_token: {{ cluster 1 token here }} - cluster_1_cacert: {{ cluster 1 cacert here }} - cluster_2_token: {{ cluster 2 token here }} - cluster_2_cacert: {{ cluster 2 cacert here }} - cluster_3_token: {{ cluster 3 token here }} - cluster_3_cacert: {{ cluster 3 cacert here }} + dataplane_1_token: + dataplane_1_cacert: -Create cluster credentials secret in the control plane cluster. +7. Connect to your control plane cluster and create the ``cluster-credentials`` secret: .. prompt:: bash $ kubectl apply -f secrets.yaml -Create a file named ``values-override.yaml`` and add the following config to it: +8. Create a file named ``values-override.yaml`` and add the following config to it: .. code-block:: yaml :caption: values-override.yaml @@ -159,129 +338,324 @@ Create a file named ``values-override.yaml`` and add the following config to it: additionalVolumeMounts: - name: cluster-credentials mountPath: /var/run/credentials + initContainerClusterSyncAdditionalVolumeMounts: + - name: cluster-credentials + mountPath: /etc/credentials configmap: clusters: labelClusterMap: - team1: - - id: cluster_1 + label1: + - id: dataplane_1 weight: 1 - team2: - - id: cluster_2 - weight: 0.5 - - id: cluster_3 - weight: 0.5 clusterConfigs: - - name: "cluster_1" - endpoint: {{ your-cluster-1-kubeapi-endpoint.com }} + - name: "dataplane_1" + endpoint: https://:443 enabled: true auth: type: "file_path" - tokenPath: "/var/run/credentials/cluster_1_token" - certPath: "/var/run/credentials/cluster_1_cacert" - - name: "cluster_2" - endpoint: {{ your-cluster-2-kubeapi-endpoint.com }} - enabled: true - auth: - type: "file_path" - tokenPath: "/var/run/credentials/cluster_2_token" - certPath: "/var/run/credentials/cluster_2_cacert" - - name: "cluster_3" - endpoint: {{ your-cluster-3-kubeapi-endpoint.com }} - enabled: true - auth: - type: "file_path" - tokenPath: "/var/run/credentials/cluster_3_token" - certPath: "/var/run/credentials/cluster_3_cacert" + tokenPath: "/var/run/credentials/dataplane_1_token" + certPath: "/var/run/credentials/dataplane_1_cacert" +.. note:: + + Typically, you can obtain your Kubernetes API endpoint URL using the following command: -The ``configmap`` is used to schedule pods in different Kubernetes clusters, and -hence, acts like a "load balancer". ``team1`` and ``team2`` are the labels, where -each label can schedule a pod on multiple clusters depending on the weight. + .. prompt:: bash $ + + kubectl cluster-info -.. 
code-block:: yaml
-
-    configmap:
-      labelClusterMap:
-        team1:
-        - id: cluster_1
-          weight: 1
-        team2:
-        - id: cluster_2
-          weight: 0.5
-        - id: cluster_3
-          weight: 0.5
-
-Finally, install the Flyte control plane Helm chart.

In this configuration, ``label1`` and ``label2`` are just labels that we will use later in the process
to configure mappings that enable workflow executions matching those labels to be scheduled
on one or multiple clusters depending on the weight (e.g. ``label1`` on ``dataplane_1``). The ``weight`` is the
priority of a specific cluster, relative to the other clusters under the ``labelClusterMap`` entry. The total sum of weights under a particular
label has to be 1.

9. Add the ``flyte-dataplane-role`` IAM Role as the ``defaultIamRole`` in your ``values-eks.yaml`` file. `See section here `__

10. Update the control plane Helm release:

.. note::

   This step will disable ``flytepropeller`` in the control plane cluster, leaving no possibility of running workflows there. If you require
   the control plane to run workflows, edit the ``values-controlplane.yaml`` file and set ``flytepropeller.enabled`` to ``true``. Then, perform the ``helm upgrade`` operation and complete the steps in :ref:`this section ` to configure it
   as a dataplane cluster.

.. tabbed:: AWS

   .. code-block::

      helm upgrade flyte-core flyteorg/flyte-core \
        --values values-eks-controlplane.yaml --values values-override.yaml \
        --values values-eks.yaml -n flyte

.. tabbed:: GCP

   .. code-block::

      helm upgrade flyte -n flyte flyteorg/flyte-core \
        --values values-gcp.yaml \
        --values values-controlplane.yaml \
        --values values-override.yaml

11. Verify that all Pods in the ``flyte`` namespace are ``Running``:

Example output:

.. prompt:: bash $

   kubectl get pods -n flyte
   NAME                             READY   STATUS    RESTARTS   AGE
   datacatalog-86f6b9bf64-bp2cj     1/1     Running   0          23h
   datacatalog-86f6b9bf64-fjzcp     1/1     Running   0          23h
   flyteadmin-84f666b6f5-7g65j      1/1     Running   0          23h
   flyteadmin-84f666b6f5-sqfwv      1/1     Running   0          23h
   flyteconsole-cdcb48b56-5qzlb     1/1     Running   0          23h
   flyteconsole-cdcb48b56-zj75l     1/1     Running   0          23h
   flytescheduler-947ccbd6-r8kg5    1/1     Running   0          23h
   syncresources-6d8794bbcb-754wn   1/1     Running   0          23h

Configure Execution Cluster Labels
**********************************

The next step is to configure project-domain or workflow labels to schedule on a specific
Kubernetes cluster.

.. tabbed:: Configure Project & Domain

   1. Create an ``ecl.yaml`` file with the following contents:

      .. code-block:: yaml

         domain: development
         project: project1
         value: label1

      .. note::

         Change ``domain`` and ``project`` according to your environment. The ``value`` has
         to match an entry under ``labelClusterMap`` in the ``values-override.yaml`` file.

   2. Repeat step 1 for every project-domain mapping you need to configure, creating a YAML file for each one.

   3. Update the execution cluster label of the project and domain:

      .. prompt:: bash $

         flytectl update execution-cluster-label --attrFile ecl.yaml

      Example output:

      .. prompt:: bash $

         Updated attributes from project1 project and domain development

   4. Execute a workflow indicating project and domain:

      .. prompt:: bash $

         pyflyte run --remote --project project1 --domain development example.py training_workflow \
           --hyperparameters '{"C": 0.1}'

.. tabbed:: Configure a Specific Workflow mapping

   1. Create a ``workflow-ecl.yaml`` file with the following example contents:

      .. code-block:: yaml

         domain: development
         project: project1
         workflow: example.training_workflow
         value: label1

   2. Update the execution cluster label of the workflow:

      .. prompt:: bash $

         flytectl update execution-cluster-label \
           -p project1 -d development \
           example.training_workflow \
           --attrFile workflow-ecl.yaml

   3. Execute a workflow indicating project and domain:

      .. prompt:: bash $

         pyflyte run --remote --project project1 --domain development example.py training_workflow \
           --hyperparameters '{"C": 0.1}'

Congratulations 🎉! With this, the execution of workflows belonging to a specific
project-domain, or of a single specific workflow, will be scheduled on the cluster that matches the target label.

Day 2 Operations
----------------

Add another Kubernetes cluster
******************************

This section covers the steps needed to scale out your deployment by adding one Kubernetes cluster.
The process can be repeated for additional clusters.

.. tabbed:: AWS

   1. Create the new cluster:

      .. prompt:: bash $

         eksctl create cluster --name flyte-dataplane-2 --region --version 1.25 --vpc-private-subnets , --without-nodegroup

      .. note::

         This is only one of multiple ways to provision an EKS cluster. Follow your organization's policies to complete this step.

   2. Add a nodegroup to the cluster. Typically, ``t3.xlarge`` instances provide enough resources to get started. Follow your organization's policies in this regard.

   3. Create an OIDC Provider for the new cluster:

      .. prompt:: bash $

         eksctl utils associate-iam-oidc-provider --cluster flyte-dataplane-2 --region --approve

   4. Take note of the OIDC Provider ID:

      .. prompt:: bash $

         aws eks describe-cluster --region --name flyte-dataplane-2 --query "cluster.identity.oidc.issuer" --output text

   5. Go to the **IAM** section in the **AWS Management Console** and edit the **Trust Policy** of the ``flyte-dataplane-role``.

   6. Add a new ``Principal`` with the new cluster's OIDC Provider ID. Include the ``Action`` and ``Condition`` sections:

      .. code-block:: json

         {
           "Version": "2012-10-17",
           "Statement": [
             {
               "Effect": "Allow",
               "Principal": {
                 "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/"
               },
               "Action": "sts:AssumeRoleWithWebIdentity",
               "Condition": {
                 "StringLike": {
                   "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com",
                   "oidc.eks..amazonaws.com/id/:sub": [
                     "system:serviceaccount:flyte:flytepropeller",
                     "system:serviceaccount:*:default"
                   ]
                 }
               }
             },
             {
               "Effect": "Allow",
               "Principal": {
                 "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/"
               },
               "Action": "sts:AssumeRoleWithWebIdentity",
               "Condition": {
                 "StringLike": {
                   "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com",
                   "oidc.eks..amazonaws.com/id/:sub": [
                     "system:serviceaccount:flyte:flytepropeller",
                     "system:serviceaccount:*:default"
                   ]
                 }
               }
             }
           ]
         }

   7. Install the data plane Helm chart following the steps in the **Data plane deployment** section. See :ref:`section `.

   8. Follow steps 1-3 in the **control plane configuration** section (see :ref:`section `) to generate and populate a new section in your ``secrets.yaml`` file.

      Example:

      .. code-block:: yaml

         apiVersion: v1
         kind: Secret
         metadata:
           name: cluster-credentials
           namespace: flyte
         type: Opaque
         data:
           dataplane_1_token:
           dataplane_1_cacert:
           dataplane_2_token:
           dataplane_2_cacert:

   9. Connect to the control plane cluster and update the ``cluster-credentials`` Secret:

      .. prompt:: bash $

         kubectl apply -f secrets.yaml

   10. Go to your ``values-override.yaml`` file and add the information of the new cluster. Adding a new label is not strictly needed.
       Nevertheless, in the following example a new label is created to illustrate Flyte's capability to schedule workloads on different clusters
       in response to user-defined mappings of ``project``, ``domain`` and ``label``:

       .. code-block:: yaml

          ... #all the above content remains the same
          configmap:
            clusters:
              labelClusterMap:
                label1:
                - id: dataplane_1
                  weight: 1
                label2:
                - id: dataplane_2
                  weight: 1
              clusterConfigs:
              - name: "dataplane_1"
                endpoint: https://:443
                enabled: true
                auth:
                  type: "file_path"
                  tokenPath: "/var/run/credentials/dataplane_1_token"
                  certPath: "/var/run/credentials/dataplane_1_cacert"
              - name: "dataplane_2"
                endpoint: https://:443
                enabled: true
                auth:
                  type: "file_path"
                  tokenPath: "/var/run/credentials/dataplane_2_token"
                  certPath: "/var/run/credentials/dataplane_2_cacert"

   11. Update the Helm release in the control plane cluster:

       .. prompt:: bash $

          helm upgrade flyte-core-control flyteorg/flyte-core -n flyte --values values-controlplane.yaml --values values-eks.yaml --values values-override.yaml

   12. Create a new execution cluster labels file with the following sample content:

       .. code-block:: yaml

          domain: production
          project: team1
          value: label2

   13. Update the cluster execution labels for the project:

       .. prompt:: bash $

          flytectl update execution-cluster-label --attrFile ecl-production.yaml

   14. Finally, submit a workflow execution that matches the label of the new cluster:

       .. prompt:: bash $

          pyflyte run --remote --project team1 --domain production example.py training_workflow \
            --hyperparameters '{"C": 0.1}'

   15. A successful execution should be visible on the UI, confirming it ran in the new cluster:

   .. 
image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/multicluster-execution.png
\ No newline at end of file
diff --git a/rsts/deployment/deployment/sandbox.rst b/rsts/deployment/deployment/sandbox.rst
index 073125e5cc..5c40eea5eb 100644
--- a/rsts/deployment/deployment/sandbox.rst
+++ b/rsts/deployment/deployment/sandbox.rst
@@ -6,11 +6,11 @@ Sandbox Deployment

 .. tags:: Kubernetes, Infrastructure, Basic

-A sandbox deployment of Flyte is bundles together portable versions of Flyte's
+A sandbox deployment of Flyte bundles together portable versions of Flyte's
 dependencies such as a relational database and durable object store.

 For the blob store requirements, Flyte Sandbox uses `Minio `__,
-which offers an S3 compatible interface, and for Postgres, we use the stock
+which offers an S3-compatible interface, and for Postgres, it uses the stock
 Postgres Docker image and Helm chart.

 .. important::

@@ -41,7 +41,7 @@ Requirements

 - Install `docker `__ or any other OCI-compatible tool, like Podman or LXD.
 - Install `flytectl `__, the official CLI for Flyte.

-While Flyte can run any OCI-compatible task image, using the default Kubernetes container runtime (cri-o), the Flyte
+While Flyte can run any OCI-compatible task image using the default Kubernetes container runtime (``containerd``), the Flyte
 core maintainers typically use Docker. Note that the ``flytectl demo`` command does rely on Docker APIs, but as this
 demo environment is just one self-contained image, you can also run the image directly using another runtime.

@@ -79,12 +79,4 @@

 📂 The Minio API is hosted on localhost:30002. Use http://localhost:30080/minio/login for Minio console

 Now that you have the sandbox cluster running, you can go to the :ref:`User Guide ` or
-:ref:`Tutorials ` to run tasks and workflows written in ``flytekit``, the Python SDK for Flyte.
-
-**************************
-Flyte Sandbox on the Cloud
-**************************
-
-Sometimes it's also helpful to be able to install a sandboxed environment on a cloud provider. That is, you have access
-to an EKS or GKE cluster, but provisioning a separate database or blob storage bucket is harder because of a lack of
-infrastructure support. Instructions for how to do this will be forthcoming.
+:ref:`Tutorials ` to run tasks and workflows written in ``flytekit``, the Python SDK for Flyte.
\ No newline at end of file