Gen3Enclave: A virtual air-gapped Gen3 deployment #53

Open
bwalsh opened this issue May 7, 2024 · 7 comments
@bwalsh
Collaborator

bwalsh commented May 7, 2024

Use Case: Gen3Enclave, a Secure Cloud System for Jupyter Notebooks in a Kubernetes Environment with Helm Charts

Objective:
To deploy and manage Gen3 with Jupyter Notebooks within a Kubernetes environment using Helm charts, ensuring robust protection of sensitive data and critical operations while leveraging the benefits of container orchestration.

"Gen3Enclave: a set of configuration options to deploy Gen3 workspaces in an isolated manner, where data cannot be downloaded to external destinations and notebook access to the internet is prohibited except by explicit whitelisting."

Requirements:

  1. Kubernetes Deployment with Helm:

    • The system should leverage existing Helm charts.
  2. Air Gap Boundary:

    • Helm charts shall include configuration options for deploying data buckets and Jupyter Notebooks within an air gap boundary, ensuring network isolation and restricted communication with external resources.
    • Network policies shall be defined within the Helm charts to enforce ingress and egress traffic rules, maintaining the integrity of the air gap boundary.
  3. Data Security for Notebooks:

    • Gen3Enclave shall provide encryption mechanisms for data stored and processed within Jupyter Notebooks, ensuring confidentiality and compliance with data protection regulations.
    • Integration with Kubernetes secrets or external vault solutions shall be facilitated through Helm templates for secure management of encryption keys and credentials used by Jupyter Notebooks.
  4. Access Control:

    • Existing requestor/arborist RBAC mechanism should be used.
    • Helm values files shall offer flexibility to customize access control policies based on organizational roles and permissions for Jupyter Notebook users.
  5. Internet Connectivity:

    • Helm charts shall include configuration options to define policy exceptions for specific Jupyter Notebook pods, allowing controlled internet access via designated egress gateways or proxies.
    • Outbound traffic restrictions shall be configurable through Helm values files to enforce compliance with organizational policies regarding external communication from Jupyter Notebooks.
  6. Security Monitoring:

    • Helm charts shall include configurations for deploying Kubernetes-native monitoring tools such as Prometheus and Grafana to monitor the health and security of Jupyter Notebook pods and cluster infrastructure.
    • Logging configurations shall be provided to enable the capture and analysis of security-relevant events within the Kubernetes cluster, facilitating integration with centralized logging solutions for Jupyter Notebooks.
  7. Policy Enforcement:

    • Helm charts shall support the deployment of admission controllers and custom resource definitions (CRDs) to enforce custom policies governing pod deployment, network communication, and resource allocation for Jupyter Notebooks within the Kubernetes cluster.
    • CI/CD pipelines shall incorporate Helm chart validation as part of the deployment process to ensure adherence to security policies and best practices for Jupyter Notebooks.
  8. Scalability and Performance:

    • Helm charts shall include configurations for enabling horizontal pod autoscaling (HPA) and cluster autoscaling to dynamically scale Jupyter Notebook pods based on workload demands and resource utilization metrics.
    • Performance tuning parameters shall be configurable through Helm values files to optimize the operation of Jupyter Notebooks within the Kubernetes environment.
  9. High Availability:

    • Helm charts shall provide configurations for deploying Gen3Enclave components for Jupyter Notebooks with appropriate redundancy and fault tolerance mechanisms, leveraging Kubernetes deployment strategies such as ReplicaSets and PodDisruptionBudgets.
    • Disaster recovery configurations, including backup strategies and failover configurations, shall be included in the Helm charts to ensure high availability and data integrity for Jupyter Notebooks.
  10. Comprehensive Documentation:

    • Detailed documentation shall be provided for Helm chart usage, covering installation, configuration, and customization options for deploying Gen3Enclave for Jupyter Notebooks within a Kubernetes environment.
    • Best practices and troubleshooting guidelines shall be included to assist operators and data scientists in managing and utilizing Gen3Enclave deployments effectively for Jupyter Notebooks.

Incorporating Helm charts into the deployment workflow lets operators deploy and manage Gen3Enclave for Jupyter Notebooks within a Kubernetes environment, simplifying operational tasks while meeting the security requirements of a cloud system operating within an air gap boundary.
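
As one illustration of the network-policy requirement (item 2) and the controlled egress in item 5, the sketch below builds a Kubernetes NetworkPolicy manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The policy name, the label `app: jupyter-notebook`, the namespace `gen3-enclave`, the proxy address `10.0.0.10/32`, and port 3128 are placeholders invented for this example, not values taken from the existing Helm charts.

```python
import json


def build_egress_policy(namespace, pod_labels, proxy_cidr, proxy_port):
    """Return a NetworkPolicy that limits egress from the selected pods to one proxy."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "notebook-egress", "namespace": namespace},
        "spec": {
            # Pods carrying these labels are the ones the policy applies to.
            "podSelector": {"matchLabels": pod_labels},
            # Only egress is restricted here; ingress rules are left untouched.
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": proxy_cidr}}],
                    "ports": [{"protocol": "TCP", "port": proxy_port}],
                }
            ],
        },
    }


# All values below are hypothetical placeholders.
policy = build_egress_policy(
    namespace="gen3-enclave",
    pod_labels={"app": "jupyter-notebook"},
    proxy_cidr="10.0.0.10/32",
    proxy_port=3128,
)
print(json.dumps(policy, indent=2))
```

Because the only egress rule points at the proxy, the selected pods would also lose DNS and access to in-cluster services unless further rules are added; a real chart would likely need those exceptions as configurable values.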

@bwalsh
Collaborator Author

bwalsh commented May 7, 2024

Please comment

@matthewpeterkort
Collaborator

matthewpeterkort commented May 7, 2024

Outline seems thorough and complete. Next steps may be to discuss if all 10 of these requirements are needed / what requirements to add and for each requirement which action items must be completed to complete it?

Seems like many of these items are Helm chart improvements. Might be a good place to start.

@quinnwai

quinnwai commented May 7, 2024

Thanks for writing this, nice to see the problems we faced laid out without being implementation-specific. I took a few notes:

  • 3. Data Security for Notebooks
    • Gen3Guard shall provide encryption mechanisms for data stored and processed within Jupyter Notebooks, ensuring confidentiality and compliance with data protection regulations.

    • Does Gen3Guard need to be defined? Also, I might have missed something, but what do you mean by "encryption mechanisms"? What purpose do they serve?
  • 4. Access Control
    • Existing requestor RBAC mechanism should be used

    • Is this in reference to Synapse and syncing it? I feel like we could add that it "should be used..." to set up role-based access to both notebook pods (compute) and datasets
  • 8. Scalability and Performance
    • not sure if it's in scope given the Gen3Enclave focus, but enabling GPU on notebook pods to allow for ML applications (which we never fully figured out with autoscaling) might be something to add
    • Similarly, having version control for containers for notebook pods and otherwise (alluding to the sidecar that failed the day before)

@bwalsh
Collaborator Author

bwalsh commented May 7, 2024

discuss if all 10 of these requirements are needed

@matthewpeterkort Can you take a pass on either priority order or MUST vs. SHOULD terms for each requirement? See https://datatracker.ietf.org/doc/html/rfc2119 for the MUST and SHOULD definitions.

what requirements to add and for each requirement which action items must be completed to complete it?

Great point. Perhaps we need to add "Given [precondition], when I [do some action], then I expect [result]" to some of the items.
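
One way the Given/When/Then form could be combined with an RFC 2119 level is sketched below in Python. The `AcceptanceCriterion` class and the example criterion are hypothetical and written only to illustrate the format; they are not agreed requirements. Only three of the RFC 2119 keywords are modeled here.

```python
from dataclasses import dataclass

# Subset of the RFC 2119 keywords, enough for this illustration.
RFC2119_LEVELS = ("MUST", "SHOULD", "MAY")


@dataclass(frozen=True)
class AcceptanceCriterion:
    requirement: str
    level: str
    given: str
    when: str
    then: str

    def __post_init__(self):
        # Reject levels outside the modeled keyword subset.
        if self.level not in RFC2119_LEVELS:
            raise ValueError(f"level must be one of {RFC2119_LEVELS}")

    def render(self):
        """Format the criterion as a single Given/When/Then sentence."""
        return (
            f"[{self.level}] {self.requirement}: "
            f"Given {self.given}, when I {self.when}, then I expect {self.then}."
        )


# Hypothetical example for requirement 5.
criterion = AcceptanceCriterion(
    requirement="5. Internet Connectivity",
    level="MUST",
    given="a notebook pod with no allowlist exception",
    when="request an external URL from the notebook",
    then="the connection to be refused",
)
print(criterion.render())
```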

Does Gen3Guard need to be defined

Yes, my typo. Should be Gen3Enclave. Perhaps "Gen3Enclave: a set of configuration options to deploy Gen3 workspaces in an isolated manner, where data cannot be downloaded to external destinations and notebook access to the internet is prohibited except by explicit whitelisting."

@bwalsh changed the title from "Gen3Enclave: An air-gapped Gen3 deployment" to "Gen3Enclave: A virtual air-gapped Gen3 deployment" on May 10, 2024
@matthewpeterkort
Collaborator

  1. The system should leverage existing helm charts -- MUST
  2. Air Gap Boundary -- MUST but this might be done in AWS and not helm
  3. encryption mechanisms, kube secrets integration -- SHOULD
  4. Access Control -- MUST -- helm chart configuration flexibility is a SHOULD
  5. Firewall rules -- MUST -- but this might be done in AWS and not helm
  6. Security Logging of Notebooks -- SHOULD -- this is nice to have but not mission critical
  7. Validation custom policies -- SHOULD but it would be the first in line after all of the required things are done
  8. Auto scaling -- MUST
  9. Chart backups, replica sets -- MUST -- this might already exist
  10. Good, useful, thorough docs -- MUST

@bwalsh
Collaborator Author

bwalsh commented May 29, 2024

Jawad Qureshi to Everyone (May 29, 2024, 1:37 PM)
https://github.com/uc-cdis/gen3-terraform

Jawad Qureshi to Everyone (May 29, 2024, 1:45 PM)
https://www.suse.com/neuvector/
uc-cdis#156

@jawadqur

  1. The system should leverage existing helm charts
  • Agreed
  2. Air Gap Boundary
  • Data buckets:
    • out of scope, any data buckets should be created using gen3-terraform
  • network isolation:
    • Partly with network policies
      • This would be a binary block / allow internet access
    • You can also give partial internet access if you use squid as a NAT gateway, which is the default network setup when setting up networking with gen3-terraform
  3. encryption mechanisms, kube secrets integration
  • AWS Secrets Manager + External Secrets Operator already possible in gen3-helm.
  • Data encryption can possibly be done using KMS. Needs more investigation.
  4. Access Control
  • Agreed: in helm using arborist/requestor
  5. Firewall rules -- MUST -- but this might be done in AWS and not helm
  • network policies + squid
  6. Security Logging of Notebooks
  7. Validation custom policies
  8. Auto scaling
  • Configurable in helm I believe.
  9. Chart backups, replica sets
  10. Good, useful, thorough docs
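
The partial internet access mentioned under Air Gap Boundary comes down to an allowlist of destination domains enforced at the proxy. The sketch below is a simplified Python model of that check, not squid's implementation. It borrows the convention that an entry with a leading dot (such as `.example.org`) matches the domain and all of its subdomains, while an entry without one matches only that exact host. The function name and the domain entries are placeholders chosen for this example.

```python
def is_allowed(host, allowlist):
    """Return True if host matches an entry in the allowlist."""
    # Normalize case and drop a trailing dot from fully qualified names.
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("."):
            # Leading dot: match the bare domain or any subdomain of it.
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            # No leading dot: exact host match only.
            return True
    return False


ALLOWLIST = [".example.org", "pypi.org"]  # hypothetical entries

for candidate in ("example.org", "data.example.org", "pypi.org",
                  "files.pypi.org", "badexample.org"):
    print(candidate, is_allowed(candidate, ALLOWLIST))
```

Matching on the leading dot rather than a plain suffix keeps a lookalike such as `badexample.org` from slipping through an entry meant for `example.org`.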
