Merge pull request aws-samples#639 from bkgardiner/main
Adding Inferentia AI/ML Module to Workshop
Showing 29 changed files with 751 additions and 31 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,49 +1,59 @@ | ||
# Steering Committee and Module Leads | ||
|
||
## Steering Committee Members | ||
|
||
The Steering Committee is a six-member body that oversees the governance of the EKS Workshop. | ||
|
||
### Terms end in February 2024 | ||
|Name|Profile|Role| | ||
|:----|:-------|:----| | ||
|Sai Vennam|[@svennam92](https://github.com/svennam92)|Principal EKS DA | ||
|Niall Thomson|[@niallthomson](https://github.com/niallthomson)|Specialist Solution Architect, Containers| | ||
|Ray Krueger|[@raykrueger](https://github.com/raykrueger)|Principal Container Specialist| | ||
|Ameet Naik|[@ameetnaik](https://github.com/ameetnaik)|Technical Account Manager| | ||
|Kamran Habib|[@kmhabib](https://github.com/kmhabib)|Solution Architect (TFC at large)| | ||
|Theo Salvo|[@buzzsurfr](https://github.com/buzzsurfr)|Container Specialist (TFC core team member)| | ||
|
||
| Name | Profile | Role | | ||
| :------------ | :----------------------------------------------- | :------------------------------------------ | | ||
| Sai Vennam | [@svennam92](https://github.com/svennam92) | Principal EKS DA | | ||
| Niall Thomson | [@niallthomson](https://github.com/niallthomson) | Specialist Solution Architect, Containers | | ||
| Ray Krueger | [@raykrueger](https://github.com/raykrueger) | Principal Container Specialist | | ||
| Ameet Naik | [@ameetnaik](https://github.com/ameetnaik) | Technical Account Manager | | ||
| Kamran Habib | [@kmhabib](https://github.com/kmhabib) | Solution Architect (TFC at large) | | ||
| Theo Salvo | [@buzzsurfr](https://github.com/buzzsurfr) | Container Specialist (TFC core team member) | | ||
|
||
## Working Groups | ||
|
||
The working groups are led by chairs (6 month terms) and maintainers (6 month terms). | ||
|
||
|Working Group|Chair|Maintainers| | ||
|:----|:-------|:----| | ||
|Infrastructure|[Niall Thomson](https://github.com/niallthomson)|| | ||
|Fundamentals|[Sai Vennam](https://github.com/svennam92)|[Bijith Nair](https://github.com/bijithnair), [Tolu Okuboyejo](https://github.com/oktab1), [Hemanth AVS](https://github.com/hemanth-avs)| | ||
|Autoscaling|[Sanjeev Ganjihal](https://github.com/sanjeevrg89)|| | ||
|Automation|[Carlos Santana](https://github.com/csantanapr)|[Tsahi Duek](https://github.com/tsahiduek), [Christina Andonov](https://github.com/candonov), [Sébastien Allamand](https://github.com/allamand)| | ||
|Machine Learning|[Masatoshi Hayashi](https://github.com/literalice)|| | ||
|Networking|[Sheetal Joshi](https://github.com/sheetaljoshi)|[Umair Ishaq](https://github.com/umairishaq)| | ||
|Observability|[Nirmal Mehta](https://github.com/normalfaults)|[Steven David](https://github.com/StevenDavid)| | ||
|Security|[Rodrigo Bersa](https://github.com/rodrigobersa)| | | ||
|Storage|[Eric Heinrichs](https://github.com/heinrichse)|[Andrew Peng](https://github.com/pengc99)| | ||
| Working Group | Chair | Maintainers | | ||
| :--------------- | :------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| Infrastructure | [Niall Thomson](https://github.com/niallthomson) | | | ||
| Fundamentals | [Sai Vennam](https://github.com/svennam92) | [Bijith Nair](https://github.com/bijithnair), [Tolu Okuboyejo](https://github.com/oktab1), [Hemanth AVS](https://github.com/hemanth-avs) | | ||
| Autoscaling | [Sanjeev Ganjihal](https://github.com/sanjeevrg89) | | | ||
| Automation | [Carlos Santana](https://github.com/csantanapr) | [Tsahi Duek](https://github.com/tsahiduek), [Christina Andonov](https://github.com/candonov), [Sébastien Allamand](https://github.com/allamand) | | ||
| Machine Learning | [Masatoshi Hayashi](https://github.com/literalice) | [Benjamin Gardiner](https://github.com/bkgardiner) | | ||
| Networking | [Sheetal Joshi](https://github.com/sheetaljoshi) | [Umair Ishaq](https://github.com/umairishaq) | | ||
| Observability | [Nirmal Mehta](https://github.com/normalfaults) | [Steven David](https://github.com/StevenDavid) | | ||
| Security | [Rodrigo Bersa](https://github.com/rodrigobersa) | | | ||
| Storage | [Eric Heinrichs](https://github.com/heinrichse) | [Andrew Peng](https://github.com/pengc99) | | ||
|
||
## Wranglers | ||
|
||
Wranglers will work across all topic areas and serve for at least 6 months. | ||
|Name|Profile|Role| | ||
|:----|:-------|:----| | ||
|Math Bruneau|[@ROunofF](https://github.com/ROunofF)|Specialist Solution Architect, Containers| | ||
|
||
|
||
## Emeritus | ||
|Name|Profile|Role| | ||
|:----|:-------|:----| | ||
|Jeremy Cowan|[@jicowan](https://github.com/jicowan)|EKS DA manager| | ||
|
||
| Name | Profile | Role | | ||
| :----------- | :------------------------------------- | :------------- | | ||
| Jeremy Cowan | [@jicowan](https://github.com/jicowan) | EKS DA manager | | ||
|
||
## Meetings | ||
|
||
### Schedule and Cadence | ||
|
||
The steering committee will host a public meeting every third Thursday of the month at 9AM CT. <!--update with Chime link--> | ||
|
||
### Resources | ||
* <!--add links to meeting notes and recordings--> | ||
|
||
- <!--add links to meeting notes and recordings--> | ||
|
||
## Contact | ||
* Mailing List: <[email protected]> | ||
|
||
- Mailing List: <[email protected]> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
#!/bin/bash | ||
|
||
set -e | ||
|
||
echo "Deleting AIML resources..." | ||
|
||
kubectl delete namespace aiml --ignore-not-found > /dev/null | ||
|
||
echo "Deleting Karpenter provisioners..." | ||
|
||
kubectl delete provisioner --all > /dev/null | ||
kubectl delete awsnodetemplate --all > /dev/null | ||
|
||
echo "Waiting for Karpenter nodes to be removed..." | ||
|
||
EXIT_CODE=0 | ||
|
||
timeout --foreground -s TERM 30 bash -c \ | ||
'while [[ $(kubectl get nodes --selector=type=karpenter -o json | jq -r ".items | length") -gt 0 ]];\ | ||
do sleep 5;\ | ||
done' || EXIT_CODE=$? | ||
|
||
if [ $EXIT_CODE -ne 0 ]; then | ||
echo "Warning: Karpenter nodes did not clean up" | ||
fi |
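The wait loop in the cleanup script above polls the Karpenter node count every 5 seconds and gives up after 30 seconds with only a warning. As a rough illustration of that poll-with-deadline pattern, here is a minimal Python sketch. `wait_until` and its parameters are hypothetical names introduced for this example and are not part of the commit; the predicate stands in for the `kubectl get nodes --selector=type=karpenter` check.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse.

    Returns True on success and False on timeout, so the caller can decide
    whether a timeout is fatal or, as in the script, only worth a warning.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the node count reported by kubectl: 2 nodes, then 1, then 0.
    remaining = iter([2, 1, 0])
    done = wait_until(lambda: next(remaining) == 0, timeout_s=30, interval_s=0)
    if not done:
        print("Warning: Karpenter nodes did not clean up")
```

Unlike the shell `timeout` wrapper, which kills the loop from outside, this sketch checks the deadline between polls, so a single slow check can overrun the limit. Passing `sleep` and `clock` as arguments is only a convenience for testing without real delays.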
128 changes: 128 additions & 0 deletions
manifests/modules/aiml/inferentia/.workshop/terraform/addon.tf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
data "aws_subnets" "private" { | ||
tags = { | ||
created-by = "eks-workshop-v2" | ||
env = local.addon_context.eks_cluster_id | ||
} | ||
|
||
filter { | ||
name = "tag:Name" | ||
values = ["*Private*"] | ||
} | ||
} | ||
|
||
module "iam_assumable_role_inference" { | ||
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc" | ||
version = "~> v5.5.0" | ||
create_role = true | ||
role_name = "${local.addon_context.eks_cluster_id}-inference" | ||
provider_url = local.addon_context.eks_oidc_issuer_url | ||
role_policy_arns = [aws_iam_policy.inference.arn] | ||
oidc_fully_qualified_subjects = ["system:serviceaccount:aiml:inference"] | ||
|
||
tags = local.tags | ||
} | ||
|
||
|
||
resource "aws_iam_policy" "inference" { | ||
name = "${local.addon_context.eks_cluster_id}-inference" | ||
path = "/" | ||
description = "IAM policy for the inference workload" | ||
|
||
policy = <<EOF | ||
{ | ||
"Version": "2012-10-17", | ||
"Statement": [ | ||
{ | ||
"Effect": "Allow", | ||
"Action": "s3:*", | ||
"Resource": [ | ||
"arn:aws:s3:::${aws_s3_bucket.inference.id}", | ||
"arn:aws:s3:::${aws_s3_bucket.inference.id}/*" | ||
] | ||
} | ||
] | ||
} | ||
EOF | ||
} | ||
|
||
module "karpenter" { | ||
source = "github.com/aws-ia/terraform-aws-eks-blueprints?ref=v4.25.0//modules/kubernetes-addons/karpenter" | ||
addon_context = merge(local.addon_context, { default_repository = local.amazon_container_image_registry_uris[data.aws_region.current.name] }) | ||
|
||
node_iam_instance_profile = aws_iam_instance_profile.karpenter_node.name | ||
|
||
helm_config = { | ||
set = [{ | ||
name = "replicas" | ||
value = "1" | ||
}] | ||
} | ||
} | ||
|
||
resource "aws_iam_instance_profile" "karpenter_node" { | ||
name = "${local.addon_context.eks_cluster_id}-karpenter-node" | ||
role = aws_iam_role.karpenter_node.name | ||
} | ||
|
||
resource "aws_iam_role" "karpenter_node" { | ||
name = "${local.addon_context.eks_cluster_id}-karpenter-node" | ||
|
||
assume_role_policy = jsonencode({ | ||
Version = "2012-10-17" | ||
Statement = [ | ||
{ | ||
Action = "sts:AssumeRole" | ||
Effect = "Allow" | ||
Sid = "" | ||
Principal = { | ||
Service = "ec2.amazonaws.com" | ||
} | ||
}, | ||
] | ||
}) | ||
|
||
managed_policy_arns = [ | ||
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonEKS_CNI_Policy", | ||
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonEKSWorkerNodePolicy", | ||
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly", | ||
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonSSMManagedInstanceCore" | ||
] | ||
|
||
tags = local.tags | ||
} | ||
|
||
data "http" "neuron_device_plugin_rbac_manifest" { | ||
url = "https://raw.githubusercontent.com/aws-neuron/aws-neuron-sdk/v2.6.0/src/k8/k8s-neuron-device-plugin-rbac.yml" | ||
} | ||
|
||
data "http" "neuron_device_plugin_manifest" { | ||
url = "https://raw.githubusercontent.com/aws-neuron/aws-neuron-sdk/v2.6.0/src/k8/k8s-neuron-device-plugin.yml" | ||
} | ||
|
||
data "kubectl_file_documents" "neuron_device_plugin_rbac_doc" { | ||
content = data.http.neuron_device_plugin_rbac_manifest.response_body | ||
} | ||
|
||
data "kubectl_file_documents" "neuron_device_plugin_doc" { | ||
content = data.http.neuron_device_plugin_manifest.response_body | ||
} | ||
|
||
resource "kubectl_manifest" "neuron_device_plugin_rbac" { | ||
for_each = data.kubectl_file_documents.neuron_device_plugin_rbac_doc.manifests | ||
yaml_body = each.value | ||
} | ||
|
||
resource "kubectl_manifest" "neuron_device_plugin" { | ||
for_each = data.kubectl_file_documents.neuron_device_plugin_doc.manifests | ||
yaml_body = each.value | ||
} | ||
|
||
output "environment" { | ||
value = <<EOF | ||
export AIML_NEURON_ROLE_ARN=${module.iam_assumable_role_inference.iam_role_arn} | ||
export AIML_NEURON_BUCKET_NAME=${resource.aws_s3_bucket.inference.id} | ||
export AIML_DL_IMAGE=763104351884.dkr.ecr.${data.aws_region.current.name}.amazonaws.com/pytorch-inference-neuron:1.13.1-neuron-py310-sdk2.12.0-ubuntu20.04 | ||
export AIML_SUBNETS=${data.aws_subnets.private.ids[0]},${data.aws_subnets.private.ids[1]},${data.aws_subnets.private.ids[2]} | ||
export KARPENTER_NODE_ROLE="${aws_iam_role.karpenter_node.arn}" | ||
EOF | ||
} |
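In `addon.tf` above, the `aws_iam_policy.inference` heredoc interpolates the bucket ID into two ARNs (the bucket itself and every object under it), and the IAM role is bound to a single service account through `oidc_fully_qualified_subjects`. The following Python sketch reproduces both strings outside Terraform, which can help when checking what the rendered policy should look like. The function names and the bucket name are hypothetical and used only for illustration; the real bucket name is generated from `bucket_prefix`.

```python
import json


def render_inference_policy(bucket_id: str) -> str:
    """Build the same policy document as the Terraform heredoc for a given bucket."""
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_id}",      # bucket-level operations
                    f"arn:aws:s3:::{bucket_id}/*",    # object-level operations
                ],
            }
        ],
    }
    return json.dumps(document, indent=2)


def irsa_subject(namespace: str, service_account: str) -> str:
    """Format the OIDC subject that identifies one Kubernetes service account."""
    return f"system:serviceaccount:{namespace}:{service_account}"


if __name__ == "__main__":
    # "eksworkshop-inference-example" is a made-up bucket name for this sketch.
    print(render_inference_policy("eksworkshop-inference-example"))
    print(irsa_subject("aiml", "inference"))
```

Note that the policy grants `s3:*`, but only on the one workshop bucket, and that only the `inference` service account in the `aiml` namespace matches the subject.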
6 changes: 6 additions & 0 deletions
manifests/modules/aiml/inferentia/.workshop/terraform/addon_infrastructure.tf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
resource "aws_s3_bucket" "inference" { | ||
bucket_prefix = "eksworkshop-inference" | ||
force_destroy = true | ||
|
||
tags = local.tags | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
AIML_NEURON_ROLE_ARN |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
apiVersion: kustomize.config.k8s.io/v1beta1 | ||
kind: Kustomization | ||
configMapGenerator: | ||
- name: base-vars | ||
namespace: aiml | ||
env: config.properties | ||
options: | ||
disableNameSuffixHash: true | ||
replacements: | ||
- source: | ||
kind: ConfigMap | ||
name: base-vars | ||
version: v1 | ||
namespace: aiml | ||
fieldPath: data.AIML_NEURON_ROLE_ARN | ||
targets: | ||
- select: | ||
kind: ServiceAccount | ||
name: inference | ||
namespace: aiml | ||
fieldPaths: | ||
- metadata.annotations.[eks.amazonaws.com/role-arn] | ||
resources: | ||
- serviceaccount.yaml | ||
- namespace.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
apiVersion: v1 | ||
kind: Namespace | ||
metadata: | ||
name: aiml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
apiVersion: v1 | ||
kind: ServiceAccount | ||
metadata: | ||
name: inference | ||
namespace: aiml | ||
annotations: | ||
eks.amazonaws.com/role-arn: ${AIML_NEURON_ROLE_ARN} |
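The base kustomization and the service account above work together: the `replacements` entry reads `data.AIML_NEURON_ROLE_ARN` from the generated `base-vars` ConfigMap and writes it over the `${AIML_NEURON_ROLE_ARN}` placeholder in the `eks.amazonaws.com/role-arn` annotation. The bracketed segment in the field path is needed because the annotation key itself contains dots. The Python sketch below emulates that copy on plain dictionaries. It is a simplified illustration, not Kustomize's implementation; the helper names and the role ARN are hypothetical.

```python
import re
from typing import Any


def split_field_path(path: str) -> list[str]:
    """Split a dotted field path; a [bracketed] segment may itself contain dots."""
    pattern = re.compile(r"\[([^\]]+)\]|([^.\[\]]+)")
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in pattern.finditer(path)]


def _step(node: Any, key: str) -> Any:
    # Lists are addressed by numeric index, mappings by key.
    return node[int(key)] if isinstance(node, list) else node[key]


def get_path(doc: Any, path: str) -> Any:
    node = doc
    for key in split_field_path(path):
        node = _step(node, key)
    return node


def set_path(doc: Any, path: str, value: Any) -> None:
    *parents, last = split_field_path(path)
    node = doc
    for key in parents:
        node = _step(node, key)
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


# Made-up ARN standing in for the value read from config.properties.
config_map = {"kind": "ConfigMap", "metadata": {"name": "base-vars"},
              "data": {"AIML_NEURON_ROLE_ARN": "arn:aws:iam::111122223333:role/example-inference"}}
service_account = {"kind": "ServiceAccount",
                   "metadata": {"name": "inference", "namespace": "aiml",
                                "annotations": {"eks.amazonaws.com/role-arn": "${AIML_NEURON_ROLE_ARN}"}}}

set_path(service_account,
         "metadata.annotations.[eks.amazonaws.com/role-arn]",
         get_path(config_map, "data.AIML_NEURON_ROLE_ARN"))
```

The same helpers handle a path such as `spec.containers.0.image`, where `0` is treated as a list index, which is the shape used by the compiler kustomization in this commit.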
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
apiVersion: v1 | ||
kind: Pod | ||
metadata: | ||
labels: | ||
role: compiler | ||
name: compiler | ||
namespace: aiml | ||
spec: | ||
containers: | ||
- command: | ||
- sh | ||
- -c | ||
- sleep infinity | ||
image: ${AIML_DL_IMAGE} | ||
name: compiler | ||
serviceAccountName: inference |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
AIML_DL_IMAGE |
26 changes: 26 additions & 0 deletions
manifests/modules/aiml/inferentia/compiler/kustomization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
apiVersion: kustomize.config.k8s.io/v1beta1 | ||
kind: Kustomization | ||
bases: | ||
- ../base | ||
configMapGenerator: | ||
- name: compiler-vars | ||
namespace: aiml | ||
env: config.properties | ||
options: | ||
disableNameSuffixHash: true | ||
replacements: | ||
- source: | ||
kind: ConfigMap | ||
name: compiler-vars | ||
version: v1 | ||
namespace: aiml | ||
fieldPath: data.AIML_DL_IMAGE | ||
targets: | ||
- select: | ||
kind: Pod | ||
name: compiler | ||
namespace: aiml | ||
fieldPaths: | ||
- spec.containers.0.image | ||
resources: | ||
- compiler.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
import torch | ||
import numpy as np | ||
import os | ||
import torch_neuron | ||
from torchvision import models | ||
|
||
image = torch.zeros([1, 3, 224, 224], dtype=torch.float32) | ||
|
||
## Load a pretrained ResNet50 model | ||
model = models.resnet50(pretrained=True) | ||
|
||
## Tell the model we are using it for evaluation (not training) | ||
model.eval() | ||
model_neuron = torch.neuron.trace(model, example_inputs=[image]) | ||
|
||
## Export to saved model | ||
model_neuron.save("resnet50_neuron.pt") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
AIML_DL_IMAGE |