Infra: add Arc conformance tests (#896)
[comment]: # (Note that your PR title should follow the conventional
commit format: https://conventionalcommits.org/en/v1.0.0/#summary)
# PR Description

- Move Ginkgo files under a new folder to be shared between testkube and
arc conformance runners
- Add instructions for running the conformance tests locally and for the
matrix and add step to the release docs
- Add dockerfile and script for the sonobuoy container to run the ginkgo
tests and format results for sonobuoy
gracewehner authored Jun 5, 2024
1 parent 8adefff commit 20593d8
Showing 36 changed files with 498 additions and 16 deletions.
2 changes: 1 addition & 1 deletion internal/docs/BUILDANDRELEASE.md
@@ -73,7 +73,7 @@ Each merge into `main` will push the image to the public mcr and deploy to the d
- Select `Create release`, then choose the build version which should be the same as the image tag.
- This pushes the linux, windows, and chart builds to the prod ACR which is synchronized with the prod MCR.
- Once pushed, you can manually start the `Deploy to prod clusters` stage to deploy the image to our prod clusters.
- **E2E Conformance Tests**: Ask for our conformance tests to be run in the [Arc Conformance teams channel](https://teams.microsoft.com/l/channel/19%3arlnJ5tIxEMP-Hhe-pRPPp9C6iYQ1CwAelt4zTqyC_NI1%40thread.tacv2/General?groupId=a077ab34-99ea-490c-b204-358d31c24fbe&tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47).
- **E2E Conformance Tests**: Ask for our conformance tests to be run in the [Arc Conformance teams channel](https://teams.microsoft.com/l/channel/19%3arlnJ5tIxEMP-Hhe-pRPPp9C6iYQ1CwAelt4zTqyC_NI1%40thread.tacv2/General?groupId=a077ab34-99ea-490c-b204-358d31c24fbe&tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47). Follow the instructions in the [Arc test README](../../otelcollector/test/arc-conformance/README.md#testing-on-the-arc-conformance-matrix).
- **PR 2**: Get the chart semver or container image tag from the commit used for **Build 1** and update the release notes with the changelog. Link to a similar PR [here](https://github.com/Azure/prometheus-collector/pull/298)
- **PR 3**: Make a PR to update the [Geneva docs](https://msazure.visualstudio.com/One/_git/EngSys-MDA-GenevaDocs?path=%2Fdocumentation%2Fmetrics%2FPrometheus&version=GBmaster&_a=contents) with any changes made in `/otelcollector/deploy/eng.ms/docs/Prometheus`
- **PR 4**: Make changes in AgentBaker for this new image version. Link to similar PR [here](https://github.com/Azure/AgentBaker/pull/2285/files)
6 changes: 3 additions & 3 deletions otelcollector/test/README.md
@@ -134,12 +134,12 @@ Ginkgo can be used for any tests written in golang, whether they are unit, integ
- You have run `az login` from the terminal you will be running the tests in.

## Running the Tests
- Run the commands below by replacing the placeholders with the SP Client ID, SP Secret, and the AMW query endpoint:
- Run the commands below by replacing the placeholders with the AMW query endpoint:
```
(bash) export GOPROXY=https://proxy.golang.org / (powershell) $env:GOPROXY = "https://proxy.golang.org"
sudo -E go install -v github.com/onsi/ginkgo/v2/ginkgo@latest
cd otelcollector/test
cd otelcollector/test/ginkgo-e2e
AMW_QUERY_ENDPOINT="<query endpoint>" \
ginkgo -p -r --keep-going --label-filter='!/./' -ldflags="-s -X github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring.GroupName=azmonitoring.coreos.com"
@@ -350,7 +350,7 @@ Some highlights are that:
cd ./testkube
kubectl apply -f testkube-test-crs.yaml
```
- Get the full resource ID of your AMW and a the client ID of an AKS cluster managed identity. Run the following command to allow query access from the cluster:
- Get the full resource ID of your AMW and the client ID of the AKS cluster kubelet managed identity. Run the following command to allow query access from the cluster:

```
az role assignment create --assignee <client ID> --role "Monitoring Data Reader" --scope <AMW resource ID>
28 changes: 28 additions & 0 deletions otelcollector/test/arc-conformance/Dockerfile
@@ -0,0 +1,28 @@
FROM mcr.microsoft.com/oss/go/microsoft/golang:1.21

RUN go install -v github.com/onsi/ginkgo/v2/ginkgo@latest

RUN curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash \
&& helm version

RUN apt-get update && apt-get -y upgrade && \
apt-get -f -y install curl apt-transport-https lsb-release gnupg python3-pip && \
curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > /etc/apt/trusted.gpg.d/microsoft.asc.gpg && \
CLI_REPO=$(lsb_release -cs) && \
echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ ${CLI_REPO} main" \
> /etc/apt/sources.list.d/azure-cli.list && \
apt-get update && \
apt-get install -y azure-cli && \
rm -rf /var/lib/apt/lists/*

RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
RUN install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

COPY arc-conformance/e2e_tests.sh /
COPY ginkgo-e2e ginkgo-e2e/
RUN ginkgo build -r ./ginkgo-e2e
RUN mkdir ginkgo-test-binaries && mv ginkgo-e2e/containerstatus/containerstatus.test ginkgo-e2e/livenessprobe/livenessprobe.test ginkgo-e2e/operator/operator.test ginkgo-e2e/prometheusui/prometheusui.test ginkgo-e2e/querymetrics/querymetrics.test ginkgo-test-binaries/

RUN ["chmod", "+x", "/e2e_tests.sh"]
ENTRYPOINT [ "/bin/bash" ]
CMD [ "/e2e_tests.sh" ]
58 changes: 58 additions & 0 deletions otelcollector/test/arc-conformance/README.md
@@ -0,0 +1,58 @@
# Arc Conformance Testing

Arc uses Sonobuoy instead of TestKube as the test runner on the cluster. However, the underlying Ginkgo tests can be used in the same way as with TestKube.

A custom Sonobuoy plugin container image is created to run the tests. This container has an entrypoint of [e2e_tests.sh](./e2e_tests.sh). It ensures the cluster is connected to Arc, has the Arc pods running, then installs the ama-metrics extension, and waits for the pods to be ready. Then the Ginkgo tests are run inside the cluster and the results are stored in an XML format that the Sonobuoy pod recognizes.
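The wait steps described above all follow the same poll-and-retry pattern. As a minimal sketch of that pattern, the helper below retries an arbitrary command a fixed number of times. The function name `retry_until` and its argument order are hypothetical and are not part of `e2e_tests.sh`; the retry count and sleep interval in the usage comment simply mirror the values the script uses.

```bash
# Hypothetical helper (not part of e2e_tests.sh): run a command until it
# succeeds or the retry budget is exhausted.
# Usage: retry_until <max_retries> <sleep_seconds> <command> [args...]
retry_until() {
  local max_retries="$1"
  local sleep_seconds="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_retries" ]; do
    if "$@"; then
      echo "succeeded on attempt ${attempt}"
      return 0
    fi
    echo "attempt ${attempt} of ${max_retries} failed; sleeping ${sleep_seconds}s"
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  # Every attempt failed: report that to the caller through the exit status.
  return 1
}

# Example (assumes kubectl is already configured for the target cluster):
# retry_until 60 10 kubectl wait --for=condition=Ready pods --all --namespace azure-arc
```

Returning a non-zero status when the budget runs out lets a caller decide whether to stop or continue, rather than only logging the final state.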

The [Dockerfile](./Dockerfile) for the image uses the Microsoft Golang base image, installs Helm, kubectl, the Azure CLI, and the Ginkgo CLI. It builds the Ginkgo tests as binaries so that the tests don't need to be built at runtime. These test binaries are copied into the container and then [e2e_tests.sh](./e2e_tests.sh) is set as the entrypoint.

The Arc team uses only the file [arc-conformance.yaml](./arc-conformance.yaml) to run our plugin in the conformance test matrix. The image tag in this file needs to be updated whenever a new image is built.
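Updating the tag can be scripted. The sketch below is one possible approach, assuming the image line keeps the `prometheus-collector:conf-<tag>` form; the function name `update_conformance_image_tag` is hypothetical and does not exist in the repository.

```bash
# Hypothetical helper: rewrite the conf-<tag> suffix on the plugin image line.
# Usage: update_conformance_image_tag <yaml file> <new tag>
# Assumes the new tag contains only letters, digits, '.', '_' or '-'.
update_conformance_image_tag() {
  local file="$1"
  local new_tag="$2"
  local tmp
  tmp=$(mktemp) || return 1
  # Only the text after "prometheus-collector:conf-" is replaced; all other
  # lines pass through unchanged.
  if sed "s|\(prometheus-collector:conf-\)[A-Za-z0-9._-]*|\1${new_tag}|" "$file" > "$tmp"; then
    # Write back with cat so the original file keeps its permissions.
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Example:
# update_conformance_image_tag arc-conformance.yaml 060524
```

Writing through a temporary file avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.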

## Building our Sonobuoy plugin image
From the repository root:
```bash
cd otelcollector/test
sudo docker build -f arc-conformance/Dockerfile -t containerinsightsprod.azurecr.io/public/azuremonitor/containerinsights/cidev/prometheus-collector:conf-<tag> .
az acr login -n containerinsightsprod -u containerinsightsprod -p <password>
docker push containerinsightsprod.azurecr.io/public/azuremonitor/containerinsights/cidev/prometheus-collector:conf-<tag>
```

## Testing locally
Use [local-e2e-tests.yaml](./local-e2e-tests.yaml) to set up Sonobuoy and run the tests on your cluster. Use the cluster managed identity and grant it permissions to enable the extension and query the AMW.

In this file, replace the values of these environment variables:
```yaml
- name: WORKLOAD_CLIENT_ID
value: "<Managed identity client ID>"
- name: TENANT_ID
value: "<Arc cluster and managed identity tenant ID>"
- name: SUBSCRIPTION_ID
value: "<Arc cluster subscription ID>"
- name: RESOURCE_GROUP
value: "<Arc cluster resource group>"
- name: CLUSTER_NAME
value: "<Arc cluster name>"
```
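Before applying the file, it can help to confirm that no placeholder was left behind. The following is a small sketch, assuming the placeholders keep the `value: "<...>"` form shown above; the function name `find_unreplaced_placeholders` is hypothetical.

```bash
# Hypothetical helper: list env values that still look like "<placeholder>".
# Prints the matching lines with their line numbers and returns 0 if any
# were found, or a non-zero status if none remain.
find_unreplaced_placeholders() {
  grep -n 'value: *"<[^>]*>"' "$1"
}

# Example:
# if find_unreplaced_placeholders local-e2e-tests.yaml; then
#   echo "Replace the placeholders listed above before running kubectl apply."
# fi
```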
Run the Sonobuoy pod that will deploy a job to run our plugin:
```bash
kubectl apply -f local-e2e-tests.yaml
kubectl get pods -n sonobuoy
kubectl logs <sonobuoy-agenttests-job-* pod name> -n sonobuoy -f
sonobuoy status --json
```

The logs will have the full output of the Ginkgo tests.

The output of `sonobuoy status --json` reports the number of tests that passed, failed, or were skipped:
```json
{"plugins":[{"plugin":"agenttests","node":"global","status":"complete","result-status":"passed","result-counts":{"passed":50,"skipped":18}}],"status":"complete","tar-info":{"name":"202405152328_sonobuoy_bf5c02ed-1948-48f1-b12d-5a2d74435e46.tar.gz","created":"2024-05-15T23:49:32.876748551Z","sha256":"559406070bd5738dd077355be5fdb5560497680be938d3d0a63a2a8f4ac66d15","size":282521}}
```
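To pull a single count out of that JSON without extra tooling, something like the sketch below may be enough. It assumes the single-plugin output shape shown above and only reads the first match; the function name `sonobuoy_result_count` is hypothetical. A JSON-aware tool such as `jq` would be more robust if it is available.

```bash
# Hypothetical helper: read `sonobuoy status --json` output on stdin and print
# the count for one result key (passed, failed, or skipped).
# Prints 0 when the key is absent, as "failed" is in the sample output.
sonobuoy_result_count() {
  local key="$1"
  local count
  # Match only "<key>":<digits>, so the "result-status":"passed" string value
  # is not picked up by mistake.
  count=$(grep -o "\"${key}\":[0-9][0-9]*" | head -n 1 | cut -d: -f2)
  echo "${count:-0}"
}

# Example (assumes the sonobuoy CLI is installed and a run has completed):
# sonobuoy status --json | sonobuoy_result_count failed
```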

## Testing on the Arc conformance matrix
1. In the [release](https://github-private.visualstudio.com/azure/_releaseDefinition?definitionId=79&_a=definition-pipeline), the task `Deploy to Prod Clusters` will deploy the arc extension to the `Staging` release train. This is the release train our conformance tests use.
2. After releasing to `Staging`, create a duplicate task of [this format](https://dev.azure.com/ArcValidationProgram/ArcValidationProgram/_workitems/edit/1161) and update the title to have the latest agent version.
3. Post in the [Teams channel](https://teams.microsoft.com/l/channel/19%3ArlnJ5tIxEMP-Hhe-pRPPp9C6iYQ1CwAelt4zTqyC_NI1%40thread.tacv2/General?groupId=a077ab34-99ea-490c-b204-358d31c24fbe&tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47) asking for the conformance tests to be run. An example post is [here](https://teams.microsoft.com/l/message/19:[email protected]/1715902653350?tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47&groupId=a077ab34-99ea-490c-b204-358d31c24fbe&parentMessageId=1715902653350&teamName=Azure%20Arc%20Conformance%20Testing&channelName=General&createdTime=1715902653350).
4. Wait for the Arc team to confirm whether the `Extension Plugin` tests have passed. The logs of the Ginkgo tests can be viewed by navigating to the test result page and downloading all logs.
5. After the tests have passed, the extension can be released to the `Stable` release train by starting the `ARC Small Region` release task.

15 changes: 15 additions & 0 deletions otelcollector/test/arc-conformance/arc-conformance.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
sonobuoy-config:
driver: Job
plugin-name: azure-arc-ama-metrics-conformance
result-format: junit
spec:
image: mcr.microsoft.com/azuremonitor/containerinsights/cidev/prometheus-collector:conf-053124
imagePullPolicy: Always
name: plugin
resources: {}
volumes:
- name: results
emptyDir: {}
volumeMounts:
- mountPath: /tmp/results
name: results
207 changes: 207 additions & 0 deletions otelcollector/test/arc-conformance/e2e_tests.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
#!/bin/bash
set -x
$x &> /dev/null
results_dir="${RESULTS_DIR:-/tmp/results}"

validateAuthParameters() {
if [ -z $WORKLOAD_CLIENT_ID ]; then
echo "ERROR: parameter WORKLOAD_CLIENT_ID is required." > ${results_dir}/error
fi
}

validateArcConfTestParameters() {
if [ -z $SUBSCRIPTION_ID ]; then
echo "ERROR: parameter SUBSCRIPTION_ID is required." > ${results_dir}/error
fi

if [ -z $RESOURCE_GROUP ]; then
echo "ERROR: parameter RESOURCE_GROUP is required." > ${results_dir}/error
fi

if [ -z $CLUSTER_NAME ]; then
echo "ERROR: parameter CLUSTER_NAME is required." > ${results_dir}/error
fi
}

login_to_azure() {
az login --identity --username $WORKLOAD_CLIENT_ID
echo "setting subscription: ${SUBSCRIPTION_ID} as default subscription"
az account set -s $SUBSCRIPTION_ID
}

addArcConnectedK8sExtension() {
echo "adding Arc K8s connectedk8s extension"
az extension add --name connectedk8s 2> ${results_dir}/error
}

waitForResourcesReady() {
ready=false
max_retries=60
sleep_seconds=10
NAMESPACE=$1
RESOURCETYPE=$2
RESOURCE=$3
# if resource not specified, set to --all
if [ -z $RESOURCE ]; then
RESOURCE="--all"
fi
for i in $(seq 1 $max_retries)
do
allPodsAreReady=$(kubectl wait --for=condition=Ready ${RESOURCETYPE} ${RESOURCE} --namespace ${NAMESPACE})
if [ $? -ne 0 ]; then
echo "waiting for the resource:${RESOURCE} of the type:${RESOURCETYPE} in namespace:${NAMESPACE} to be ready state, iteration:${i}"
sleep ${sleep_seconds}
else
echo "resource:${RESOURCE} of the type:${RESOURCETYPE} in namespace:${NAMESPACE} in ready state"
ready=true
break
fi
done

echo "waitForResourcesReady state: $ready"
}

waitForArcK8sClusterCreated() {
connectivityState=false
max_retries=60
sleep_seconds=10
for i in $(seq 1 $max_retries)
do
echo "iteration: ${i}, clustername: ${CLUSTER_NAME}, resourcegroup: ${RESOURCE_GROUP}"
clusterState=$(az connectedk8s show --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --query connectivityStatus -o json)
clusterState=$(echo $clusterState | tr -d '"' | tr -d '"\r\n')
echo "cluster current state: ${clusterState}"
if [ ! -z "$clusterState" ]; then
if [[ ("${clusterState}" == "Connected") || ("${clusterState}" == "Connecting") ]]; then
connectivityState=true
break
fi
fi
sleep ${sleep_seconds}
done
echo "Arc K8s cluster connectivityState: $connectivityState"
}

addArcK8sCLIExtension() {
if [ ! -z "$K8S_EXTENSION_WHL_URL" ]; then
echo "adding Arc K8s k8s-extension cli extension from whl file path ${K8S_EXTENSION_WHL_URL}"
az extension add --source $K8S_EXTENSION_WHL_URL -y
else
echo "adding Arc K8s k8s-extension cli extension"
az extension add --name k8s-extension
fi
}

createArcAMAMetricsExtension() {
echo "clustername: ${CLUSTER_NAME}, resourcegroup: ${RESOURCE_GROUP}"
installState=$(az k8s-extension show --cluster-name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --cluster-type connectedClusters --name azuremonitor-metrics --query provisioningState -o json)
installState=$(echo $installState | tr -d '"' | tr -d '"\r\n')
echo "extension install state: ${installState}"
if [ ! -z "$installState" ]; then
if [ "${installState}" == "Succeeded" ]; then
installedState=true
return
fi
fi

echo "creating extension type: Microsoft.AzureMonitor.Containers.Metrics"
basicparameters="--cluster-name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --cluster-type connectedClusters --extension-type Microsoft.AzureMonitor.Containers.Metrics --scope cluster --name azuremonitor-metrics --allow-preview true"
if [ ! -z "$AMA_METRICS_ARC_RELEASE_TRAIN" ]; then
basicparameters="$basicparameters --release-train $AMA_METRICS_ARC_RELEASE_TRAIN"
fi
if [ ! -z "$AMA_METRICS_ARC_VERSION" ]; then
basicparameters="$basicparameters --version $AMA_METRICS_ARC_VERSION --AutoUpgradeMinorVersion false"
fi

az k8s-extension create $basicparameters
}

showArcAMAMetricsExtension() {
echo "Arc AMA Metrics extension status"
az k8s-extension show --cluster-name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --cluster-type connectedClusters --name azuremonitor-metrics
}

waitForAMAMetricsExtensionInstalled() {
installedState=false
max_retries=60
sleep_seconds=10
for i in $(seq 1 $max_retries)
do
echo "iteration: ${i}, clustername: ${CLUSTER_NAME}, resourcegroup: ${RESOURCE_GROUP}"
installState=$(az k8s-extension show --cluster-name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --cluster-type connectedClusters --name azuremonitor-metrics --query provisioningState -o json)
installState=$(echo $installState | tr -d '"' | tr -d '"\r\n')
echo "extension install state: ${installState}"
if [ ! -z "$installState" ]; then
if [ "${installState}" == "Succeeded" ]; then
installedState=true
break
fi
fi
sleep ${sleep_seconds}
done
}

getAMAMetricsAMWQueryEndpoint() {
amw=$(az k8s-extension show --cluster-name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --cluster-type connectedClusters --name azuremonitor-metrics --query configurationSettings -o json)
echo "Azure Monitor Metrics extension amw: $amw"
amw=$(echo $amw | tr -d '"\r\n {}')
amw="${amw##*:}"
echo "extension amw: ${amw}"
queryEndpoint=$(az monitor account show --ids ${amw} --query "metrics.prometheusQueryEndpoint" -o json | tr -d '"\r\n')
echo "queryEndpoint: ${queryEndpoint}"
export AMW_QUERY_ENDPOINT=$queryEndpoint
}

deleteArcAMAMetricsExtension() {
az k8s-extension delete --name azuremonitor-metrics \
--cluster-type connectedClusters \
--cluster-name $CLUSTER_NAME \
--resource-group $RESOURCE_GROUP --yes
}

# saveResults prepares the results for handoff to the Sonobuoy worker.
# See: https://github.com/vmware-tanzu/sonobuoy/blob/master/docs/plugins.md
saveResults() {
cd ${results_dir}

# Sonobuoy worker expects a tar file.
tar czf results.tar.gz *

# Signal to the worker that we are done and where to find the results.
printf '%s' "${results_dir}/results.tar.gz" > "${results_dir}/done"
}

# Ensure that we tell the Sonobuoy worker we are done regardless of results.
trap saveResults EXIT

validateAuthParameters

validateArcConfTestParameters

login_to_azure

addArcConnectedK8sExtension

waitForResourcesReady azure-arc pods

waitForArcK8sClusterCreated

addArcK8sCLIExtension

createArcAMAMetricsExtension

showArcAMAMetricsExtension

waitForAMAMetricsExtensionInstalled

getAMAMetricsAMWQueryEndpoint

sleep 5m
cd ginkgo-test-binaries
files=("containerstatus.test" "prometheusui.test" "operator.test" "querymetrics.test" "livenessprobe.test")
for file in "${files[@]}"; do
AMW_QUERY_ENDPOINT=$AMW_QUERY_ENDPOINT ginkgo -p -r --junit-report=${results_dir}/results-$file.xml --keep-going --label-filter='!/./ || arc-extension' -ldflags="-s -X github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring.GroupName=azmonitoring.coreos.com" "$file"
done
cd ..

deleteArcAMAMetricsExtension