Fix ec2 CI testing framework (#1541)
This fixes the ec2 CI testing framework that we had turned off. I tried
introducing intentional test failures to see whether the framework would fail
correctly (as shown in the commit history of this PR). I also set up
10 builds to run simultaneously, to see whether we would get the sporadic
test failures we had been having.
The general run fails early in the build and the FIPS-specific one fails
sooner since it's being run after the sanitizer tests. The good news is that all
runs fail and succeed as anticipated, without the original sporadic
issues we've been having.
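
As an illustration of the fail-fast behaviour being tested here, the following is a minimal sketch, not the code from this commit. `run_all` and the step names passed to it are hypothetical stand-ins for the sanitizer and FIPS test scripts; the only point is that the first failing step stops the run with a non-zero status.

```shell
# Hypothetical sketch: run each step in order and stop at the first failure.
# The step names passed in are placeholders, not scripts from the repository.
run_all() {
  for step in "$@"; do
    if ! "$step"; then
      echo "step failed: ${step}" >&2
      return 1
    fi
  done
  echo "All steps passed"
}
```

A caller would invoke it as `run_all first_step second_step` and treat any non-zero return value as a failed build.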

1. Apparently the docker container needs to be in "privileged-mode" for
the TSAN tests to work, so I've turned that on in the SSM document.
2. A sporadic error message `E: Unable to lock the administration
directory (/var/lib/dpkg/), is another process using it?` occurs when we
try to call `apt-get update` right after the instance is spun up.
Calling `killall apt apt-get` beforehand effectively gets around the issue.
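
The `killall` workaround removes the competing process. A more defensive variant, which this commit does not use, would also retry `apt-get update` a few times in case the lock is still held. The sketch below assumes a hypothetical `retry` helper; the attempt count and delay in the usage comment are illustrative only.

```shell
# Hypothetical helper (not part of the commit): run a command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n} failed, retrying in ${delay}s: $*" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}

# Intended use on a freshly started instance (assumption, not executed here):
#   killall apt apt-get || true
#   retry 5 10 apt-get update
```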

More details on debugging can be found in the thread of `P113131493`.
samuel40791765 authored Apr 29, 2024
1 parent 9a4b43e commit 4d280eb
Showing 4 changed files with 63 additions and 27 deletions.
12 changes: 8 additions & 4 deletions tests/ci/cdk/cdk/codebuild/ec2_test_framework_omnibus.yaml
@@ -4,13 +4,17 @@
version: 0.2

# Doc for batch https://docs.aws.amazon.com/codebuild/latest/userguide/batch-build-buildspec.html#build-spec.batch.build-list
# The batch and build-list targets can't be null or 0 length. pre_push_checker is the fastest placeholder
batch:
build-list:
- identifier: pre_push_checker
buildspec: ./tests/ci/codebuild/linux-x86/pre-push.yml
# Actual tests are ran on an Graviton3 ec2 instance via SSM Commands.
- identifier: graviton3_tests
buildspec: ./tests/ci/codebuild/common/run_ec2_target.yml
env:
type: LINUX_CONTAINER
privileged-mode: false
compute-type: BUILD_GENERAL1_SMALL
image: 620771051181.dkr.ecr.us-west-2.amazonaws.com/aws-lc-docker-images-linux-x86:ubuntu-20.04_clang-8x_latest
image: 620771051181.dkr.ecr.us-west-2.amazonaws.com/aws-lc-docker-images-linux-x86:ubuntu-20.04_clang-7x-bm-framework_latest
variables:
EC2_AMI: "ami-0a24e6e101933d294"
EC2_INSTANCE_TYPE: "c7g.2xlarge"
ECR_DOCKER_TAG: "amazonlinux-2023_clang-15x_sanitizer"
Original file line number Diff line number Diff line change
@@ -284,7 +284,7 @@ batch:

# Sanitizer is very slow on ARM.
# https://github.com/google/sanitizers/issues/1331
- identifier: amazonlinux_2023_clang_14x_sanitizer
- identifier: amazonlinux_2023_clang_15x_sanitizer
buildspec: ./tests/ci/codebuild/common/run_simple_target.yml
env:
type: ARM_CONTAINER
50 changes: 39 additions & 11 deletions tests/ci/cdk/cdk/ssm/general_test_run_ssm_document.yaml
@@ -14,31 +14,59 @@ mainSteps:
# Codebuild just "stops" the instance calling the script, so "trap cleanup" is not executed.
- shutdown -P +60
- sudo -i
- systemctl stop apt-daily.timer
- export DEBIAN_FRONTEND=noninteractive
- export CPU_TYPE=$(dpkg --print-architecture)
- export SOURCE={SOURCE}
# if we have a cpu type of x86, we want linux-x86
- if [ "${CPU_TYPE}" = amd64 ]; then export CPU_ARCH=linux-x86; export AWS_CLI_PREFIX=x86_; sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"; fi
- if [ "${CPU_TYPE}" = amd64 ]; then export CPU_ARCH=linux-x86; export AWS_CLI_PREFIX=x86_; fi
# if we have a cpu type of arm, we want linux-aarch
- if [ "${CPU_TYPE}" = arm64 ]; then export CPU_ARCH=linux-aarch; export AWS_CLI_PREFIX=aarch; fi
# install aws-cli
- killall apt apt-get
- apt-get update
- apt-get -y install unzip
- curl "https://awscli.amazonaws.com/awscli-exe-linux-${AWS_CLI_PREFIX}64.zip" -o "awscliv2.zip"
- unzip awscliv2.zip
- ./aws/install
- git clone {GITHUB_REPO} aws-lc-pr
- cd aws-lc-pr
- git checkout {COMMIT_ID}
# Check if the source code is on S3, otherwise treat the source as a PR.
- >
if [ "$(expr substr "$SOURCE" 1 16)" = "aws-lc-codebuild" ]; then
aws s3api get-object --bucket {S3_BUCKET} --key "${SOURCE##{S3_BUCKET}/}" aws-lc-pr.zip
unzip aws-lc-pr.zip
else
git clone {SOURCE} aws-lc-pr
cd aws-lc-pr
git fetch origin pull/{PR_NUM}/head:temp
git checkout temp
git show
if [ "$(git log -n 1 --pretty=format:"%H")" != "{COMMIT_ID}" ]; then
exit 1
fi
fi
# install docker if its not already installed
- chmod +x ./tests/ci/benchmark_framework/install_docker.sh
- ./tests/ci/benchmark_framework/install_docker.sh
# log into docker and get needed docker image from ecr
- export ECR_REPO="{AWS_ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com/aws-lc-docker-images-${CPU_ARCH}"
- aws ecr get-login-password --region us-west-2 | docker login -u AWS --password-stdin "${ECR_REPO}"
- docker pull "${ECR_REPO}:{ECR_DOCKER_TAG}"
- # start the container and run the bm script
- exec_docker="docker run --env AWS_ACCOUNT_ID={AWS_ACCOUNT_ID} --env PR_NUM={PR_NUM} --env COMMIT_ID={COMMIT_ID} --env CPU_TYPE=${CPU_TYPE} -v `pwd`:`pwd` -w `pwd` ${ECR_REPO}:{ECR_DOCKER_TAG}_latest"
- chmod +x ./tests/ci/run_posix_sanitizers.sh
- $exec_docker ./tests/ci/run_posix_sanitizers.sh
- chmod +x ./tests/ci/run_fips_tests.sh
- $exec_docker ./tests/ci/run_fips_tests.sh
- docker pull "${ECR_REPO}:{ECR_DOCKER_TAG}_latest"
# Start the container. Docker needs to be run in "privileged" mode for TSAN tests to pass.
- exec_docker="docker run -v `pwd`:`pwd` -w `pwd` --privileged ${ECR_REPO}:{ECR_DOCKER_TAG}_latest"
# Check if container was spun up succesfully. Then run test scripts and check the output.
- >
if [ -n "$exec_docker" ]; then
chmod +x ./tests/ci/run_posix_sanitizers.sh
$exec_docker ./tests/ci/run_posix_sanitizers.sh
if [ $? != 0 ]; then
exit 1
fi
chmod +x ./tests/ci/run_fips_tests.sh
$exec_docker ./tests/ci/run_fips_tests.sh
if [ $? != 0 ]; then
exit 1
fi
else
exit 1
fi
- echo All ec2 test framework tests passed
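
The document above distinguishes an S3 source from a GitHub source by looking at the first 16 characters of `SOURCE`. A minimal sketch of that decision, written with a `case` pattern instead of `expr substr`, might look like the following. `source_kind`, `s3_key_for`, and the bucket name in the usage note are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: decide how to fetch the source, based on its prefix.
# "aws-lc-codebuild" is the 16-character prefix tested in the SSM document.
source_kind() {
  case "$1" in
    aws-lc-codebuild*) echo "s3" ;;
    *) echo "git" ;;
  esac
}

# Hypothetical helper: derive the S3 object key by removing "<bucket>/"
# from the front of the source string.
s3_key_for() {
  src="$1"
  prefix="$2/"
  printf '%s\n' "${src#"$prefix"}"
}
```

For a made-up source such as `aws-lc-codebuild-example/pr/source.zip`, `source_kind` reports `s3` and `s3_key_for` with bucket `aws-lc-codebuild-example` yields `pr/source.zip`; any other value is treated as a repository URL to clone.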
26 changes: 15 additions & 11 deletions tests/ci/run_ec2_test_framework.sh
@@ -19,9 +19,10 @@ cleanup() {
generate_ssm_document_file() {
# use sed to replace placeholder values inside preexisting document
sed -e "s,{AWS_ACCOUNT_ID},${AWS_ACCOUNT_ID},g" \
-e "s,{PR_NUM},${CODEBUILD_WEBHOOK_TRIGGER},g" \
-e "s,{PR_NUM},${CODEBUILD_WEBHOOK_TRIGGER//pr\/},g" \
-e "s,{COMMIT_ID},${CODEBUILD_SOURCE_VERSION},g" \
-e "s,{GITHUB_REPO},${CODEBUILD_SOURCE_REPO_URL},g" \
-e "s,{SOURCE},${CODEBUILD_SOURCE_REPO_URL},g" \
-e "s,{S3_BUCKET},${s3_bucket_name},g" \
-e "s,{ECR_DOCKER_TAG},${ecr_docker_tag},g" \
tests/ci/cdk/cdk/ssm/general_test_run_ssm_document.yaml \
> "tests/ci/cdk/cdk/ssm/${ec2_ami_id}_ssm_document.yaml"
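
The `sed` call above fills `{PLACEHOLDER}` tokens in the SSM document template. A minimal sketch of the same idea on a single line of text follows. It assumes the webhook trigger has the form `pr/<number>`; `pr_number_from_trigger` and `render_placeholders` are hypothetical names, and the prefix is stripped with the POSIX `${var#pr/}` form rather than the bash-only `${var//pr\/}` used in the script.

```shell
# Hypothetical helper: strip a leading "pr/" from a trigger such as "pr/1541".
pr_number_from_trigger() {
  printf '%s\n' "${1#pr/}"
}

# Hypothetical helper: replace {PR_NUM} and {COMMIT_ID} in text read from stdin.
# Commas are used as the sed delimiter, as in the script above.
render_placeholders() {
  pr_num="$1"
  commit_id="$2"
  sed -e "s,{PR_NUM},${pr_num},g" -e "s,{COMMIT_ID},${commit_id},g"
}
```

Stripping the prefix presumably matters because the template uses the value in `pull/{PR_NUM}/head`, where a leftover `pr/` would produce an invalid ref.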
@@ -45,9 +46,8 @@ trap cleanup EXIT

# print some information for reference
echo GitHub PR Number: "${CODEBUILD_WEBHOOK_TRIGGER}"
echo GitHub Commit Version: "${CODEBUILD_SOURCE_VERSION}"
echo AWS Account ID: "${AWS_ACCOUNT_ID}"
echo GitHub Repo Link: "${CODEBUILD_SOURCE_REPO_URL}"
echo Source: "${CODEBUILD_SOURCE_REPO_URL}"
export ec2_ami_id="$1"
export ec2_instance_type="$2"
export ecr_docker_tag="$3"
@@ -77,6 +77,9 @@ for i in {1..30}; do
sleep 60
done

# Wait 5 minutes for instance to "warm up"?
echo "Instances need to initialize a few minutes before SSM commands can be properly run"
sleep 300

# Create, and run ssm command.
ssm_doc_name=$(create_ssm_document "${ec2_ami_id}")
@@ -85,17 +88,18 @@ cloudwatch_group_name="aws-lc-ci-ec2-test-framework-cw-logs"
ec2_test_ssm_command_id=$(run_ssm_command "${ssm_doc_name}" "${instance_id}" ${cloudwatch_group_name})

run_url="https://${AWS_REGION}.console.aws.amazon.com/cloudwatch/home?region=${AWS_REGION}\
#logsV2:log-groups/log-group/${cloudwatch_group_name}/log-events/\
${ec2_test_ssm_command_id}\$252F${instance_id}\$252FrunShellScript\$252Fstdout"
#logsV2:log-groups/log-group/${cloudwatch_group_name}/log-events/${ec2_test_ssm_command_id}\$252F${instance_id}\$252FrunShellScript\$252F"

echo "Actual Run in EC2 can be observered at CloudWatch URL: ${run_url}"
echo "Actual Run in EC2 can be observered at CloudWatch URL: ${run_url}stdout"
echo "Error outputs can be observered at CloudWatch URL: ${run_url}stderr"

# Give some time for the commands to run

# Give some time for the commands to run, total wait time is 90 minutes.
done=false
success=false
for i in {1..45}; do
echo "${i}: Continue to wait 2 min for SSM commands to finish."
sleep 120
for i in {1..90}; do
echo "${i}: Continue to wait 1 min for SSM commands to finish."
sleep 60

ssm_command_status="$(aws ssm list-commands --command-id "${ec2_test_ssm_command_id}" --query Commands[*].Status --output text)"
ssm_target_count="$(aws ssm list-commands --command-id "${ec2_test_ssm_command_id}" --query Commands[*].TargetCount --output text)"
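
The loop above polls `aws ssm list-commands` once a minute. The remainder of the loop is not visible in this diff, so the following is only a sketch of one way such polling can be structured. `poll_status` is a hypothetical helper that takes the status command as an argument, so it can be exercised without AWS access; the terminal status names are the documented SSM command statuses, and `GaveUp` is an invented marker for running out of polls.

```shell
# Hypothetical helper: poll_status <max_polls> <interval_seconds> <command...>
# Runs <command> until it prints a terminal status or the poll budget runs out.
# Returns 0 on Success, 1 on a failed terminal status, 2 when giving up.
poll_status() {
  poll_max="$1"
  poll_interval="$2"
  shift 2
  poll_i=1
  while [ "$poll_i" -le "$poll_max" ]; do
    poll_state="$("$@")"
    case "$poll_state" in
      Success)
        echo "$poll_state"
        return 0
        ;;
      Failed|Cancelled|TimedOut)
        echo "$poll_state"
        return 1
        ;;
    esac
    # Pending, InProgress, or anything else: wait and ask again.
    sleep "$poll_interval"
    poll_i=$((poll_i + 1))
  done
  echo "GaveUp"
  return 2
}
```

In a real run the status command would wrap the `aws ssm list-commands ... --query Commands[*].Status` call shown above, with 90 polls at 60 seconds matching the 90-minute budget in the comment.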