Support Graviton instances and switch to EC2 fleets (4dn-dcic#375)
* Initial version

* Minor changes

* Some refactoring, add tests

* Modify docs

* Cleanup + version bump

* Add AMIs

* Address reviewer comments

* Remove unnecessary permissions
alexander-veit authored Dec 6, 2022
1 parent b8e50bd commit e968c0c
Showing 30 changed files with 590 additions and 488 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.rst
@@ -3,6 +3,15 @@
Change Log
==========

3.0.0
=====

* Added support for Graviton instances.
* Removed ``other_instance_types`` as option for ``behavior_on_capacity_limit``. It will fall back to ``wait_and_retry``.
* Multiple instance types can be specified in the configuration. If ``spot_instance`` is enabled, Tibanna will run the workflow on the instance with the highest available capacity. If ``spot_instance`` is disabled, it will run the workflow on the cheapest instance in the list.
* Instead of using the ``run_instance`` command we switch to EC2 fleets (in instant mode) to start up instances.


2.2.6
=====

28 changes: 17 additions & 11 deletions awsf3-docker/Dockerfile
@@ -35,25 +35,31 @@ RUN apt update -y && apt upgrade -y && apt install -y \
libseccomp-dev \
pkg-config \
openjdk-8-jre-headless \
nodejs
nodejs \
gnupg \
lsb-release

RUN ln -s /usr/bin/python3.8 /usr/bin/python
#RUN ln -s /usr/bin/pip3 /usr/bin/pip

WORKDIR /usr/local/bin

# docker inside docker
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - \
&& apt-key fingerprint 0EBFCD88 \
&& add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
RUN apt-get update -y \
&& apt-cache policy docker-ce \
&& apt-get install -y docker-ce
# install docker inside docker
RUN mkdir -p /etc/apt/keyrings
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg

RUN echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

RUN apt-get update
RUN apt-get --assume-yes install docker-ce

# singularity
RUN wget https://golang.org/dl/go1.16.6.linux-amd64.tar.gz && \
tar -xzf go1.16.6.linux-amd64.tar.gz && \
rm go1.16.6.linux-amd64.tar.gz
RUN ARCH="$(dpkg --print-architecture)" && \
wget "https://golang.org/dl/go1.16.6.linux-${ARCH}.tar.gz" && \
tar -xzf "go1.16.6.linux-${ARCH}.tar.gz" && \
rm "go1.16.6.linux-${ARCH}.tar.gz"
RUN export SINGULARITY_VERSION=3.8.1 && \
export PATH=/usr/local/bin/go/bin/:$PATH && \
wget https://github.com/sylabs/singularity/releases/download/v${SINGULARITY_VERSION}/singularity-ce-${SINGULARITY_VERSION}.tar.gz && \
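The Dockerfile now derives the Go tarball name from ``dpkg --print-architecture``. This works because Debian's architecture names (``amd64``, ``arm64``) match the suffixes in Go's Linux download file names; ``aws_run_workflow_generic.sh`` relies on the same naming for the CloudWatch agent package. The Python sketch below mirrors that mapping for scripts that cannot call ``dpkg``. It assumes the ``platform.machine()`` values seen on Linux (``x86_64``, ``aarch64``), and the helper names are hypothetical, not part of Tibanna.

```python
import platform

# Assumed mapping from `uname -m` machine names to the Debian architecture
# names printed by `dpkg --print-architecture`.
MACHINE_TO_DEBIAN_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",  # Graviton (Arm) hosts report aarch64
}


def debian_arch(machine=None):
    """Return the Debian architecture name for `machine` (default: this host)."""
    machine = machine or platform.machine()
    try:
        return MACHINE_TO_DEBIAN_ARCH[machine]
    except KeyError:
        raise ValueError(f"unsupported machine type: {machine}") from None


def go_tarball_url(go_version, arch):
    """Build the Go download URL in the same shape the Dockerfile uses."""
    return f"https://golang.org/dl/go{go_version}.linux-{arch}.tar.gz"


if __name__ == "__main__":
    print(go_tarball_url("1.16.6", debian_arch("aarch64")))
```

Run as a script, it prints the ``arm64`` tarball URL that a Graviton build would download.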
1 change: 1 addition & 0 deletions awsf3-docker/run.sh
@@ -48,6 +48,7 @@ export TOPLATESTFILE=$LOCAL_OUTDIR/$JOBID.top_latest # this one includes only t
export INSTANCE_ID=$(ec2metadata --instance-id|cut -d' ' -f2)
export INSTANCE_REGION=$(ec2metadata --availability-zone | sed 's/[a-z]$//')
export INSTANCE_AVAILABILITY_ZONE=$(ec2metadata --availability-zone)
export INSTANCE_TYPE=$(ec2metadata --instance-type)
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity| grep Account | sed 's/[^0-9]//g')
export AWS_REGION=$INSTANCE_REGION # this is for importing awsf3 package which imports tibanna package
export LOCAL_OUTDIR_CWL=$MOUNT_DIR_PREFIX$LOCAL_OUTDIR
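In ``run.sh`` the region is derived from the availability zone by deleting its trailing letter (``sed 's/[a-z]$//'``). The following is an equivalent Python sketch that is handy for checking the expression against sample zones. The function name is hypothetical.

```python
import re


def region_from_availability_zone(az):
    """Drop one trailing lowercase letter, as `sed 's/[a-z]$//'` does in run.sh."""
    return re.sub(r"[a-z]$", "", az)


for zone in ("us-east-1b", "us-east-1f", "eu-west-2a"):
    print(zone, "->", region_from_availability_zone(zone))
```

Like the ``sed`` expression, this assumes standard zone names. A Local Zone such as ``us-west-2-lax-1a`` would yield ``us-west-2-lax-1``, which is not a region name.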
5 changes: 4 additions & 1 deletion awsf3/aws_run_workflow_generic.sh
@@ -182,8 +182,11 @@ exl echo "## Installing and activating Cloudwatch agent to collect metrics"
cwd0=$(pwd)
cd ~

ARCHITECTURE="$(dpkg --print-architecture)"
CW_AGENT_LINK="https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/${ARCHITECTURE}/latest/amazon-cloudwatch-agent.deb"
apt install -y wget
wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
exl echo "Loading Cloudwatch Agent from ${CW_AGENT_LINK}"
wget "${CW_AGENT_LINK}"
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
# If we want to collect new metrics, the following file has to be modified
exl echo "## Using CW Agent config: https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/cloudwatch_agent_config.json"
3 changes: 2 additions & 1 deletion awsf3/cloudwatch_agent_config.json
@@ -42,7 +42,8 @@
"measurement": [
"io_time",
"read_bytes",
"iops_in_progress"
"iops_in_progress",
"read_time"
],
"metrics_collection_interval": 60,
"resources": [
1 change: 1 addition & 0 deletions awsf3/utils.py
@@ -348,6 +348,7 @@ def update_postrun_json_init(json_old, json_new):
prj.Job.instance_id = os.getenv('INSTANCE_ID')
prj.Job.filesystem = os.getenv('EBS_DEVICE')
prj.Job.instance_availablity_zone = os.getenv('INSTANCE_AVAILABILITY_ZONE')
prj.Job.instance_type = os.getenv('INSTANCE_TYPE')

# write to new json file
write_postrun_json(json_new, prj)
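With this change, the instance type exported by ``run.sh`` (``INSTANCE_TYPE``) is recorded in the ``Job`` section of the postrun JSON next to the instance ID and availability zone. The sketch below shows the same environment-to-field mapping, using a plain dict instead of Tibanna's postrun JSON object. ``ENV_TO_JOB_FIELD`` and ``job_fields_from_env`` are hypothetical names. The field names, including the ``instance_availablity_zone`` spelling, are copied from the code above.

```python
import os

# Environment variable -> postrun JSON field, as assigned in update_postrun_json_init.
ENV_TO_JOB_FIELD = {
    "INSTANCE_ID": "instance_id",
    "EBS_DEVICE": "filesystem",
    "INSTANCE_AVAILABILITY_ZONE": "instance_availablity_zone",  # spelling as in Tibanna
    "INSTANCE_TYPE": "instance_type",
}


def job_fields_from_env(environ=None):
    """Collect instance metadata from an environment mapping (default: os.environ).

    Unset variables become None, which is what os.getenv returns for them.
    """
    environ = os.environ if environ is None else environ
    return {field: environ.get(var) for var, field in ENV_TO_JOB_FIELD.items()}


print(job_fields_from_env({"INSTANCE_ID": "i-01769a822e5dbb407", "INSTANCE_TYPE": "t3.small"}))
```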
4 changes: 2 additions & 2 deletions docs/ami.rst
@@ -2,7 +2,7 @@
Amazon Machine Image
====================

Tibanna now uses a single Amazon Machine Image (AMI) ``ami-06e2266f85063aabc``, which is made public for ``us-east-1``. One can find them among Community AMIs. (Tibanna automatically finds and uses them, so no need to worry about it.)
Tibanna now uses the Amazon Machine Images (AMI) ``ami-06e2266f85063aabc`` (``x86``) and ``ami-0f3e90ad8e76c7a32`` (``Arm``), which are made public for ``us-east-1``. One can find them among Community AMIs. (Tibanna automatically finds and uses them, so no need to worry about it.)

For regions that are not ``us-east-1``, a copy of the same AMI is publicly available (different AMI ID) and is auto-detected by Tibanna.
For regions that are not ``us-east-1``, copies of these AMIs are publicly available (different AMI IDs) and are auto-detected by Tibanna.
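With two AMIs, the image has to match the CPU architecture of the chosen instance type. One way to determine that architecture is the ``ProcessorInfo.SupportedArchitectures`` field returned by EC2's ``DescribeInstanceTypes`` API (boto3 ``describe_instance_types``). The sketch below makes that assumption and takes an already-fetched response, so it runs without AWS credentials. The AMI table holds only the ``us-east-1`` IDs quoted above, and the names are hypothetical. This is not Tibanna's own auto-detection logic.

```python
# us-east-1 AMI IDs from docs/ami.rst; other regions use copies with different IDs.
US_EAST_1_AMIS = {
    "x86_64": "ami-06e2266f85063aabc",
    "arm64": "ami-0f3e90ad8e76c7a32",
}


def ami_for_instance_type(instance_type, describe_response, amis=US_EAST_1_AMIS):
    """Pick the AMI matching an architecture the instance type supports.

    `describe_response` has the shape returned by boto3's
    `ec2.describe_instance_types(InstanceTypes=[...])`.
    """
    for info in describe_response.get("InstanceTypes", []):
        if info["InstanceType"] != instance_type:
            continue
        supported = info["ProcessorInfo"]["SupportedArchitectures"]
        # Some x86 types list both "i386" and "x86_64"; use the first one we have an AMI for.
        for arch in supported:
            if arch in amis:
                return amis[arch]
        raise ValueError(f"no AMI for {instance_type} (architectures: {supported})")
    raise ValueError(f"{instance_type} not found in the response")


# Trimmed example response. With boto3 it would come from
# boto3.client("ec2", region_name="us-east-1").describe_instance_types(
#     InstanceTypes=["t4g.micro", "t2.micro"])
example = {
    "InstanceTypes": [
        {"InstanceType": "t4g.micro", "ProcessorInfo": {"SupportedArchitectures": ["arm64"]}},
        {"InstanceType": "t2.micro", "ProcessorInfo": {"SupportedArchitectures": ["i386", "x86_64"]}},
    ]
}
print(ami_for_instance_type("t4g.micro", example))
```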

11 changes: 7 additions & 4 deletions docs/execution_json.rst
@@ -469,7 +469,12 @@ The ``config`` field describes execution configuration.
:instance_type:
- <instance_type>
- This or ``mem`` and ``cpu`` are required if Benchmark is not available for a given workflow.
- If both ``instance_type`` and ``mem`` & ``cpu`` are specified, then ``instance_type`` is the first choice.
- ``instance_type`` can be a string (e.g., ``t3.micro``) or a list (e.g., ``[t3.micro, t3.small]``). If ``spot_instance``
is enabled, Tibanna will run the workflow on the instance with the highest available capacity. If ``spot_instance``
is disabled, it will run the workflow on the cheapest instance in the list.
- If both ``instance_type`` and ``mem`` & ``cpu`` are specified, Tibanna internally creates a list of instances that
are directly specified in ``instance_type`` and instances that satisfy the ``mem`` & ``cpu`` requirement. One instance is chosen
according to the rules above to run the workflow.

:mem:
- <memory_in_gb>
@@ -588,9 +593,7 @@ The ``config`` field describes execution configuration.
- available options :

- ``fail`` (default)
- ``wait_and_retry`` (wait and retry with the same instance type again),
- ``other_instance_types`` top 10 cost-effective instance types will be tried in the order
(``mem`` and ``cpu`` must be set in order for this to work),
- ``wait_and_retry`` (wait and retry with the same instance type again.),
- ``retry_without_spot`` (try with the same instance type but not a spot instance) : this option is applicable only when
``spot_instance`` is set to ```True``
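The selection rules for a list of ``instance_type`` values correspond closely to the allocation strategies of an EC2 fleet launched in ``instant`` mode. For Spot, ``capacity-optimized`` draws from the pool with the most spare capacity; for On-Demand, ``lowest-price`` takes the cheapest type. The sketch below only assembles the keyword arguments for boto3's ``create_fleet``. It assumes an existing launch template that holds the AMI and other launch settings (``lt-0123456789abcdef0`` is a placeholder). It shows one possible mapping, not necessarily the request Tibanna 3.0.0 sends.

```python
def build_fleet_request(instance_types, spot_instance, launch_template_id):
    """Build kwargs for EC2 CreateFleet that launch exactly one instance.

    Assumption: `launch_template_id` names an existing launch template; only
    the instance type is overridden per candidate.
    """
    if isinstance(instance_types, str):  # a single type, e.g. "t3.micro"
        instance_types = [instance_types]
    if not instance_types:
        raise ValueError("at least one instance type is required")
    request = {
        "Type": "instant",  # synchronous: the response lists launched instances
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
            "Overrides": [{"InstanceType": t} for t in instance_types],
        }],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": 1,
            "DefaultTargetCapacityType": "spot" if spot_instance else "on-demand",
        },
    }
    if spot_instance:
        # Prefer the Spot pool with the most available capacity.
        request["SpotOptions"] = {"AllocationStrategy": "capacity-optimized"}
    else:
        # Prefer the cheapest On-Demand type among the overrides.
        request["OnDemandOptions"] = {"AllocationStrategy": "lowest-price"}
    return request


request = build_fleet_request(["t3.micro", "t3.small"], True, "lt-0123456789abcdef0")
print(request["SpotOptions"], [o["InstanceType"] for o in request["LaunchTemplateConfigs"][0]["Overrides"]])
```

With credentials configured, ``boto3.client("ec2").create_fleet(**request)`` would launch the instance. For an ``instant`` fleet, launched IDs are reported under ``Instances[...]["InstanceIds"]``, and launch failures such as insufficient capacity are reported under ``Errors``. That is the point where a ``behavior_on_capacity_limit`` policy would take over.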

6 changes: 6 additions & 0 deletions docs/news.rst
@@ -19,6 +19,12 @@ Version updates

.. _releases: https://github.com/4dn-dcic/tibanna/releases

**Nov 18, 2022** The latest version is now 3.0.0_.
- Tibanna now supports AWS Graviton-based instances.
- The instance type configuration now allows single instances (e.g., ``t3.micro``) and lists (e.g., ``[t3.micro, t3.small]``). If ``spot_instance`` is enabled, Tibanna will run the workflow on the instance with the highest available capacity. If ``spot_instance`` is disabled, it will run the workflow on the cheapest instance in the list.
- The option ``other_instance_types`` for ``behavior_on_capacity_limit`` has been removed. It will fall back to ``wait_and_retry``.


**Mar 10, 2022** The latest version is now 2.0.0_.
- The default Python version for Tibanna is now 3.8 (or 3.7). Python 3.6 is no longer supported.
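Because ``other_instance_types`` is no longer an option for ``behavior_on_capacity_limit``, a configuration that still sets it falls back to ``wait_and_retry``. Below is a minimal sketch of such a fallback during config validation, assuming a warning is an acceptable way to surface the change. The function and constant names are hypothetical. The accepted values are the ones listed in ``docs/execution_json.rst``.

```python
import warnings

# Options documented for behavior_on_capacity_limit as of 3.0.0.
SUPPORTED_BEHAVIORS = {"fail", "wait_and_retry", "retry_without_spot"}
# Removed option -> value it falls back to.
REMOVED_BEHAVIORS = {"other_instance_types": "wait_and_retry"}


def normalize_capacity_behavior(value="fail"):
    """Return a supported behavior_on_capacity_limit value ("fail" is the default)."""
    if value in REMOVED_BEHAVIORS:
        fallback = REMOVED_BEHAVIORS[value]
        warnings.warn(f"{value!r} was removed in 3.0.0; falling back to {fallback!r}")
        return fallback
    if value not in SUPPORTED_BEHAVIORS:
        raise ValueError(f"unknown behavior_on_capacity_limit: {value!r}")
    return value
```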

58 changes: 21 additions and 37 deletions poetry.lock

Some generated files are not rendered by default.

3 changes: 1 addition & 2 deletions pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "tibanna"
version = "2.2.6"
version = "3.0.0"
description = "Tibanna runs portable pipelines (in CWL/WDL) on the AWS Cloud."
authors = ["4DN-DCIC Team <[email protected]>"]
license = "MIT"
@@ -43,7 +43,6 @@ flake8 = "^3.9.0"
pytest = "^6.0"
pytest-cov = "^3.0.0"
pytest-parallel = "^0.1.1"
mock = "4.0"
pytest-mock = "3.7"

coverage = {extras = ["toml"], version = "^6.3.2"}
5 changes: 3 additions & 2 deletions scripts/publish-docker
@@ -4,5 +4,6 @@
export BUILD_LOG=/tmp/build-log
export VERSION=$(python -c 'from tibanna._version import __version__; print(__version__)')
export AWSF_IMAGE=$(python -c 'from tibanna.vars import DEFAULT_AWSF_IMAGE; print(DEFAULT_AWSF_IMAGE)')
docker build -t $AWSF_IMAGE --build-arg version=$VERSION awsf3-docker/ > $BUILD_LOG
docker push $AWSF_IMAGE
# Your local docker driver needs to support the multiple platforms feature
docker buildx build --push --platform linux/amd64,linux/arm64 -t $AWSF_IMAGE --build-arg version=$VERSION awsf3-docker/ > $BUILD_LOG

1 change: 1 addition & 0 deletions test_json/unicorn/medium_nonspot.postrun.json
@@ -53,6 +53,7 @@
"status": "0",
"filesystem": "/dev/nvme1n1",
"instance_id": "i-01769a822e5dbb407",
"instance_type": "t3.medium",
"instance_availablity_zone": "us-east-1b",
"total_input_size": "12K",
"total_output_size": "36K",
1 change: 1 addition & 0 deletions test_json/unicorn/small_spot.postrun.json
@@ -53,6 +53,7 @@
"status": "0",
"filesystem": "/dev/nvme1n1",
"instance_id": "i-01769a822e5dbb407",
"instance_type": "t3.small",
"instance_availablity_zone": "us-east-1b",
"total_input_size": "12K",
"total_output_size": "36K",
1 change: 1 addition & 0 deletions test_json/unicorn/small_spot_gp3_iops.postrun.json
@@ -53,6 +53,7 @@
"status": "0",
"filesystem": "/dev/nvme1n1",
"instance_id": "i-01769a822e5dbb407",
"instance_type": "t3.small",
"instance_availablity_zone": "us-east-1b",
"total_input_size": "12K",
"total_output_size": "36K",
@@ -53,6 +53,7 @@
"status": "0",
"filesystem": "/dev/nvme1n1",
"instance_id": "i-01769a822e5dbb407",
"instance_type": "t3.small",
"instance_availablity_zone": "us-east-1b",
"total_input_size": "12K",
"total_output_size": "36K",
1 change: 1 addition & 0 deletions test_json/unicorn/small_spot_io2_iops.postrun.json
@@ -53,6 +53,7 @@
"status": "0",
"filesystem": "/dev/nvme1n1",
"instance_id": "i-01769a822e5dbb407",
"instance_type": "t3.small",
"instance_availablity_zone": "us-east-1f",
"total_input_size": "12K",
"total_output_size": "36K",
1 change: 1 addition & 0 deletions tests/awsf3/postrunjson/GBPtlqb2rFGH.postrun.json
@@ -112,6 +112,7 @@
"filesystem": "",
"instance_availablity_zone": "",
"instance_id": "",
"instance_type": "",
"start_time": "20210312-14:04:23-UTC"
},
"config": {