To help make R more accessible on Serverless Cloud Hosting, mainly with AWS Lambda. Python Package Index Releases: https://pypi.org/project/serveRmore/
Before this utility can be used, there are a number of assumptions made about your AWS environment. If you have a master account or your IAM account is provided full admin permissions, you can skip over the permissions section.
Permissions:
- Can create and terminate EC2 instances
- Can create API Gateways & Modify them
- Can create Lambda Functions & Modify them
- Can create S3 Buckets & Modify them
Also, if you do not already have a Base R Runtime Layer for your new Lambda Function, you'll need to build your own to use before your new Function will work. Some knowledge of AWS EC2 and AWS Cloud Networking is required in order to build a layer using the scripts provided.
Please refer to the LAPTOP.md Guide for necessary manual configurations.
To install the latest package:
python3 -m pip install serveRmore
In addition to the utility, if you plan to create your own R base Runtime Layer as well, you'll need to clone the entire repository locally:
git clone [email protected]:Origent/ServeRmore.git
Note:: We have used the following repo for inspiration on managing our layers: https://github.com/bakdata/aws-lambda-r-runtime
Create a new file called "serveRmore.yaml" in your home directory. The template for the YAML file is shown below:
build_vm:
ami: ami-02507631a9f7bc956
default_security_group: null
ssh_security_group: null
subnet: null
instance_type: t2.large
domain_name: null
instance_id: null
private_key: null
dev:
additional_layer:
arn: null
name: null
r_packages: null
aws:
s3_bucket: null
s3_key: null
function:
arn_role: arn:aws:iam::<AWS_ID>:role/lambda_basic_execution
handler: lambda.handler
name: null
zip_file_name: null
runtime: provided.al2
runtime_layer:
name: r-runtime-4_0_3
arn: arn:aws:lambda:us-east-1:<AWS_ID>:layer:<name>:<version>
r_packages: httr logging yaml jsonlite aws.s3
r_version: 4.0.3
env: dev
For deploying a new Lambda function only (i.e. not including a Lambda layer), at the least you will need the following parameters:
arn_role
: AWS Account IDhandler
: path to the starting method callname
: the name of the Lambda functionzip_file_name
: temporary zip file that contains the main script along with any helper scripts required by the functions3_bucket
: storage bucket name for temporary function and layer zip files used to publish to Lambda Service.s3_key
: the directory path within the bucketarn
: "ARN" address for a configured runtime layer
Additional Settings for Building the Layer:
ami
: The Amazon Machine Image ID. The specific one listed above is required, as it uses Amazon Linux 2 operating system with the Docker agent pre-installed. Our scripts will pull from DockerHub.default_security_group
: When creating an EC2 virtual machine instance, a security group is created automatically. We recommend creating your own, or grabbing an existing security group ID and using that as your default here. A security group is similar to a firewall, but is wrapped around a group of instances.ssh_security_group
: In order for the scripts to work, SSH must be enabled and reachable with the new Virtual Machine the build script creates. We recommend creating a new security group and allowing SSH port 22 inside the security group, and recording the ID here.subnet
: When creating an EC2 virtual machine instance, it is added to a subnet and provided an IP address. The subnets list can be found in the EC2 console. Add the ID to one of them here.instance_type
: The type determines cost and capability of the virtual machine. The type provided has been tested, but many others could potentially work.instance_id
: After the build script creates the virtual machine, the virtual machine Instance ID will be automatically placed here.domain_name
: After the build script creates the virtual machine, the domain name of the VM will be automatically placed here.private_key
: SSH key for AWS EC2 (see step 3 of the LAPTOP.md Guide)
-
Create a new lambda.R script and create a
handler
method in R. Insert "hello world" or custom code inside your handler method. -
Try out the SRM utility with any of these commands:
srm help
srm version
srm status
-
Create a new deploy.R script to do the following: (1) generate a zipped file containing your lambda.R script (and other helper scripts required by the function) and (2) upload the zipped file to the S3 directory specified by the
s3_bucket
ands3_key
parameters in the YAML file. -
To deploy your zip file directly to Lambda, try out our new workflow here.
srm env <name-of-env-in-yaml> (i.e. dev)
srm lambda create
srm lambda update
srm lambda invoke
srm lambda destroy
Note: create
will establish a brand new Lambda function, if it does not exist, and publish your zip file; update
will republish your zip file, if your lambda function already exists.
If you don't already have an R Runtime layer, you'll have to create your own before you can get your function code to run. We provide instructions for creating an R base Runtime Layer only, with intention to improve our scripts and instructions to include multiple layers. If all is setup correctly from the additional settings above, all the heavy lifting is done! Building and Publishing the new R runtime layer only requires running three commands and waiting for them to complete.
srm create
srm deploy
srm terminate
Double check the AWS Lambda Console and Layers registry as well as your serveRmore.yaml file to confirm that your layer was indeed published.
The following is included and required for the Runtime to work:
- R 4.0.x - In theory, all builds of 4.x should work, but only this version has passed testing.
- httr - Used to communicate with other web APIs.
- jsonlite - Used to load, parse, and create JSON documents.
- aws.s3 - Used to interact with AWS S3 storage buckets.
- logging - Used to help create well formed log streams.
- yaml - Used to set configuration settings in a standardized way.
Note:: The build and compilation process uses a Docker image called docker-lambda.
If there are challenges with the layer build, there are ways to enter into an interactive mode. First, make sure that you've already run deploy once before without terminating. Next, check the status to ensure a VM is running. Finally, login to the VM itself and then the Docker container through the following commands:
srm deploy
srm status
srm ssh
docker run -it lambda-r:build-4.0.x bash
There's a way to see which shared libraries are being used in the build environment by running the following command to get a list:
ldd /opt/R/bin/exec/R
There's also a way to introduce print log statements in the Lambda R Runtime layer that will add log entries into AWS CloudWatch from AWS Lambda. Once inside the Docker container, change directories and view the following file:
ServeRmore/layer/r-runtime/build/layers/r-runtime/build/layer/R/library/base/R/Rprofile
Then browse until you encounter the following function:
.First.sys()
Next, enter any of the following print statements, or enter your own:
print(paste0("PATH = ", Sys.getenv("PATH")))
print(paste0("Listing files in PATH /usr/local/bin:", paste(list.files("/usr/local/bin/"), collapse = ",")))
print(paste0("Listing files in PATH /usr/bin/:", paste(list.files("/usr/bin/"), collapse = ",")))
print(paste0("Listing files in PATH /bin:", paste(list.files("/bin/"), collapse = ",")))
print(paste0("Listing files in PATH /opt/bin", paste(list.files("/opt/bin/"), collapse = ",")))
print(paste0("R.home() = ", file.path(R.home())))
print(paste0("Listing files in ", file.path(R.home(), "library"), ":", paste(list.files(file.path(R.home(), "library")), collapse = ",")))
Finally, exit the Docker container, and your VM, then re-run the deploy command:
srm deploy
Your new R Runtime Layer should now be published with your print statements.
AWS Lambda is limited to running with 3GB RAM and must finish within 15 minutes. It is therefore not feasible to execute long running R scripts with this runtime. Furthermore, only the /tmp/
directory is writeable on AWS Lambda. This must be considered when writing to the local disk.
If you decide to create your own layer, here's a few things to think about and a few steps to help you get started.
- There is a current limit of 5 layers that a Lambda Function can have.
- The Lambda Layer zip package has size limits. For example, it is extremely unlikely to be able to package up the entire Tidyverse as a layer. This could change as the AWS Lambda Service changes its requirements.
- The more that is added to the layer, the slower the function performance will become, as it will be spending more time starting up the environment to run the function code.
- Precision is important. Unlike an R&D or exploratory programming environment, each decision has an impact on functionality, performance, and quality.
Please refer to our guide for more information. CONTRIBUTING.md