-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing start, stop with ec2.py, adding validations #1191
base: main
Are you sure you want to change the base?
Conversation
8f297b5
to
2542232
Compare
scripts/aws/ec2.py
Outdated
self.__setup_vsockproxy(log_level) | ||
self.__run_config_server() | ||
self.__run_socks_proxy() | ||
time.sleep(5) #TODO: Change to while loop if required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to sleep here? doesn't subprocess.run
already run the command synchronously and will wait for its completion? For config server, it would be run as separate process so will never need to wait for it anyway?
or is this to wait for config server to startup so you could do validations? if so, you might wanna consider making it more robust (are we 100% sure 5 seconds wait is enough?), or at the very least, logs something during this 5 second wait to inform the customer this script is still running but waiting for something.
or maybe it should be in a loop and every 5 seconds the validation script will try to connect to the server until it's successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or is this to wait for config server to startup so you could do validations? i
Yes. we can "Change to while loop if required. " as mentioned in comment.
"""Sets up the necessary auxiliary services and configuration.""" | ||
self.configs = self._get_secret(self.__get_secret_name_from_userdata()) | ||
self.validate_configuration() | ||
log_level = 3 if self.configs["debug_mode"] else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this right translation from the old bash script:
VSOCK_LOG_LEVEL=${VSOCK_LOG_LEVEL:-3}
?
(granted don't quite understand this bash script)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if VSOCK_LOG_LEVEL ? then VSOCK_LOG_LEVEL else 3
there is no way to set VSOCK_LOG_LEVEL though.
So I added if debug_mode ? then set to 3 otherwise 1.
In future ticket, we will be adding more tracing on debug mode.
Plan is to evaluate adding
echo 1 > /sys/kernel/debug/tracing/events/vsock/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
setup_vsockproxy | ||
setup_dante | ||
run_config_server | ||
run_enclave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought this is very clear in terms of what are the actual steps and now most of this is grouped under _setup_auxiliaries
in py script which isn't necessarily as just easy to read what are the steps comparing to this.
i don't have any suggestion how to improve but food for thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abstracted to _setup_auxiliaries
which has
self.__setup_vsockproxy(log_level)
self.__run_config_server()
self.__run_socks_proxy()
and run_compute to run compute.
I went on that route, as developers can easily compare and understand differences between Cloud provider implementation.
As in, _setup_auxiliaries will setup all required processes for starting enclave.
If you need to know what Azure needs, just look into _setup_auxiliaries in Azure.py
This also enables the consistently phased rollout
self._validate_auxiliaries() | ||
command = [ | ||
"nitro-cli", "run-enclave", | ||
"--eif-path", "/opt/uid2operator/uid2operator.eif", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be helpful for debugging to have the ability to pass this in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
We need to build an EIF and register the EIF in core. Why not just use the ami build along with EIF? otherwise we have to copy the registered EIF and run that by SSH ing into the host right?
Changes enclave start script to Python, to improve readability, remove duplications, and add more validations while starting enclave.
HOSTNAME, IDENTITY_SCOPE, CORE + OPTOUT_URLS fetched from userdata
Testing
Code is tested by creating an EC2 instance using "ami-0df2894e3f4971bda" (the recently published AMI) and copying the python files onto the instance and starting with python files.
You can change the values in secret manager to see errors thrown based on validations added
Sample run