Removing start, stop with ec2.py, adding validations #1191

abuabraham-ttd · 2024-12-09T20:25:46Z

Changes enclave start script to Python, to improve readability, remove duplications, and add more validations while starting enclave.

Removed unused configurations
HOSTNAME, IDENTITY_SCOPE, CORE + OPTOUT_URLS fetched from userdata
Removes duplications in user-data parsing to get secret manger key name
Removes unused default cpu, memory and added values from nitro-enclave config to get right value
Remove read_allocation, update_allocation as it is unused
Adds typing to config_server, to improve readability
Kills only valid processes, added flask as it was missing earlier
Requires core, optout urls to be specified in secret manager

Testing

Code is tested by creating an EC2 instance using "ami-0df2894e3f4971bda" (the recently published AMI) and copying the python files onto the instance and starting with python files.

python3 -m venv /opt/uid2operator/init
nano requirements.txt and copy the requirements.txt there 
/opt/uid2operator/init/bin/pip install -r requirements.txt 
nano /opt/uid2operator/confidential_compute.py copy and save contents 
nano /opt/uid2operator/ec2.py copy and save contents
nano /etc/systemd/system/uid2operator.service update contents and save
sudo systemctl daemon-reload
sudo systemctl restart uid2operator

and

curl http://127.0.0.1:80/ops/healthcheck

You can change the values in secret manager to see errors thrown based on validations added

Sample run

Running in us-east-1
Fetched configs from uid2-config-stack-tjm-unvalidate-eif-test1
Validated https://core-integ.uidapi.com/ matches other config parameters
Validated https://optout-integ.uidapi.com/ matches other config parameters
Validated operator key matches environment
Validated connectivity to https://core-integ.uidapi.com/
Validated connectivity to https://optout-integ.uidapi.com/
Completed static validation of confidential compute config values
Running command: /usr/bin/vsockpx -c /etc/uid2operator/proxy.yaml --workers 5 --log-level 3 --daemon
Running command: ./bin/flask run --host 127.0.0.1 --port 27015
Running command: sockd -D
Connecting to config server, attempt 1 failed with ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=27015): Max retries exceeded with url: /getConfig (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efebaadfca0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Config server is reachable
Connectivity check to config server passes
Running in debug_mode
Running command: nitro-cli run-enclave --eif-path /opt/uid2operator/uid2operator.eif --memory 24576 --cpu-count 6 --enclave-cid 42 --enclave-name uid2operator --debug-mode --attach-console
Start allocating memory...

scripts/aws/ec2.py

sunnywu · 2024-12-11T06:44:51Z

scripts/aws/ec2.py

+        self.__setup_vsockproxy(log_level)
+        self.__run_config_server()
+        self.__run_socks_proxy()
+        time.sleep(5)  #TODO: Change to while loop if required. 


why do we need to sleep here? doesn't subprocess.run already run the command synchronously and will wait for its completion? For config server, it would be run as separate process so will never need to wait for it anyway?

or is this to wait for config server to startup so you could do validations? if so, you might wanna consider making it more robust (are we 100% sure 5 seconds wait is enough?), or at the very least, logs something during this 5 second wait to inform the customer this script is still running but waiting for something.

or maybe it should be in a loop and every 5 seconds the validation script will try to connect to the server until it's successful.

or is this to wait for config server to startup so you could do validations? i

Yes. we can "Change to while loop if required. " as mentioned in comment.

scripts/aws/ec2.py

sunnywu · 2024-12-11T07:11:21Z

scripts/aws/ec2.py

+        """Sets up the necessary auxiliary services and configuration."""
+        self.configs = self._get_secret(self.__get_secret_name_from_userdata())
+        self.validate_configuration()
+        log_level = 3 if self.configs["debug_mode"] else 1


is this right translation from the old bash script:

VSOCK_LOG_LEVEL=${VSOCK_LOG_LEVEL:-3}
?
(granted don't quite understand this bash script)

if VSOCK_LOG_LEVEL ? then VSOCK_LOG_LEVEL else 3

there is no way to set VSOCK_LOG_LEVEL though.

So I added if debug_mode ? then set to 3 otherwise 1.

In future ticket, we will be adding more tracing on debug mode.

Plan is to evaluate adding

echo 1 > /sys/kernel/debug/tracing/events/vsock/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on

scripts/aws/ec2.py

sunnywu · 2024-12-11T07:34:43Z

scripts/aws/start.sh

-setup_vsockproxy
-setup_dante
-run_config_server
-run_enclave


i thought this is very clear in terms of what are the actual steps and now most of this is grouped under _setup_auxiliaries in py script which isn't necessarily as just easy to read what are the steps comparing to this.

i don't have any suggestion how to improve but food for thought.

abstracted to _setup_auxiliaries which has

self.__setup_vsockproxy(log_level)
self.__run_config_server()
self.__run_socks_proxy()

and run_compute to run compute.

I went on that route, as developers can easily compare and understand differences between Cloud provider implementation.

As in, _setup_auxiliaries will setup all required processes for starting enclave.

If you need to know what Azure needs, just look into _setup_auxiliaries in Azure.py

This also enables the consistently phased rollout

scripts/aws/ec2.py

cody-constine-ttd · 2024-12-11T22:27:01Z

scripts/aws/ec2.py

+        self._validate_auxiliaries()
+        command = [
+            "nitro-cli", "run-enclave",
+            "--eif-path", "/opt/uid2operator/uid2operator.eif",


would be helpful for debugging to have the ability to pass this in.

why?

We need to build an EIF and register the EIF in core. Why not just use the ami build along with EIF? otherwise we have to copy the registered EIF and run that by SSH ing into the host right?

scripts/aws/ec2.py

Removing start, stop with ec2.py, adding validations

3b11c13

abuabraham-ttd requested a review from sunnywu December 9, 2024 20:25

abuabraham-ttd changed the title ~~Removing start, stop with ec2.py, adding validations~~ DRAFT: Removing start, stop with ec2.py, adding validations Dec 9, 2024

Removing start, stop with ec2.py, adding validations

6034c5e

abuabraham-ttd marked this pull request as draft December 9, 2024 22:25

abuabraham-ttd changed the title ~~DRAFT: Removing start, stop with ec2.py, adding validations~~ Removing start, stop with ec2.py, adding validations Dec 9, 2024

abuabraham-ttd added 2 commits December 9, 2024 17:18

Updates

d1f1756

Updates

f42f872

sunnywu reviewed Dec 10, 2024

View reviewed changes

scripts/aws/ec2.py Outdated Show resolved Hide resolved

Add virtual env and start it in systemd

2542232

abuabraham-ttd force-pushed the abu-UID2-4555-EC2-improvements branch from 8f297b5 to 2542232 Compare December 10, 2024 19:42

abuabraham-ttd added 5 commits December 10, 2024 11:44

Add virtual env and start it in systemd

edf85f3

Add virtual env and start it in systemd

3e95e4c

use venv like flask service

cb70032

use versions

5fe844c

Add URL validation

937e7a2

abuabraham-ttd marked this pull request as ready for review December 10, 2024 20:59

abuabraham-ttd added 5 commits December 10, 2024 13:14

Move validations around

2b23ff0

Move validations around

44aa71f

Move validations around

711d50b

Remove aws implemnttion from typedict

4c694e7

Remove aws implemnttion from typedict

62cc490

abuabraham-ttd requested review from cYKatherine, cody-constine-ttd, thomasm-ttd and Ian-Nara December 10, 2024 21:45

sunnywu reviewed Dec 11, 2024

View reviewed changes

abuabraham-ttd added 2 commits December 11, 2024 08:27

Adding more logs

5de70be

Adding min capacity

77f1f4a

Loop every sec for 10sec for confg server to be up

a4241fc

Ian-Nara reviewed Dec 11, 2024

View reviewed changes

scripts/aws/ec2.py Outdated Show resolved Hide resolved

Ian-Nara reviewed Dec 11, 2024

View reviewed changes

scripts/aws/ec2.py Show resolved Hide resolved

abuabraham-ttd added 2 commits December 11, 2024 10:10

Fix regex

0bff456

validate after default

e669887

Ian-Nara approved these changes Dec 11, 2024

View reviewed changes

cody-constine-ttd reviewed Dec 11, 2024

View reviewed changes

abuabraham-ttd and others added 8 commits December 11, 2024 14:41

Add tested min values for capacity

85fc3e7

[CI Pipeline] Released Snapshot version: 5.43.1-alpha-93-SNAPSHOT

d7b24c7

Add to build eif stage

4499dcf

[CI Pipeline] Released Snapshot version: 5.43.2-alpha-94-SNAPSHOT

3dd967d

Dont check for enclave, kill all

45b2908

Change version on ami build

d890e5d

[CI Pipeline] Released Snapshot version: 5.43.3-alpha-100-SNAPSHOT

8eeaf9a

Use AuxilaryConfig to store and return URLs

eb8955c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removing start, stop with ec2.py, adding validations #1191

Removing start, stop with ec2.py, adding validations #1191

abuabraham-ttd commented Dec 9, 2024 •

edited

Loading

sunnywu Dec 11, 2024 •

edited

Loading

abuabraham-ttd Dec 11, 2024

sunnywu Dec 11, 2024

abuabraham-ttd Dec 11, 2024

sunnywu Dec 11, 2024

abuabraham-ttd Dec 11, 2024

cody-constine-ttd Dec 11, 2024

abuabraham-ttd Dec 12, 2024

Removing start, stop with ec2.py, adding validations #1191

Are you sure you want to change the base?

Removing start, stop with ec2.py, adding validations #1191

Conversation

abuabraham-ttd commented Dec 9, 2024 • edited Loading

sunnywu Dec 11, 2024 • edited Loading

Choose a reason for hiding this comment

abuabraham-ttd Dec 11, 2024

Choose a reason for hiding this comment

sunnywu Dec 11, 2024

Choose a reason for hiding this comment

abuabraham-ttd Dec 11, 2024

Choose a reason for hiding this comment

sunnywu Dec 11, 2024

Choose a reason for hiding this comment

abuabraham-ttd Dec 11, 2024

Choose a reason for hiding this comment

cody-constine-ttd Dec 11, 2024

Choose a reason for hiding this comment

abuabraham-ttd Dec 12, 2024

Choose a reason for hiding this comment

abuabraham-ttd commented Dec 9, 2024 •

edited

Loading

sunnywu Dec 11, 2024 •

edited

Loading