BigDL-PPML-Azure-Occlum-Example

Overview

This repository demonstrates how to run standard Apache Spark applications with BigDL PPML and Occlum on Azure confidential computing infrastructure with Intel SGX enabled: either DCsv3-series virtual machines or Azure Kubernetes Service (AKS). The underlying Azure virtual machines include the Intel SGX extensions.

Key points:

  • Azure DC Series: We run distributed Spark 3.1.2 examples on an Azure DCsv3 machine running Docker. These machines are backed by 3rd Generation Intel Xeon Scalable processors with large Enclave Page Cache (EPC) memory.
  • Occlum: To run Spark inside an Intel SGX enclave, we use Occlum, which wraps the open source Spark code with its enclave runtime so that Spark can run inside SGX enclaves. Doing this by hand requires deep knowledge of the SGX ecosystem, which is Occlum's area of expertise.

Distributed Spark in SGX on Azure

  • For details on Azure attestation in the Occlum init process, refer to maa_init.

Prerequisites

Pull the image from Docker Hub.

docker pull intelanalytics/bigdl-ppml-azure-occlum:2.1.0-SNAPSHOT

Alternatively, clone this repository and build the image yourself with build-docker-image.sh. Configure the environment variables in build-docker-image.sh first.

Build the Docker image:

bash build-docker-image.sh

Single Node Spark Examples on Azure

The single-node Spark examples require one Azure VM with SGX. All examples run inside SGX. You can adapt them to your own application with a few changes to the Dockerfile or scripts.
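Before starting a container, it can help to confirm that the VM actually exposes the SGX device nodes. The following is a minimal sketch, not part of this repository: the helper name check_sgx_devices is hypothetical, and the two device paths are the ones passed to docker run in the examples in this README.

```shell
# Hypothetical helper (not part of this repository).
# Reports which of the given device paths exist.
# Returns 0 when every path exists, 1 otherwise.
check_sgx_devices() {
    missing=0
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            echo "found:   $dev"
        else
            echo "missing: $dev"
            missing=1
        fi
    done
    return "$missing"
}

# Device paths taken from the docker run commands in this README.
if check_sgx_devices /dev/sgx/enclave /dev/sgx/provision; then
    echo "SGX device nodes are present."
else
    echo "SGX device nodes are missing; docker run with --device is likely to fail." >&2
fi
```

If a path is reported missing, check that the VM size supports SGX and that the SGX driver is loaded before going further.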

SparkPi example

Run the SparkPi example with run_spark_on_occlum_glibc.sh.

docker run --rm -it \
    --name=azure-ppml-example-with-occlum \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    intelanalytics/bigdl-ppml-azure-occlum:2.1.0-SNAPSHOT bash

# Inside the container:
cd /opt
bash run_spark_on_occlum_glibc.sh pi
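The two steps above (start the container, then run the script inside it) can also be combined into one non-interactive command, which is convenient for scripted runs. This is a sketch under assumptions: the run_sparkpi function and the DRY_RUN variable are hypothetical, and it assumes the script does not need an interactive terminal. The image tag, device paths, and script name are the ones used above.

```shell
# Hypothetical wrapper (not part of this repository).
IMAGE="intelanalytics/bigdl-ppml-azure-occlum:2.1.0-SNAPSHOT"

# Builds the docker command for a non-interactive SparkPi run.
# With DRY_RUN=1 (the default here) the command is only printed for review;
# the printed form drops shell quoting, so it is for reading, not copy-paste.
# With DRY_RUN=0 the command is executed.
run_sparkpi() {
    set -- docker run --rm \
        --name=azure-ppml-example-with-occlum \
        --device=/dev/sgx/enclave \
        --device=/dev/sgx/provision \
        "$IMAGE" \
        bash -c "cd /opt && bash run_spark_on_occlum_glibc.sh pi"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_sparkpi
```

To actually launch the container, call it as DRY_RUN=0 run_sparkpi on a VM with SGX.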

Nytaxi example with Azure NYTaxi

Run the Nytaxi example with run_azure_nytaxi.sh.

docker run --rm -it \
    --name=azure-ppml-example-with-occlum \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    intelanalytics/bigdl-ppml-azure-occlum:2.1.0-SNAPSHOT bash

# Inside the container:
bash run_azure_nytaxi.sh

On success, the output shows the Nytaxi DataFrame count and the aggregation duration.

Distributed Spark Examples on AKS

SparkPi on AKS

Configure the environment variables in run_spark_pi.sh, driver.yaml and executor.yaml. Then submit the SparkPi task with run_spark_pi.sh.

bash run_spark_pi.sh

Nytaxi on AKS

Configure the environment variables in run_nytaxi_k8s.sh, driver.yaml and executor.yaml. Then submit the Nytaxi query task with run_nytaxi_k8s.sh.

bash run_nytaxi_k8s.sh

Known issues

  1. If you see the following error when running the Docker image:
aesm_service[10]: Failed to set logging callback for the quote provider library.
aesm_service[10]: The server sock is 0x5624fe742330

This may be related to SGX DCAP. The message is expected when not all interfaces in the quote provider library are valid, and it does not cause a failure.

  2. If you see the following error when running the MAA example:
[get_platform_quote_cert_data ../qe_logic.cpp:352] p_sgx_get_quote_config returned NULL for p_pck_cert_config.
thread 'main' panicked at 'IOCTRL IOCTL_GET_DCAP_QUOTE_SIZE failed', /opt/src/occlum/tools/toolchains/dcap_lib/src/occlum_dcap.rs:70:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ERROR] occlum-pal: The init process exit with code: 101 (line 62, file src/pal_api.c)
[ERROR] occlum-pal: Failed to run the init process: EINVAL (line 150, file src/pal_api.c)
[ERROR] occlum-pal: Failed to do ECall: occlum_ecall_broadcast_interrupts with error code 0x2002: Invalid enclave identification. (line 26, file src/pal_interrupt_thread.c)
/opt/occlum/build/bin/occlum: line 337:  3004 Segmentation fault      (core dumped) RUST_BACKTRACE=1 "$instance_dir/build/bin/occlum-run" "$@"

This may be related to the issue "[RFC] IOCTRL IOCTL_GET_DCAP_QUOTE_SIZE failed".
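Because the aesm_service message in the first known issue does not cause a failure, it can be filtered out when scanning logs for real errors. The following is a minimal sketch: the filter_benign_aesm function is hypothetical, and treating both quoted aesm_service lines as benign is an assumption based on the note for that issue.

```shell
# Hypothetical helper (not part of this repository).
# Reads log text on stdin and drops the two aesm_service lines quoted in the
# first known issue; every other line passes through unchanged.
filter_benign_aesm() {
    grep -v \
        -e 'aesm_service\[[0-9]*]: Failed to set logging callback for the quote provider library\.' \
        -e 'aesm_service\[[0-9]*]: The server sock is 0x[0-9a-f]*' \
        || [ "$?" -eq 1 ]   # grep exits 1 when nothing is left; that is not an error here
}

# Example with the two quoted lines plus a placeholder line (hypothetical).
printf '%s\n' \
    'aesm_service[10]: Failed to set logging callback for the quote provider library.' \
    'aesm_service[10]: The server sock is 0x5624fe742330' \
    'placeholder for any other log line' \
    | filter_benign_aesm
```

Saved container output can be piped through the function in the same way. The error in the second known issue is deliberately not filtered, since it ends in a failure.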

Reference

  1. https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html
  2. https://github.com/intel-analytics/BigDL
  3. https://github.com/occlum/occlum
  4. Confidential Data Analytics with Apache Spark on Intel SGX Confidential Containers
