Random forest is a supervised learning algorithm that uses an ensemble learning method for classification and regression. This example shows how to use OCI Data Flow to build a random forest regression model that predicts mileage (mpg) for cars.
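As a rough orientation, a script of this kind might be structured as in the sketch below. This is not the contents of random_forest_regression.py: the column names in FEATURE_COLS, the number of trees, the 80/20 split, and the helper name train_and_evaluate are all assumptions made for illustration, and the feature columns are assumed to be numeric.

```python
# Hypothetical sketch of a PySpark random forest regression job.
# Column names, split ratio, and tree count are assumptions, not the sample's values.
INPUT_PATH = "oci://<bucket>@<namespace>/path"  # placeholder, replace with your CSV path
LABEL_COL = "mpg"
FEATURE_COLS = ["cylinders", "displacement", "weight", "acceleration"]  # assumed names


def train_and_evaluate(spark, input_path=INPUT_PATH):
    """Train a random forest regressor and return (rmse, r2) on a held-out split."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import RandomForestRegressor

    # Read the CSV and drop rows with missing values in the columns we use.
    df = spark.read.csv(input_path, header=True, inferSchema=True)
    df = df.dropna(subset=FEATURE_COLS + [LABEL_COL])
    train, test = df.randomSplit([0.8, 0.2], seed=42)

    # Assemble the numeric columns into one feature vector, then fit the forest.
    assembler = VectorAssembler(inputCols=FEATURE_COLS, outputCol="features")
    forest = RandomForestRegressor(
        featuresCol="features", labelCol=LABEL_COL, numTrees=50
    )
    model = Pipeline(stages=[assembler, forest]).fit(train)

    # Score the held-out rows and compute both metrics.
    predictions = model.transform(test)
    rmse = RegressionEvaluator(
        labelCol=LABEL_COL, predictionCol="prediction", metricName="rmse"
    ).evaluate(predictions)
    r2 = RegressionEvaluator(
        labelCol=LABEL_COL, predictionCol="prediction", metricName="r2"
    ).evaluate(predictions)
    return rmse, r2
```

Called with an active SparkSession, the function returns the two metrics, which a driver script could then print.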
Before you begin:
- Ensure your tenant is configured according to the admin setup instructions.
- Know your object store namespace.
- (Optional, strongly recommended): Install Spark to test your code locally before deploying.
1. Upload a sample CSV file to OCI object store.
2. Customize random_forest_regression.py by setting INPUT_PATH to the OCI path of your CSV data. The format is oci://<bucket>@<namespace>/path
   2a. Don't know what your namespace is? Run: oci os ns get
   2b. Don't have the OCI CLI installed? See the OCI CLI documentation to install it.
3. Customize random_forest_regression.py with the OCI path where you would like to save output data.
4. Recommended: run the sample locally to test it. Refer to Develop Oracle Cloud Infrastructure Data Flow Applications Locally, Deploy to The Cloud.
5. Upload random_forest_regression.py to an object store bucket.
6. Create a Python Data Flow application pointing to random_forest_regression.py.
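The path format described above can be assembled programmatically. The following is a minimal sketch: the helper names oci_uri and namespace_from_cli_output are hypothetical, and the parser assumes that oci os ns get prints a JSON object of the form {"data": "<namespace>"}.

```python
import json


def namespace_from_cli_output(cli_json: str) -> str:
    """Extract the namespace from the JSON printed by `oci os ns get`.

    Assumption: the CLI prints an object shaped like {"data": "<namespace>"}.
    """
    return json.loads(cli_json)["data"]


def oci_uri(bucket: str, namespace: str, path: str) -> str:
    """Build a URI of the form oci://<bucket>@<namespace>/path."""
    if not bucket or not namespace:
        raise ValueError("bucket and namespace must be non-empty")
    if "@" in bucket or "/" in bucket:
        raise ValueError("bucket must not contain '@' or '/'")
    # Strip any leading slash so the result has exactly one separator.
    return f"oci://{bucket}@{namespace}/{path.lstrip('/')}"
```

For example, oci_uri("my-bucket", "mytenancy", "data/cars.csv") yields a value suitable for INPUT_PATH; the bucket, namespace, and file names here are made up.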
Test the Application Locally (recommended):
You can test the application locally using:
python3 random_forest_regression.py
If it works, you'll see output like:
RMSE=3.888566643366089 r2=0.7532412722764463
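For readers unfamiliar with the two numbers in that output, the sketch below computes them by hand using the conventional definitions: RMSE is the square root of the mean squared error, and r2 is one minus the ratio of residual to total sum of squares. The mpg values are made up for illustration and are not taken from the sample data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in mpg."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(actual, predicted):
    """Coefficient of determination: share of variance explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up mpg values, for illustration only.
actual = [18.0, 24.0, 30.0]
predicted = [20.0, 24.0, 28.0]
print(f"RMSE={rmse(actual, predicted)} r2={r2(actual, predicted)}")
```

A lower RMSE and an r2 closer to 1 both indicate predictions that track the true mileage more closely.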
To deploy and run the application:
- Copy random_forest_regression.py to object store.
- Create a Data Flow Python application. Refer to Create PySpark App for more information.
- Run the application.
Create a bucket. Alternatively, you can reuse an existing bucket. Then upload the script and create and run the application with the OCI CLI:
oci os object put --bucket-name <bucket> --file random_forest_regression.py
oci data-flow application create \
--compartment-id <compartment_ocid> \
--display-name "Random Forest Regression" \
--driver-shape VM.Standard2.1 \
--executor-shape VM.Standard2.1 \
--num-executors 1 \
--spark-version 2.4.4 \
--file-uri oci://<bucket>@<namespace>/random_forest_regression.py \
--language Python
oci data-flow run create \
--application-id <application_ocid> \
--compartment-id <compartment_ocid> \
--display-name "Random Forest Regression"
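If you prefer to launch the run from a Python script, one option is to build the same command as an argument list and hand it to subprocess. The sketch below uses only the flags shown above; the helper name run_create_command is hypothetical, and actually submitting the run assumes the OCI CLI is installed and configured.

```python
import shlex


def run_create_command(application_ocid, compartment_ocid, display_name):
    """Return the argument list for `oci data-flow run create`, one flag each."""
    return [
        "oci", "data-flow", "run", "create",
        "--application-id", application_ocid,
        "--compartment-id", compartment_ocid,
        "--display-name", display_name,
    ]


cmd = run_create_command(
    "<application_ocid>", "<compartment_ocid>", "Random Forest Regression"
)
# Print a shell-quoted form for inspection before running anything.
print(shlex.join(cmd))
# To submit the run for real (requires a configured OCI CLI):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with the display name.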