
Overview

Random forest is a supervised learning algorithm that uses ensemble learning for classification and regression. This example shows how to use OCI Data Flow to build a random forest regression model that predicts mileage (mpg) for cars.
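To make the ensemble idea concrete, here is a minimal pure-Python sketch. It is a toy, not what random_forest_regression.py or Spark does internally: each "tree" is a one-split stump trained on a bootstrap sample, and the forest averages their predictions. Real random forests also grow deeper trees and subsample features. The function names and the weight/mpg numbers are made up for illustration.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) on a single feature."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave data on both sides
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # every x identical: fall back to the overall mean
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, l_mean, r_mean = stump
    return l_mean if x <= threshold else r_mean

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """The ensemble prediction is the average over all trees."""
    return mean(predict_stump(stump, x) for stump in forest)

# Made-up toy data: vehicle weight (thousands of lb) versus mpg.
weights = [1.8, 2.0, 2.2, 2.5, 2.8, 3.0, 3.4, 3.8, 4.2, 4.6]
mpg = [36.0, 34.0, 32.0, 29.0, 26.0, 24.0, 20.0, 17.0, 15.0, 13.0]

forest = fit_forest(weights, mpg)
light = predict_forest(forest, 2.0)
heavy = predict_forest(forest, 4.4)
print(f"light car: {light:.1f} mpg, heavy car: {heavy:.1f} mpg")
```

Averaging many trees trained on different resamples is what smooths out the coarse, step-like predictions of any single tree.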

Prerequisites

Before you begin:

Ensure your tenant is configured according to the instructions to set up admin.
Know your object store namespace.
(Optional, strongly recommended) Install Spark to test your code locally before deploying.

Load Required Data

Upload a sample CSV file to OCI object store.

Application Setup

Customize random_forest_regression.py and create the application:

1. Set INPUT_PATH in random_forest_regression.py to the OCI path of your CSV data. The format is oci://<bucket>@<namespace>/path
   a. Don't know what your namespace is? Run: oci os ns get
   b. Don't have the OCI CLI installed? See the OCI CLI installation documentation to install it.
2. Customize random_forest_regression.py with the OCI path where you would like to save output data.
3. Recommended: run the sample locally to test it. Refer to Develop Oracle Cloud Infrastructure Data Flow Applications Locally, Deploy to The Cloud.
4. Upload random_forest_regression.py to an object store bucket.
5. Create a Python Data Flow application pointing to random_forest_regression.py.
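The oci://<bucket>@<namespace>/path format described above is easy to get wrong by hand. The following is a small sketch of helpers that build and check such paths. The helper names, the OUTPUT_PATH variable, and the bucket, namespace, and file names are all hypothetical; only INPUT_PATH and the URI format come from this README.

```python
import re

# Pattern for oci://<bucket>@<namespace>/path as described in this README.
_OCI_URI = re.compile(r"^oci://(?P<bucket>[^@/]+)@(?P<namespace>[^/]+)/(?P<path>.*)$")

def oci_uri(bucket, namespace, path=""):
    """Build an oci:// URI from its parts (hypothetical helper)."""
    if not bucket or not namespace:
        raise ValueError("bucket and namespace are both required")
    return f"oci://{bucket}@{namespace}/{path.lstrip('/')}"

def parse_oci_uri(uri):
    """Split an oci:// URI into (bucket, namespace, path); raise if malformed."""
    match = _OCI_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid oci:// URI: {uri!r}")
    return match.group("bucket"), match.group("namespace"), match.group("path")

# Placeholder values: substitute your own bucket, namespace, and object names.
INPUT_PATH = oci_uri("my-bucket", "mynamespace", "data/cars.csv")
OUTPUT_PATH = oci_uri("my-bucket", "mynamespace", "output/")  # assumed variable name

print(INPUT_PATH)
print(parse_oci_uri(INPUT_PATH))
```

Parsing the path back before submitting a run is a cheap way to catch a swapped bucket and namespace or a missing @.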

Testing Locally

You can test the application locally (recommended) using:

python3 random_forest_regression.py

If it works, you'll see output like:

 RMSE=3.888566643366089 r2=0.7532412722764463
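For readers unfamiliar with the two reported metrics, the sketch below computes them from their standard definitions in plain Python: RMSE is the square root of the mean squared error (in the same units as the target, here mpg), and r2 is one minus the ratio of residual to total sum of squares. This is not taken from random_forest_regression.py, and the sample values are made up.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

def r2(actual, predicted):
    """Coefficient of determination: share of variance explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up mpg values for four cars.
actual = [30.0, 20.0, 25.0, 15.0]
predicted = [28.0, 22.0, 24.0, 17.0]

print(f"RMSE={rmse(actual, predicted):.3f} r2={r2(actual, predicted):.3f}")
# prints: RMSE=1.803 r2=0.896
```

Read this way, the sample output above means predictions are off by roughly 3.9 mpg on average and the model explains about 75% of the variance in mileage.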

Deploy and Run the Application

Copy random_forest_regression.py to object store.
Create a Data Flow Python application. Refer to Create PySpark App for more information.
Run the application.

Run the Application using OCI Cloud Shell or OCI CLI

Create a bucket. Alternatively, you can reuse an existing bucket.

oci os object put --bucket-name <bucket> --file random_forest_regression.py
oci data-flow application create \
    --compartment-id <compartment_ocid> \
    --display-name "Random Forest Regression" \
    --driver-shape VM.Standard2.1 \
    --executor-shape VM.Standard2.1 \
    --num-executors 1 \
    --spark-version 2.4.4 \
    --file-uri oci://<bucket>@<namespace>/random_forest_regression.py \
    --archive-uri oci://<bucket>@<namespace>/archive.zip \
    --language Python
oci data-flow run create \
    --application-id <application_ocid> \
    --compartment-id <compartment_ocid> \
    --display-name "Random Forest Regression"
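If you prefer to script the run step, the command above can be assembled as an argument list in Python. This is only a sketch: the function name is hypothetical, the OCID arguments are placeholders you must replace, and it uses only the flags shown in this README.

```python
import shlex

def data_flow_run_command(application_ocid, compartment_ocid, display_name):
    """Assemble the `oci data-flow run create` call as an argument list."""
    return [
        "oci", "data-flow", "run", "create",
        "--application-id", application_ocid,
        "--compartment-id", compartment_ocid,
        "--display-name", display_name,
    ]

# Placeholder OCIDs: substitute the values from your tenancy.
cmd = data_flow_run_command(
    "<application_ocid>", "<compartment_ocid>", "Random Forest Regression"
)

# Show a copy-pasteable, correctly quoted command line.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it. Keeping the arguments as a list avoids shell quoting problems with the display name.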
