Evaluate the ranger library for batch training with Random Forest #19

Open
georgeslabreche opened this issue May 5, 2021 · 0 comments
Labels
enhancement: This issue is related to a new feature.
epic: This issue is an epic and needs to be broken down into smaller issues.

Description

Use a lightweight C++ library for Random Forest batch training with the training data logged in the training.csv file.

The ranger Library

ranger is a fast implementation of random forests (Breiman 2001) or recursive partitioning, particularly suited for high dimensional data. Classification, regression, and survival forests are supported. Classification and regression forests are implemented as in the original Random Forest (Breiman 2001), survival forests as in Random Survival Forests (Ishwaran et al. 2008). Includes implementations of extremely randomized trees (Geurts et al. 2006) and quantile regression forests (Meinshausen 2006).

The GitHub repo: https://github.com/imbs-hl/ranger

Saving the trained models seems possible:
https://github.com/imbs-hl/ranger/blob/ce497711884c783e133fb36750b60de4c140773f/src/Forest.cpp#L403-L443
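A minimal sketch of how the save/load check could be driven through the standalone ranger executable. It assumes the `--file`, `--depvarname`, `--treetype`, `--ntree`, `--outprefix`, `--write`, and `--predict` options described in the ranger README, and an executable named `ranger` on the path. The helper names, file names, and the dependent variable name used in the usage note are hypothetical and do not come from this issue. The helpers only build the command strings, so they can be inspected before being passed to something like `std::system`.

```cpp
#include <string>

// Build a command that trains a classification forest and saves it to disk.
// Assumption: option names follow the ranger README; --treetype 1 selects
// classification and --write saves the trained forest.
std::string trainCommand(const std::string& dataFile,
                         const std::string& depVar,
                         unsigned numTrees,
                         const std::string& outPrefix) {
    return "ranger --file " + dataFile +
           " --depvarname " + depVar +
           " --treetype 1" +
           " --ntree " + std::to_string(numTrees) +
           " --outprefix " + outPrefix +
           " --write";
}

// Build a command that loads a previously saved forest and predicts on a
// data file. Assumption: --predict takes the path of the saved forest file.
std::string predictCommand(const std::string& dataFile,
                           const std::string& forestFile,
                           const std::string& outPrefix) {
    return "ranger --file " + dataFile +
           " --predict " + forestFile +
           " --outprefix " + outPrefix;
}
```

For example, `trainCommand("train.dat", "label", 100, "orbitai")` followed by `predictCommand("test.dat", "orbitai.forest", "pred")` would exercise the round trip, assuming the forest is written to `<outprefix>.forest` as the linked `Forest.cpp` code suggests.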

The training input

Use the training inputs collected during flight:

  • PD values
  • Target label is in the 8th column.

The training data file:
https://github.com/georgeslabreche/opssat-orbitai/blob/main/results/learning/mochi-2021-04-18_02-51-48/logs/training.csv
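A rough sketch of one possible conversion step. It assumes that training.csv is comma-separated with no header row, and that standalone ranger expects a header line of variable names so the target can be selected by name. Neither assumption has been verified against the file or the library. The function name `addHeader` and the column names `x1, x2, ...` and `label` are made up for illustration. Any columns that are not PD values or the label would still need to be dropped separately.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Count the comma-separated fields in one line.
std::size_t countFields(const std::string& line) {
    std::size_t n = 1;
    for (char c : line) {
        if (c == ',') ++n;
    }
    return n;
}

// Copy rows from `in` to `out`, prepending a generated header line.
// The column at 1-based position `labelCol` is named "label"; every other
// column is named x<position>. Blank rows and rows whose field count differs
// from the first data row are skipped. Returns the number of rows written.
std::size_t addHeader(std::istream& in, std::ostream& out, std::size_t labelCol) {
    std::string line;
    std::size_t width = 0;
    std::size_t written = 0;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;

        std::size_t n = countFields(line);
        if (width == 0) {
            // The first data row fixes the expected width and triggers the header.
            width = n;
            for (std::size_t i = 1; i <= width; ++i) {
                if (i > 1) out << ',';
                if (i == labelCol) out << "label";
                else out << 'x' << i;
            }
            out << '\n';
        }
        if (n != width) continue;  // skip malformed rows
        out << line << '\n';
        ++written;
    }
    return written;
}
```

Calling `addHeader(input, output, 8)` on `std::ifstream` and `std::ofstream` objects would name the 8th column `label`, matching the target label position stated above.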

What to do

  1. Compile and run the C++ implementation of ranger.
  2. Check that a trained model can indeed be saved and loaded.
  3. Transform the training.csv data file into the format that ranger expects, so that it can train with the PD inputs and the target label.
  4. Compile for the ARM architecture and check that it runs on the flatsat.
  5. Run ranger with the PD values and target label as inputs.
  6. Calculate the classification metrics of the trained model, using the same dataset that was used in the paper to evaluate the performance of the Mochi models.
  7. Integrate ranger into the OrbitAI app.

We can discuss how to approach step 7 once steps 1 to 6 have been completed.
