This repository has been archived by the owner on Dec 6, 2018. It is now read-only.

Learn cli v2.0 Roadmap


Jotham Apaloo edited this page Nov 1, 2016 · 1 revision

This roadmap outlines the steps towards completing the learn-2-enhancement-proposal.

Table of Contents

Personnel
Before November 2016
November 2016
December 2016
January 2017
February 2017
March 2017
Stretch

Personnel

Except where otherwise specified, Jotham is responsible for completing tasks.

Before November 2016

Implement a CLI and CLI helpers for working with learn1, for comparison with learn2
Implement learn1 LDA so that class predictions are easy to obtain
Run learn1 and learn2 on the Fredericton, Hamilton, and Coquitlam data using four variable subsets: continuous, continuous scaled, all data, and all data with the continuous variables scaled
- note that the continuous/scaled/quantiles idea came from this work, which identified the potential to reduce the number of variables used in our model
- if this works, it makes things much simpler for users and lowers our data storage requirements
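The comparison runs above could be tabulated along these lines. This is a minimal sketch that assumes both learn1 and learn2 can export class predictions for the same observations as plain Python lists. The subset labels, the function names, and the toy data are hypothetical and are not part of either tool.

```python
from itertools import product

# Hypothetical labels for the cities and variable subsets described above.
CITIES = ["fredericton", "hamilton", "coquitlam"]
SUBSETS = ["continuous", "continuous_scaled", "all", "all_continuous_scaled"]


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences match."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_predictions(y_true, pred_v1, pred_v2):
    """Summarize how learn1 (v1) and learn2 (v2) class predictions differ."""
    return {
        "v1_accuracy": accuracy(y_true, pred_v1),
        "v2_accuracy": accuracy(y_true, pred_v2),
        # How often the two versions agree, whether or not they are correct.
        "agreement": accuracy(pred_v1, pred_v2),
    }


def experiment_grid():
    """Every (city, subset) run to perform with both versions."""
    return list(product(CITIES, SUBSETS))


# Toy example: both versions are 75% accurate but agree on only half the rows.
y = [0, 1, 1, 0]
print(compare_predictions(y, [0, 1, 0, 0], [0, 1, 1, 1]))
print(len(experiment_grid()))  # 12
```

Reporting agreement next to accuracy helps separate "v2 is worse" from "v2 makes different mistakes", which is the question the November work sets out to answer.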

November 2016

Goal

Understand the underlying cause of the current differences in performance between v1 and v2

Deliverable(s)

Jupyter notebook in the learn2_prototype repository explaining the cause of the differences

Tasks

Reconcile performance differences between learn1 and learn2
- check how learn1 treats -9999 values
- try keeping a smaller number of variables in learn2
- identify and implement additional strategies as required
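The -9999 check matters because, if learn1 and learn2 disagree on whether -9999 means "no data", a model such as LDA would treat it as a real, extreme value in only one of the versions. A minimal stdlib sketch, assuming -9999 is a no-data sentinel and that rows are dicts mapping variable name to value; `sentinel_to_nan` and `sentinel_counts` are hypothetical helpers, not learn1 or learn2 functions.

```python
import math

NODATA = -9999  # assumed no-data sentinel in the input data


def sentinel_to_nan(rows, sentinel=NODATA):
    """Return copies of the rows with the sentinel replaced by NaN."""
    return [
        {name: (math.nan if value == sentinel else value) for name, value in row.items()}
        for row in rows
    ]


def sentinel_counts(rows, sentinel=NODATA):
    """Count sentinel occurrences per variable, to see which columns are affected."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            if value == sentinel:
                counts[name] = counts.get(name, 0) + 1
    return counts


rows = [
    {"elev": 12.0, "temp": -9999},
    {"elev": -9999, "temp": 4.5},
    {"elev": 3.0, "temp": -9999},
]
print(sentinel_counts(rows))  # {'temp': 2, 'elev': 1}
cleaned = sentinel_to_nan(rows)  # original rows are left untouched
```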

December 2016

Goal

Bring the performance of v2 up to that of v1

Deliverable(s)

Jupyter notebook in the learn2_prototype repository showing comparable performance between v1 and v2 for 3 cities (Fredericton, Hamilton, Coquitlam)

Tasks

Improve performance of learn2
- try other model families (e.g. random forest)
- explore feature interactions (note that the number of pairwise combinations is very large)
- identify and implement additional strategies as required
Review with Ian
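To put "very large" in numbers: with n variables there are n(n-1)/2 distinct pairwise interactions. A quick stdlib check; the variable counts below are illustrative, not the actual number of learn variables, and the "a*b" naming scheme is hypothetical.

```python
from itertools import combinations
from math import comb


def n_pairwise(n_features):
    """Number of distinct pairwise interaction terms among n features (n choose 2)."""
    return comb(n_features, 2)


def pairwise_names(features):
    """Names for each pairwise product term, e.g. 'a*b'."""
    return [f"{a}*{b}" for a, b in combinations(features, 2)]


for n in (10, 100, 1000):
    print(n, n_pairwise(n))
# 10 45
# 100 4950
# 1000 499500
```

The count grows quadratically, so screening interactions (or restricting them to a reduced variable set) is likely needed before fitting.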

January 2017

Goal

Determine whether MLAAS infrastructure meets our needs

Deliverable(s)

Report detailing the criteria and the degree to which Azure, Turi, and AWS ML fulfil each one

Tasks

Test/review MLAAS platforms against the following criteria
- programmatic fitting of models (e.g. in response to someone clicking a button in learn-app)
- versioning
- access to model internal data and metadata (e.g. params, variable importance)
- custom/cutting-edge models (e.g. with R/Python code modules)
  - e.g. the geoRglm package that Ian shared
- logs/error handling
- ease of changing platform
- cost
Prepare a spreadsheet listing the criteria with a brief summary of how each service stacks up
Review with Spencer, Boris, Ian

February 2017

This assumes MLAAS infrastructure is not chosen. It is subject to revision based on the outcome of the previous section.

Goal

Determine whether ECS is suitable for real-time predictions on single observations, or how to make it suitable

Deliverable(s)

Pseudo-code implementation
Code and task definition for a minimal working example (MWE) ECS task

Tasks

Quantify the performance needs (e.g. 1s response time)
Identify the interface (e.g. a json payload with params and values)
Implement caching of predictions; the ECS task can look up results in, and send its results to, a cache
- what is the unique key for a result?
- the model and the payload/parameter values
- collaborate with Boris/Matt/Yves
Review with Boris
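The caching question above can be sketched as follows, assuming the payload is a flat JSON object of parameter names to values and that each model has a string identifier (e.g. name plus version). Serializing the payload with sorted keys means two requests with the same values in a different order map to the same key. `cache_key`, `predict_cached`, the model id, and the in-memory dict standing in for a real cache are all hypothetical.

```python
import hashlib
import json
import time


def cache_key(model_id, payload):
    """Deterministic key for a prediction: model identifier + canonical payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{model_id}:{digest}"


def predict_cached(model_id, payload, predict, cache):
    """Return a cached prediction if present, otherwise compute and store it."""
    key = cache_key(model_id, payload)
    if key not in cache:
        cache[key] = predict(payload)
    return cache[key]


cache = {}  # stand-in for a shared cache service
calls = []


def fake_predict(payload):
    # Hypothetical model: records each call so cache hits are visible.
    calls.append(payload)
    return {"class": 1}


start = time.perf_counter()
predict_cached("example-model:v2", {"slope": 3, "elev": 120}, fake_predict, cache)
predict_cached("example-model:v2", {"elev": 120, "slope": 3}, fake_predict, cache)
elapsed = time.perf_counter() - start  # compare against the response-time budget
print(len(calls))  # 1: the reordered payload hits the cache
```

Hashing the canonical JSON keeps keys a fixed length regardless of payload size; whether payload values need rounding before hashing (so that 1.0 and 1.0000001 share a result) is an open design choice.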

March 2017

Goal

Integrate progress from the past 4 months

Deliverable(s)

mrat model deployed in MLAAS infrastructure, or
Deployed beta v2 of learn-cli and an ECS task that supports real-time predictions

Tasks

Review past progress with the mrat team
Jotham, Spencer, and Boris decide on infrastructure: MLAAS, ECS, or web service
Formalize and test v2
- this is either
  - the v2 library with support for real-time predictions
  - or an implementation of a model on MLAAS infrastructure

Stretch

Goal

Support climate change scenarios in v2

Deliverable(s)

v2 library / revised MLAAS model that supports climate change scenarios by scaling probabilities

Tasks

Resolve climate change support
- the effect of the current approach is to rescale the probabilities while keeping them within the limits (0,1)
- such a transformation can be added as a custom transformer at the end of a pipeline
- recall that it is convenient to produce predictions for all climate scenarios during the main batch prediction, and that those predictions are currently stored in the same dataset as the main (no climate change) prediction
- review with Ian, Spencer
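The roadmap does not say exactly how the current approach rescales probabilities, so this sketch swaps in an assumed method: a constant shift on the logit scale, which keeps every output strictly inside (0,1) and preserves the ranking of observations. The class follows the scikit-learn fit/transform naming convention by duck typing (no scikit-learn import), but takes a flat list of probabilities; a real pipeline step would need to accept a 2-D array. `LogitShiftScaler`, `shift`, `predict_all_scenarios`, and the scenario names are hypothetical.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _expit(x):
    return 1.0 / (1.0 + math.exp(-x))


class LogitShiftScaler:
    """Rescale probabilities by adding a constant on the logit scale.

    Mathematically the output stays inside (0, 1) for any finite shift;
    very large shifts can still round to 0.0 or 1.0 in floating point.
    """

    def __init__(self, shift=0.0, eps=1e-9):
        self.shift = shift
        self.eps = eps  # clip inputs so the logit is finite at exactly 0 or 1

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        out = []
        for p in X:
            p = min(max(p, self.eps), 1.0 - self.eps)
            out.append(_expit(_logit(p) + self.shift))
        return out


def predict_all_scenarios(base_probs, scenario_shifts):
    """Baseline plus one rescaled column per scenario, produced in one batch."""
    result = {"baseline": list(base_probs)}
    for name, shift in scenario_shifts.items():
        result[name] = LogitShiftScaler(shift).fit(base_probs).transform(base_probs)
    return result


scenarios = predict_all_scenarios([0.2, 0.5, 0.9], {"warmer": 0.5, "cooler": -0.5})
```

Producing every scenario column from the same baseline in one call matches the note above that it is convenient to generate all climate scenarios alongside the main batch prediction and store them in the same dataset.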