This repository has been archived by the owner on Dec 6, 2018. It is now read-only.

Learn cli v2.0 Roadmap

Jotham Apaloo edited this page Nov 1, 2016

This roadmap outlines the steps toward completing the learn-2-enhancement-proposal.


Personnel

Except where otherwise specified, Jotham is responsible for completing tasks.

Before November 2016

November 2016

Goal

  • Understand the underlying causes of the current performance differences between v1 and v2

Deliverable(s)

  • Jupyter notebook in the learn2_prototype repository explaining the causes of the differences

Tasks

  • Reconcile performance differences between learn1 and learn2
    • check how learn1 treats -9999 values
    • keep a smaller number of variables in learn2
    • identify and implement additional strategies as required
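One quick way to check the -9999 question is to compare summary statistics with the sentinel kept as a number versus treated as missing. The sketch below is a minimal, stdlib-only illustration; it assumes -9999 is a no-data sentinel and that each observation is a dict of column name to value. The function and column names are hypothetical and are not taken from learn1 or learn2.

```python
import math

NO_DATA = -9999  # assumed no-data sentinel


def sentinel_counts(rows, sentinel=NO_DATA):
    """Count how often the sentinel appears in each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value == sentinel:
                counts[column] = counts.get(column, 0) + 1
    return counts


def mask_sentinel(rows, sentinel=NO_DATA):
    """Return copies of the rows with the sentinel replaced by NaN."""
    return [
        {column: (math.nan if value == sentinel else value) for column, value in row.items()}
        for row in rows
    ]


def column_mean(rows, column):
    """Mean of one column, skipping NaN values."""
    values = [row[column] for row in rows if not math.isnan(row[column])]
    return sum(values) / len(values) if values else math.nan


# Hypothetical example data
rows = [
    {"elevation": 120.0, "slope": -9999},
    {"elevation": -9999, "slope": 4.5},
    {"elevation": 98.0, "slope": -9999},
]
masked = mask_sentinel(rows)

print(sentinel_counts(rows))             # {'slope': 2, 'elevation': 1}
print(column_mean(rows, "elevation"))    # strongly negative: the sentinel is averaged in
print(column_mean(masked, "elevation"))  # 109.0
```

If learn1 and learn2 disagree on whether -9999 is a real value or a missing one, differences of this size in the derived features would be a plausible source of the performance gap.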

December 2016

Goal

  • Bring the performance of v2 up to that of v1

Deliverable(s)

  • Jupyter notebook in the learn2_prototype repository showing comparable performance between v1 and v2 for 3 cities (Fredericton, Hamilton, Coquitlam)

Tasks

  • Improve performance of learn2
    • try other model families (e.g. random forest)
    • explore feature interactions (note that the number of pairwise combinations is very large)
    • identify and implement additional strategies as required
  • Review with Ian
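To put a number on "very large": with n features there are n(n-1)/2 distinct pairs, before considering higher-order terms. The sketch below counts pairs and builds product terms for one observation. Product terms are only one common way to encode interactions, and the feature names are hypothetical.

```python
from itertools import combinations
from math import comb


def n_pairwise(n_features):
    """Number of distinct feature pairs: n * (n - 1) / 2."""
    return comb(n_features, 2)


def pairwise_products(row):
    """Add a product term for every pair of numeric features in one observation."""
    interactions = {}
    for (name_a, a), (name_b, b) in combinations(sorted(row.items()), 2):
        interactions[f"{name_a}*{name_b}"] = a * b
    return interactions


for n in (10, 100, 500):
    print(n, n_pairwise(n))
# 10 45
# 100 4950
# 500 124750

print(pairwise_products({"a": 2.0, "b": 3.0, "c": 0.5}))
# {'a*b': 6.0, 'a*c': 1.0, 'b*c': 1.5}
```

Because the count grows quadratically, screening candidate pairs (for example, only pairs among the most important variables) is likely more practical than adding every interaction.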

January 2017

Goal

  • Determine whether MLAAS (machine learning as a service) infrastructure meets our needs

Deliverable(s)

  • Report detailing the criteria and the degree to which Azure, Turi, and AWS ML fulfill each

Tasks

  • Test/review MLAAS platforms against the following criteria
    • programmatic fitting of models (e.g. in response to someone clicking a button in learn-app)
    • versioning
    • access to model internal data and metadata (e.g. params, variable importance)
    • custom / cutting-edge models (e.g. with R/Python code modules)
      • e.g. georglm, which Ian shared
    • logs/error handling?
    • ease of switching platforms
    • cost
  • Prepare a spreadsheet listing the criteria, with a brief summary of how each service stacks up
  • Review with Spencer, Boris, Ian
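The comparison spreadsheet could start as a plain CSV grid with one row per criterion and one column per service. The sketch below only writes an empty template. The layout and the extra "notes" column are assumptions, and the cells are deliberately left blank so they are filled in from actual testing rather than pre-scored.

```python
import csv
import io

CRITERIA = [
    "programmatic fitting of models",
    "versioning",
    "access to model internal data and metadata",
    "custom / cutting-edge models",
    "logs / error handling",
    "ease of switching platforms",
    "cost",
]
PLATFORMS = ["Azure", "Turi", "AWS ML"]


def criteria_template(criteria=CRITERIA, platforms=PLATFORMS):
    """Return CSV text with one row per criterion and a blank cell per platform."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["criterion", *platforms, "notes"])
    for criterion in criteria:
        writer.writerow([criterion, *([""] * len(platforms)), ""])
    return buffer.getvalue()


print(criteria_template())
```

The output opens directly in any spreadsheet tool; adding or dropping a platform only means editing `PLATFORMS`.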

February 2017

This section assumes MLAAS infrastructure is not chosen and is subject to revision based on the outcome of the previous section.

Goal

  • Determine whether ECS is suitable for real-time predictions on single observations, or how to make it so

Deliverable(s)

  • Pseudo-code implementation
  • Code and task definition for a minimal working example (MWE) ECS task

Tasks

  • Quantify the performance needs (e.g. 1s response time)
  • Identify the interface (e.g. a json payload with params and values)
  • Implement caching of predictions; the ECS task can look up results in, and send its results to, a cache
    • what is the unique key to a result?
    • the model plus the payload/parameter values
    • collaborate with Boris/Matt/Yves
  • Review with Boris
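Since a cached result is identified by the model plus the payload/parameter values, one option is to hash a canonical JSON encoding of both. The sketch below assumes the payload is a JSON object of parameter names to values and that each model carries a version string; the in-memory dict stands in for whatever shared cache the ECS task would actually use. The model name "mrat", the version "2.0.0", and the lambda predictors are hypothetical placeholders.

```python
import hashlib
import json


def cache_key(model_id, model_version, payload):
    """Build a deterministic key from the model identity and the request payload."""
    canonical = json.dumps(
        {"model": model_id, "version": model_version, "payload": payload},
        sort_keys=True,          # key order in the request must not change the key
        separators=(",", ":"),   # no incidental whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PredictionCache:
    """In-memory stand-in for a shared prediction cache."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_predict(self, model_id, model_version, payload, predict_fn):
        key = cache_key(model_id, model_version, payload)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = predict_fn(payload)
        return self._store[key]


cache = PredictionCache()
request = json.loads('{"slope": 4.5, "elevation": 120}')
same_request = json.loads('{"elevation": 120, "slope": 4.5}')  # same values, different key order

print(cache.get_or_predict("mrat", "2.0.0", request, lambda p: 0.42))       # 0.42
print(cache.get_or_predict("mrat", "2.0.0", same_request, lambda p: 0.99))  # 0.42, served from cache
```

Including the model version in the key means a retrained model never serves stale results. One caveat: JSON distinguishes `120` from `120.0`, so the interface may need to normalize numeric types before hashing.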

March 2017

Goal

  • Integrate progress from the past 4 months

Deliverable(s)

  • mrat model deployed in MLAAS infrastructure, or
  • Deployed beta of learn-cli v2 and an ECS task that supports real-time predictions

Tasks

  • Review past progress with the mrat team
  • Jotham, Spencer, and Boris decide on infrastructure: MLAAS, ECS, or web service
  • formalize and test v2
    • this is either
      • the v2 library with support for real-time predictions, or
      • an implementation of a model on MLAAS infrastructure

Stretch

Goal

  • Support climate change scenarios in v2

Deliverable(s)

  • v2 library / revised MLAAS model that supports climate change scenarios by scaling probabilities

Tasks

  • Resolve climate change support
    • the effect of the current approach is to rescale the probabilities while keeping them within the limits (0, 1)
    • such a transformation can be added as a custom transformer at the end of a pipeline
    • recall that it is convenient to produce predictions for all climate scenarios during the main batch prediction, and that those predictions currently live in the same dataset as the main (no climate change) prediction
    • review with Ian, Spencer
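The page does not give the current rescaling formula, so the transformer below is only a sketch of the idea. It assumes scaling is applied to the odds, p' = f*p / (1 - p + f*p), which keeps results in [0, 1] and leaves 0 and 1 fixed for any factor f > 0. It emits the baseline probability plus one column per scenario, so every scenario comes out of the same batch run. The class, scenario names, and factors are hypothetical. The class mirrors scikit-learn's fit/transform convention without importing scikit-learn; a real pipeline step would typically subclass `BaseEstimator` and `TransformerMixin`.

```python
class ClimateScenarioScaler:
    """Rescale predicted probabilities for each climate scenario by an odds factor (assumed form)."""

    def __init__(self, scenarios):
        # scenarios: mapping of scenario name -> positive odds multiplier
        if any(factor <= 0 for factor in scenarios.values()):
            raise ValueError("scenario factors must be positive")
        self.scenarios = dict(scenarios)

    def fit(self, X, y=None):
        # Nothing to learn; present so the class follows the fit/transform convention.
        return self

    @staticmethod
    def scale(p, factor):
        """Multiply the odds p / (1 - p) by factor; the result stays within [0, 1]."""
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        return factor * p / (1.0 - p + factor * p)

    def transform(self, probabilities):
        """One row per observation: baseline probability, then one column per scenario."""
        return [
            [p, *(self.scale(p, factor) for factor in self.scenarios.values())]
            for p in probabilities
        ]


scaler = ClimateScenarioScaler({"scenario_a": 2.0, "scenario_b": 0.5}).fit(None)
for row in scaler.transform([0.0, 0.5, 1.0]):
    print([round(v, 4) for v in row])
# [0.0, 0.0, 0.0]
# [0.5, 0.6667, 0.3333]
# [1.0, 1.0, 1.0]
```

Scaling the odds, rather than multiplying p directly and clipping, means no scenario factor can push a probability outside its limits, so no clipping step is needed.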