Colab Triage Tutorial (#878)
* Created using Colaboratory

* Created using Colaboratory

* Created using Colaboratory

* First draft end-to-end (WIP)

* Run through and debugging

* Better introduction to orient the modeling problem

* update docs

* Fix bug in postgres setup
shaycrk authored Jun 21, 2022
1 parent b360b83 commit b089208
Showing 5 changed files with 6,010 additions and 111 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -16,7 +16,8 @@ Triage is designed to:

## Quick Links

- [Dirty Duck Tutorial](https://dssg.github.io/triage/dirtyduck/) - Are you completely new to Triage? Go through the tutorial here with sample data
- [Tutorial on Google Colab](https://colab.research.google.com/github/dssg/triage/blob/master/example/colab/colab_triage.ipynb) - Are you completely new to Triage? Run through a quick tutorial hosted on Google Colab (no setup necessary) to see what Triage can do!
- [Dirty Duck Tutorial](https://dssg.github.io/triage/dirtyduck/) - Want a more in-depth walkthrough of Triage's functionality and concepts? Go through the Dirty Duck tutorial here with sample data
- [QuickStart Guide](https://dssg.github.io/triage/quickstart/) - Try Triage out with your own project and data
- [Triage Documentation Site](https://dssg.github.io/triage/) - Used Triage before and want more reference documentation?
- [Development](https://github.com/dssg/triage#development) - Contribute to Triage development.
1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -81,6 +81,7 @@ plugins:

nav:
  - Home: index.md
  - Online Tutorial (Google Colab): https://colab.research.google.com/github/dssg/triage/blob/master/example/colab/colab_triage.ipynb
  - Get started with your own project:
    - Quickstart guide: quickstart.md
    - Suggested workflow: triage_project_workflow.md
122 changes: 13 additions & 109 deletions docs/sources/index.md
@@ -1,122 +1,26 @@
Triage
======
# Triage

Data Science Toolkit for Social Good and Public Policy Problems
[![Build Status](https://travis-ci.org/dssg/triage.svg?branch=master)](https://travis-ci.org/dssg/triage)
[![codecov](https://codecov.io/gh/dssg/triage/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/triage)
[![codeclimate](https://codeclimate.com/github/dssg/triage.png)](https://codeclimate.com/github/dssg/triage)

[![image](https://travis-ci.com/dssg/triage.svg?branch=master)](https://travis-ci.org/dssg/triage)
[![image](https://codecov.io/gh/dssg/triage/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/triage)
[![image](https://codeclimate.com/github/dssg/triage.png)](https://codeclimate.com/github/dssg/triage)

Building data science systems requires answering many design questions and turning the answers into modeling choices, which in turn drive machine learning models. Questions such as cohort selection, unit of analysis determination, outcome determination, feature (explanatory variables) generation, model/classifier training, evaluation, selection, and list generation are often complicated and hard to settle a priori. In addition, once these choices are made, they have to be combined in different ways throughout the course of a project.
## What is Triage?

Triage is designed to:
Triage is an open source machine learning toolkit to help data scientists, machine learning developers, and analysts quickly prototype, build, and evaluate end-to-end predictive risk modeling systems for public policy and social good problems.

- Guide users (data scientists, analysts, researchers) through these design choices by highlighting critical operational use questions.
- Provide an integrated interface to components that are needed throughout a data science project workflow.
While many tools (sklearn, keras, pytorch, etc.) exist to build ML models, an end-to-end project requires a lot more than just building models. Developing data science systems requires making many design decisions that need to match how the system is going to be used. These decisions then get turned into modeling choices and code. Triage lets you focus on the problem you’re solving and guides you through the design choices you need to make at each step of the machine learning pipeline.

## Quick Links
## How to get started with Triage?

- [Dirty Duck Tutorial](https://dssg.github.io/triage/dirtyduck/) - Are you completely new to Triage? Go through the tutorial here with sample data
- [QuickStart Guide](https://dssg.github.io/triage/quickstart/) - Try Triage out with your own project and data
- [Triage Documentation Site](https://dssg.github.io/triage/) - Used Triage before and want more reference documentation?
- [Development](https://github.com/dssg/triage#development) - Contribute to Triage development.
### [Go through a quick online tutorial with sample data (no setup required)](https://colab.research.google.com/github/dssg/triage/blob/master/example/colab/colab_triage.ipynb)

## Installation
### [Go through a more in-depth tutorial with sample data](dirtyduck/index.md)

To install Triage, you need:
### [Get started with your own project and data](quickstart.md)

- Python 3.8+
- A PostgreSQL 9.6+ database with your source data (events, geographical data, etc.) loaded.
    - **NOTE**: If your database is PostgreSQL 11+, you will get some speed improvements. We recommend updating to a recent version of PostgreSQL.
- Ample space on an available disk (or, for example, in Amazon Web Services' S3) to store the matrices and models needed for your experiments.
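The minimum versions listed above can be checked before installing. The following is a minimal sketch, not part of Triage: it assumes you already have the server version string (for example, from running `SHOW server_version;` in `psql`), and the helper names `parse_pg_version` and `check_requirements` are hypothetical. The version parsing is a simplification that only keeps purely numeric components.

```python
import sys


def parse_pg_version(version_string: str) -> tuple:
    """Turn a PostgreSQL version string such as '9.6.24' or '12.3' into a tuple of ints."""
    # Keep only the leading token, e.g. '12.3 (Debian 12.3-1)' -> '12.3'
    numeric = version_string.strip().split()[0]
    # Drop any non-numeric component (a simplification for this sketch)
    return tuple(int(part) for part in numeric.split(".") if part.isdigit())


def check_requirements(pg_version: str, python_version=None) -> list:
    """Return a list of problems; an empty list means the listed minimums are met."""
    # Default to the interpreter running this script, as a (major, minor) tuple
    python_version = python_version or sys.version_info[:2]
    problems = []
    if tuple(python_version) < (3, 8):
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.8")
    pg = parse_pg_version(pg_version)
    if pg < (9, 6):
        problems.append(f"PostgreSQL {pg_version} is older than 9.6")
    elif pg < (11,):
        # Supported, but the docs note that 11+ brings speed improvements
        problems.append(f"PostgreSQL {pg_version} works, but 11+ is recommended for speed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements("12.3"):
        print(problem)
```

Tuple comparison handles versions of different lengths, so `(9, 6, 24)` counts as at least `(9, 6)`.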

We recommend starting with a new Python virtual environment (with Python 3.8 or greater) and installing Triage there with pip.
```bash
$ virtualenv triage-env
$ . triage-env/bin/activate
(triage-env) $ pip install triage
```
## Background

## Data
Triage needs data in a PostgreSQL database and a configuration file that has credentials for the database. By default, the Triage CLI reads database connection information from a file named 'database.yaml' (example in [example/database.yaml](https://github.com/dssg/triage/blob/master/example/database.yaml)).
Triage was initially developed at the University of Chicago's [Center For Data Science and Public Policy](http://dsapp.uchicago.edu) and is now being maintained and enhanced at Carnegie Mellon University.

If you don't want to install PostgreSQL yourself, try `triage db up` to create a vanilla PostgreSQL 12 database using Docker. For more details on this command, check out the [Triage Database Provisioner](db.md).
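The credentials in `database.yaml` ultimately describe a PostgreSQL connection. As a minimal sketch of that mapping, the function below builds a connection URL from a dictionary such as the one you would get by parsing the YAML file. The keys `host`, `user`, `pass`, `db`, and `port` are an assumption about the layout of `example/database.yaml`; check the file in your checkout. The function name `database_url` is hypothetical and not part of Triage.

```python
from urllib.parse import quote


def database_url(creds: dict) -> str:
    """Build a PostgreSQL connection URL from a database.yaml-style dict.

    Assumed keys (check example/database.yaml): host, user, pass, db, port.
    """
    # Percent-encode credentials so characters like '@' or ':' don't break the URL
    user = quote(str(creds["user"]), safe="")
    password = quote(str(creds["pass"]), safe="")
    port = int(creds.get("port", 5432))  # 5432 is PostgreSQL's default port
    return f"postgresql://{user}:{password}@{creds['host']}:{port}/{creds['db']}"


if __name__ == "__main__":
    example = {"host": "localhost", "user": "triage", "pass": "secret", "db": "food"}
    print(database_url(example))
```

A URL in this form can be passed to `sqlalchemy.create_engine` to obtain the `db_engine` that the Python API expects.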

## Configure Triage for your project

Triage is configured with a config.yaml file that defines parameters for each component. See this [sample configuration with explanations](https://github.com/dssg/triage/blob/master/example/config/experiment.yaml) for an example of what a configuration looks like.

## Using Triage

1. Via CLI:
```bash
triage experiment example/config/experiment.yaml
```
2. Import as a Python package:
```python
from sqlalchemy import create_engine
from triage.experiments import SingleThreadedExperiment

# experiment_config: a dictionary, e.g. the parsed contents of your experiment YAML file
experiment = SingleThreadedExperiment(
    config=experiment_config,
    db_engine=create_engine(...),  # http://docs.sqlalchemy.org/en/latest/core/engines.html
    project_path='/path/to/directory/to/save/data'  # could be an S3 path too: 's3://mybucket/myprefix/'
)
experiment.run()
```

Many options are available for running experiments, affecting parallelization, storage, and more. These options are detailed on the [Running an Experiment](https://dssg.github.io/triage/experiments/running/) page.

## Development

Triage was initially developed at [University of Chicago's Center For Data Science and Public Policy](http://dsapp.uchicago.edu) and is now being maintained at Carnegie Mellon University.

To build this package without installing it, you can alternatively install its dependencies from the terminal using `pip`:

    pip install -r requirement/main.txt

### Testing

To install test (and, optionally, development) dependencies, use **test.txt** (and **dev.txt**):

    pip install -r requirement/test.txt [-r requirement/dev.txt]

Then, to run tests:

    pytest

### Development Environment

After cloning the repository, you can quickly bootstrap a development environment by running the executable `develop` script from your system shell:

    ./develop

A "wizard" suggests setup steps and can optionally execute them, for example:

    (install) begin

    (pyenv) installed

    (python-3.9.10) installed

    (virtualenv) installed

    (activation) installed

    (libs) install?
      1) yes, install {pip install -r requirement/main.txt -r requirement/test.txt -r requirement/dev.txt}
      2) no, ignore
    #? 1

### Contributing

If you'd like to contribute to Triage development, see the [CONTRIBUTING.md](https://github.com/dssg/triage/blob/master/CONTRIBUTING.md) document.
2 changes: 1 addition & 1 deletion docs/update_docs.py
@@ -30,5 +30,5 @@ def copy_templates():

if __name__ == "__main__":
    #copy_templates()
    update_index_md()
    #update_index_md()
    #generate_api_docs()