Commit
* Created using Colaboratory
* Created using Colaboratory
* Created using Colaboratory
* First draft end-to-end (WIP)
* Run through and debugging
* Better introduction to orient the modeling problem
* update docs
* Fix bug in postgres setup
Showing 5 changed files with 6,010 additions and 111 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,122 +1,26 @@ | ||
Triage | ||
====== | ||
# Triage | ||
|
||
Data Science Toolkit for Social Good and Public Policy Problems | ||
[![Build Status](https://travis-ci.org/dssg/triage.svg?branch=master)](https://travis-ci.org/dssg/triage) | ||
[![codecov](https://codecov.io/gh/dssg/triage/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/triage) | ||
[![codeclimate](https://codeclimate.com/github/dssg/triage.png)](https://codeclimate.com/github/dssg/triage) | ||
|
||
[![image](https://travis-ci.com/dssg/triage.svg?branch=master)](https://travis-ci.org/dssg/triage) | ||
[![image](https://codecov.io/gh/dssg/triage/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/triage) | ||
[![image](https://codeclimate.com/github/dssg/triage.png)](https://codeclimate.com/github/dssg/triage) | ||
|
||
Building data science systems requires answering many design questions and turning them into modeling choices, which in turn drive the machine learning models that get run. Choices such as cohort selection, unit of analysis, outcome definition, feature (explanatory variable) generation, model/classifier training, evaluation, selection, and list generation are often complicated and hard to make a priori. In addition, once these choices are made, they have to be combined in different ways throughout the course of a project. | ||
## What is Triage? | ||
|
||
Triage is designed to: | ||
Triage is an open source machine learning toolkit that helps data scientists, machine learning developers, and analysts quickly prototype, build, and evaluate end-to-end predictive risk modeling systems for public policy and social good problems. | ||
|
||
- Guide users (data scientists, analysts, researchers) through these design choices by highlighting critical operational use questions. | ||
- Provide an integrated interface to components that are needed throughout a data science project workflow. | ||
While many tools (sklearn, keras, pytorch, etc.) exist to build ML models, an end-to-end project requires a lot more than just building models. Developing data science systems requires making many design decisions that need to match with how the system is going to be used. These choices then get turned into modeling choices and code. Triage lets you focus on the problem you’re solving and guides you through design choices you need to make at each step of the machine learning pipeline. | ||
|
||
## Quick Links | ||
## How to get started with Triage? | ||
|
||
- [Dirty Duck Tutorial](https://dssg.github.io/triage/dirtyduck/) - Are you completely new to Triage? Go through the tutorial here with sample data | ||
- [QuickStart Guide](https://dssg.github.io/triage/quickstart/) - Try Triage out with your own project and data | ||
- [Triage Documentation Site](https://dssg.github.io/triage/) - Used Triage before and want more reference documentation? | ||
- [Development](https://github.com/dssg/triage#development) - Contribute to Triage development. | ||
### [Go through a quick online tutorial with sample data (no setup required)](https://colab.research.google.com/github/dssg/triage/blob/master/example/colab/colab_triage.ipynb) | ||
|
||
## Installation | ||
### [Go through a more in-depth tutorial with sample data](dirtyduck/index.md) | ||
|
||
To install Triage, you need: | ||
### [Get started with your own project and data](quickstart.md) | ||
|
||
- Python 3.8+ | ||
- A PostgreSQL 9.6+ database with your source data (events, | ||
geographical data, etc) loaded. | ||
- **NOTE**: If your database is PostgreSQL 11+ you will get some | ||
speed improvements. We recommend updating to a recent | ||
version of PostgreSQL. | ||
- Ample space on an available disk (or, for example, in Amazon Web | ||
Services' S3) to store the matrices and models needed for your | ||
experiments | ||
|
||
We recommend starting with a new Python virtual environment (with Python 3.8 or greater, matching the requirement above) and pip installing Triage there. | ||
```bash | ||
$ virtualenv triage-env | ||
$ . triage-env/bin/activate | ||
(triage-env) $ pip install triage | ||
``` | ||
## Background | ||
|
||
## Data | ||
Triage needs data in a PostgreSQL database and a configuration file that holds the credentials for that database. By default, the Triage CLI reads database connection information from a file named `database.yaml` (example in [example/database.yaml](https://github.com/dssg/triage/blob/master/example/database.yaml)). | ||
Triage was initially developed at the University of Chicago's [Center For Data Science and Public Policy](http://dsapp.uchicago.edu) and is now being maintained and enhanced at Carnegie Mellon University. | ||
|
||
If you don't want to install Postgres yourself, try `triage db up` to create a vanilla Postgres 12 database using Docker. For more details on this command, see the [Triage Database Provisioner](db.md). | ||
|
||
## Configure Triage for your project | ||
|
||
Triage is configured with a config.yaml file that defines parameters for each component. See the [sample configuration with explanations](https://github.com/dssg/triage/blob/master/example/config/experiment.yaml) for an example of what a configuration looks like. | ||
|
||
## Using Triage | ||
|
||
1. Via CLI: | ||
```bash | ||
|
||
triage experiment example/config/experiment.yaml | ||
``` | ||
2. Import as a python package: | ||
```python | ||
from sqlalchemy import create_engine | ||
from triage.experiments import SingleThreadedExperiment | ||
|
||
experiment = SingleThreadedExperiment( | ||
config=experiment_config, # a dictionary | ||
db_engine=create_engine(...), # http://docs.sqlalchemy.org/en/latest/core/engines.html | ||
project_path='/path/to/directory/to/save/data' # could be an S3 path too: 's3://mybucket/myprefix/' | ||
) | ||
experiment.run() | ||
``` | ||
|
||
Many options are available for running experiments, affecting things like parallelization and storage. These options are detailed in the [Running an Experiment](https://dssg.github.io/triage/experiments/running/) page. | ||
|
||
## Development | ||
|
||
Triage was initially developed at [University of Chicago's Center For Data Science and Public Policy](http://dsapp.uchicago.edu) and is now being maintained at Carnegie Mellon University. | ||
|
||
To build this package (without installation), its dependencies may | ||
alternatively be installed from the terminal using `pip`: | ||
|
||
pip install -r requirement/main.txt | ||
|
||
### Testing | ||
|
||
To add test (and development) dependencies, use **test.txt**: | ||
|
||
pip install -r requirement/test.txt [-r requirement/dev.txt] | ||
|
||
Then, to run tests: | ||
|
||
pytest | ||
|
||
### Development Environment | ||
|
||
To quickly bootstrap a development environment, having cloned the | ||
repository, invoke the executable `develop` script from your system | ||
shell: | ||
|
||
./develop | ||
|
||
A "wizard" will suggest set-up steps and optionally execute these, for | ||
example: | ||
|
||
(install) begin | ||
|
||
(pyenv) installed | ||
|
||
(python-3.9.10) installed | ||
|
||
(virtualenv) installed | ||
|
||
(activation) installed | ||
|
||
(libs) install? | ||
1) yes, install {pip install -r requirement/main.txt -r requirement/test.txt -r requirement/dev.txt} | ||
2) no, ignore | ||
#? 1 | ||
|
||
### Contributing | ||
|
||
If you'd like to contribute to Triage development, see the [CONTRIBUTING.md](https://github.com/dssg/triage/blob/master/CONTRIBUTING.md) document. |
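
The README text in this diff says that Triage reads database credentials from `database.yaml` and that the Python entry point takes a SQLAlchemy engine. As a minimal sketch of how the two might be connected, the helper below turns a credentials mapping into a PostgreSQL connection URL. The function name `postgres_url`, the example values, and the key names (`host`, `user`, `pass`, `port`, `db`) are assumptions made for illustration and should be checked against the linked example file; the sketch uses only the standard library and does not call Triage itself.

```python
from urllib.parse import quote


def postgres_url(creds: dict) -> str:
    """Build a PostgreSQL connection URL from a credentials mapping.

    The key names used here are assumed, not taken from Triage's code.
    """
    # Percent-encode user and password so characters like '@' or '/'
    # cannot be mistaken for URL delimiters.
    user = quote(str(creds["user"]), safe="")
    password = quote(str(creds["pass"]), safe="")
    host = creds["host"]
    port = int(creds.get("port", 5432))  # fall back to the usual Postgres port
    db = quote(str(creds["db"]), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


# Hypothetical credentials standing in for the parsed contents of database.yaml.
example_creds = {
    "host": "localhost",
    "user": "triage_user",
    "pass": "p@ss/word",
    "port": 5432,
    "db": "triage",
}

url = postgres_url(example_creds)
print(url)
```

The resulting string could then be passed to SQLAlchemy's `create_engine` to obtain the `db_engine` argument shown in the README snippet. Encoding the password matters because an unescaped `@` or `/` would otherwise change how the URL is parsed.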