Make scope pip-installable #514

Merged · 59 commits · Feb 23, 2024

Commits
- 5d0aa85 Initial commit: restructure, add _instantiate.py, modify pyproject.toml (bfhealy, Dec 1, 2023)
- d0ab4ed Add new config-specified args (bfhealy, Dec 11, 2023)
- 1324918 Update _instantiate.py with more functions (bfhealy, Dec 11, 2023)
- 8023c6f Refactor methods, add argument parsers (bfhealy, Dec 11, 2023)
- 53324b3 List all arguments in utils (bfhealy, Dec 11, 2023)
- 04de660 Update docstrings (bfhealy, Dec 11, 2023)
- 71155f0 Install/update scope.py scripts (bfhealy, Dec 11, 2023)
- c9a2267 Merge remote-tracking branch 'origin/main' into pip-install-scope (bfhealy, Dec 11, 2023)
- 2a21683 Update workflows (bfhealy, Dec 12, 2023)
- 7bf50c5 Update imports (bfhealy, Dec 12, 2023)
- 079285f Rename scope.py to scope_class.py, update other code (bfhealy, Dec 12, 2023)
- 39235b7 Change imports (bfhealy, Dec 12, 2023)
- fd0417e Update more imports in scope_class (bfhealy, Dec 12, 2023)
- 378ba38 Update more imports (bfhealy, Dec 12, 2023)
- a37662a More updated imports (bfhealy, Dec 12, 2023)
- 3d51d73 Relative imports for scope code (bfhealy, Dec 13, 2023)
- 258eb28 Use poetry instead of pip to install scope (bfhealy, Dec 13, 2023)
- 667bb4d Remove py311 from black (bfhealy, Dec 13, 2023)
- 2fbc236 Use poetry run (bfhealy, Dec 13, 2023)
- c33082e Merge remote-tracking branch 'origin/main' into pip-install-scope (bfhealy, Dec 14, 2023)
- 9ca95a4 Change package name, include config defaults (bfhealy, Dec 15, 2023)
- e8c1bc9 Add initialization script, golden dataset mapper (bfhealy, Jan 26, 2024)
- b075065 Install scope-download-classification, allow user-specified config (bfhealy, Jan 26, 2024)
- c5907ad Update docs with science user install, new usage (bfhealy, Jan 30, 2024)
- 42a4cf1 Add more useful data files to package (bfhealy, Jan 31, 2024)
- 86f0637 Merge remote-tracking branch 'origin/main' into pip-install-scope (bfhealy, Feb 1, 2024)
- 1aca065 Include more useful files (bfhealy, Feb 6, 2024)
- d5829b4 Refactor fritz tools with new config code (bfhealy, Feb 6, 2024)
- 9037618 Refactor feature code (bfhealy, Feb 6, 2024)
- a4ddda7 Add missing args to generate_features_slurm.py (bfhealy, Feb 6, 2024)
- f6ea2a8 Merge remote-tracking branch 'origin/main' into pip-install-scope (bfhealy, Feb 16, 2024)
- 19ebcbc Refactor training, inference scripts (bfhealy, Feb 19, 2024)
- abe2fa7 Refactor remaining scripts (bfhealy, Feb 19, 2024)
- 1227e30 Refactor scope_class, utils to update config reading/checking (bfhealy, Feb 19, 2024)
- ce7d998 Standardize argument format to use hyphens, update docs (bfhealy, Feb 20, 2024)
- aab6988 Update gcn_cronjob.py (bfhealy, Feb 20, 2024)
- 3f4cb1f Move example notebooks out of tools directory (bfhealy, Feb 20, 2024)
- c150e1d Delete old example notebooks (bfhealy, Feb 20, 2024)
- 61ae060 Update pre-commit config (bfhealy, Feb 20, 2024)
- a57284e Fix new linting issues, move initialization function (bfhealy, Feb 20, 2024)
- 3a725f5 Fix isinstance changes (bfhealy, Feb 20, 2024)
- d41fe71 Fix feature generation bug (bfhealy, Feb 20, 2024)
- cee4bb1 Update package metadata, readme (bfhealy, Feb 20, 2024)
- 74d261a Fix typo in readme (bfhealy, Feb 20, 2024)
- 6dc25a8 Separate dev requirements from others (bfhealy, Feb 21, 2024)
- 743266c Reorganize requirments, update docs (bfhealy, Feb 21, 2024)
- e347af7 Update author list, version (bfhealy, Feb 21, 2024)
- 616e453 Merge remote-tracking branch 'origin/main' into pip-install-scope (bfhealy, Feb 22, 2024)
- 3a1c461 Enable --doGPU flag in scope-test (bfhealy, Feb 22, 2024)
- e4f7325 Account for path_to_features in scope-test (bfhealy, Feb 22, 2024)
- a707f0d More path_to_features debugging in scope-test (bfhealy, Feb 23, 2024)
- 44dcb09 Debug GPU testing (bfhealy, Feb 23, 2024)
- 94f3344 Fix period_suffix bug (bfhealy, Feb 23, 2024)
- 4d9740e Debug generate-features/get-quad-ids paths (bfhealy, Feb 23, 2024)
- d01ec19 Change logs path in training/inference slurm scripts (bfhealy, Feb 23, 2024)
- 98fc7c2 Merge remote-tracking branch 'origin/main' into pip-install-scope (bfhealy, Feb 23, 2024)
- 4f4d655 Fix typos in tool.poetry.scripts (bfhealy, Feb 23, 2024)
- 2720bed Update readme with new repo name (bfhealy, Feb 23, 2024)
- db52661 Restrict tensorflow requirements (bfhealy, Feb 23, 2024)
Files changed
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml

```diff
@@ -1,20 +1,20 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v3.3.0
+    rev: v4.5.0
     hooks:
       - id: check-yaml
       - id: end-of-file-fixer
        exclude: .ipynb_checkpoints|data/Gaia_hp8_densitymap.fits|tools/classification_stats.ipynb
       - id: trailing-whitespace
        exclude: .ipynb_checkpoints|data/Gaia_hp8_densitymap.fits
   - repo: https://github.com/python/black
-    rev: 22.3.0
+    rev: 24.2.0
     hooks:
       - id: black
        pass_filenames: true
        exclude: .ipynb_checkpoints|data|^.fits
   - repo: https://github.com/pycqa/flake8
-    rev: 3.8.4
+    rev: 7.0.0
     hooks:
       - id: flake8
        pass_filenames: true
```
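Version bumps like these need not be edited by hand: `pre-commit` can refresh the `rev:` fields itself. A minimal sketch using the standard CLI:

```shell script
# Rewrite each rev: in .pre-commit-config.yaml to the hooks' latest tagged releases
pre-commit autoupdate
```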
9 changes: 0 additions & 9 deletions .requirements/dev.txt

This file was deleted.

23 changes: 0 additions & 23 deletions .requirements/doc.txt

This file was deleted.

9 changes: 5 additions & 4 deletions README.md

```diff
@@ -1,13 +1,14 @@
-# SCoPe: ZTF source classification project
+# SCoPe: ZTF Source Classification Project
 
 [![arXiv](https://img.shields.io/badge/arXiv-2102.11304-brightgreen)](https://arxiv.org/abs/2102.11304)
 [![arXiv](https://img.shields.io/badge/arXiv-2009.14071-brightgreen)](https://arxiv.org/abs/2009.14071)
+[![arXiv](https://img.shields.io/badge/arXiv-2312.00143-brightgreen)](https://arxiv.org/abs/2312.00143)
 
-The documentation is hosted at [https://zwickytransientfacility.github.io/scope-docs/](https://zwickytransientfacility.github.io/scope-docs/). To generate HTML files of the documentation locally, run `./scope.py doc`
+`scope-ml` uses machine learning to classify light curves from the Zwicky Transient Facility ([ZTF](https://www.ztf.caltech.edu)). The documentation is hosted at [https://zwickytransientfacility.github.io/scope-docs/](https://zwickytransientfacility.github.io/scope-docs/). To generate HTML files of the documentation locally, clone the repository and run `scope-doc` after installing.
 
 ## Funding
 We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for <a href="https://a3d3.ai">Accelerated AI Algorithms for Data-Driven Discovery (A3D3)</a> under Cooperative Agreement No. <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997">PHY-2117997</a>.
 
 <p align="center">
-  <img src="https://github.com/ZwickyTransientFacility/scope/blob/main/assets/a3d3.png" alt="A3D3" width="200"/>
-  <img src="https://github.com/ZwickyTransientFacility/scope/blob/main/assets/nsf.png" alt="NSF" width="200"/>
+  <img src="https://github.com/ZwickyTransientFacility/scope/raw/main/assets/a3d3.png" alt="A3D3" width="200"/>
+  <img src="https://github.com/ZwickyTransientFacility/scope/raw/main/assets/nsf.png" alt="NSF" width="200"/>
```
File renamed without changes.
9 changes: 9 additions & 0 deletions config.defaults.yaml

```diff
@@ -1731,6 +1731,15 @@ training:
     eval_metric: 'auc'
     early_stopping_rounds: 10
     num_boost_round: 999
+  plot_params:
+    cm_include_count: False
+    cm_include_percent: True
+    annotate_scores: False
+  dnn:
+    dense_branch: True
+    conv_branch: True
+    loss: 'binary_crossentropy'
+    optimizer: 'adam'
   classes:
     # phenomenological classes
     vnv:
```
5 changes: 5 additions & 0 deletions dev-requirements.txt

```diff
@@ -0,0 +1,5 @@
+pytest>=6.1.2
+pre-commit>=3.5.0
+sphinx>=4.2
+sphinx_press_theme>=0.8.0
+poetry>=1.7.1
```
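These pins replace the deleted `.requirements/*.txt` files; runtime dependencies now come from `pyproject.toml`. A one-line usage sketch:

```shell script
pip install -r dev-requirements.txt  # dev tooling only; runtime deps install via `pip install .`
```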
62 changes: 40 additions & 22 deletions doc/developer.md

````diff
@@ -1,6 +1,23 @@
 # Installation/Developer Guidelines
 
-## Initial steps
+## Science users
+- Create and activate a virtual/conda environment with Python 3.11, e.g.:
+```shell script
+conda create -n scope-env python=3.11
+conda activate scope-env
+```
+- Install the latest release of `scope-ml` from PyPI:
+```shell script
+pip install scope-ml
+```
+- In the directory of your choice, run the initialization script. This will create the required directories and copy the necessary files to run the code:
+```shell script
+scope-initialize
+```
+- Change directories to `scope` and modify `config.yaml` to finish the initialization process. This config file is used by default when running all scripts. You can also specify another config file using the `--config-path` argument.
+
+## Developers/contributors
 
 - Create your own fork of the [scope repository](https://github.com/ZwickyTransientFacility/scope) by clicking the "fork" button. Then, decide whether you would like to use HTTPS (easier for beginners) or SSH.
 - Following one set of instructions below, clone (download) your copy of the repository, and set up a remote called `upstream` that points to the main `scope` repository.
````
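Taken together, the new science-user path amounts to a short shell session. A minimal sketch, assuming a Unix shell; the project directory name is illustrative, and the trailing `--help` assumes the scripts' standard argparse interface:

```shell script
conda create -n scope-env python=3.11
conda activate scope-env
pip install scope-ml

mkdir my-scope-project && cd my-scope-project  # any directory works
scope-initialize                               # creates the scope/ directory with config.yaml and data files
cd scope

# Installed scripts read ./config.yaml by default; point elsewhere with --config-path:
scope-download-classification --config-path ./config.yaml --help
```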
````diff
@@ -21,9 +38,9 @@ git clone [email protected]:<yourname>/scope.git && cd scope
 git remote add upstream [email protected]:ZwickyTransientFacility/scope.git
 ```
 
-## Setting up your environment (Windows/Linux/macOS)
+### Setting up your environment (Windows/Linux/macOS)
 
-### Use a package manager for installation
+#### Use a package manager for installation
 
 We currently recommend running `scope` with Python 3.11. You may want to begin your installation by creating/activating a virtual environment, for example using conda. We specifically recommend installing miniforge3 (https://github.com/conda-forge/miniforge).
 
@@ -34,23 +51,30 @@ conda create -n scope-env -c conda-forge python=3.11
 conda activate scope-env
 ```
 
-### Update your `PYTHONPATH`
+#### (Optional): Update your `PYTHONPATH`
 
-Ensure that Python can import from `scope` by modifying the `PYTHONPATH` environment variable. Use a simple text editor like `nano` to modify the appropriate file (depending on which shell you are using). For example, if using bash, run `nano ~/.bash_profile` and add the following line:
+If you plan to import from `scope`, ensure that Python can import from `scope` by modifying the `PYTHONPATH` environment variable. Use a simple text editor like `nano` to modify the appropriate file (depending on which shell you are using). For example, if using bash, run `nano ~/.bash_profile` and add the following line:
 
 ```bash
 export PYTHONPATH="$PYTHONPATH:$HOME/scope"
 ```
 
 Save the updated file (`Ctrl+O` in `nano`) and close/reopen your terminal for this change to be recognized. Then `cd` back into scope and activate your `scope-env` again.
 
-### Install pre-commit
+### Install required packages
+
+Ensure you are in the `scope` directory that contains `pyproject.toml`. Then, install the required python packages by running:
+```bash
+pip install .
+```
+
+#### Install dev requirements, pre-commit hook
 
 We use `black` to format the code and `flake8` to verify that code complies with [PEP8](https://www.python.org/dev/peps/pep-0008/).
-Please install our pre-commit hook as follows:
+Please install our dev requirements and pre-commit hook as follows:
 
 ```shell script
-pip install pre-commit
+pip install -r dev-requirements.txt
 pre-commit install
 ```
````
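The hook only lints staged changes; to sweep the whole tree (for example, after bumping hook versions), the standard `pre-commit` CLI can be invoked directly:

```shell script
# Run every configured hook against all tracked files, not just staged changes
pre-commit run --all-files
```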

````diff
@@ -60,14 +84,7 @@
 
 The pre-commit hook will lint *changes* made to the source.
 
-## Install required packages
-
-Install the required python packages by running:
-```bash
-pip install -r requirements.txt
-```
-
-### Create and modify config.yaml
+#### Create and modify config.yaml
 
 From the included config.defaults.yaml, make a copy called config.yaml:
 
@@ -77,14 +94,15 @@ cp config.defaults.yaml config.yaml
 ```
 
 Edit config.yaml to include Kowalski instance and Fritz tokens in the associated empty `token:` fields.
 
-### Testing
-Run `./scope.py test` to test your installation. Note that for the test to pass, you will need access to the Kowalski database. If you do not have Kowalski access, you can run `./scope.py test_limited` to run a more limited (but still useful) set of tests.
+#### Testing
+Run `scope-test` to test your installation. Note that for the test to pass, you will need access to the Kowalski database. If you do not have Kowalski access, you can run `scope-test-limited` to run a more limited (but still useful) set of tests.
 
 ### Troubleshooting
 Upon encountering installation/testing errors, manually install the package in question using `conda install xxx`, and remove it from `.requirements/dev.txt`. After that, re-run `pip install -r requirements.txt` to continue.
 
-### Known issues
-- Across all platforms, we are currently aware of `scope` dependency issues with Python 3.11.
+#### Known issues
+- If using GPU-accelerated period-finding algorithms for feature generation, you will need to install [periodfind](https://github.com/ZwickyTransientFacility/periodfind) separately from the source.
+- Across all platforms, we are currently aware of `scope` dependency issues with Python 3.12.
 - Anaconda continues to cause problems with environment setup.
 - Using `pip` to install `healpy` on an arm64 Mac can raise an error upon import. We recommend including `h5py` as a requirement during the creation of your `conda` environment.
 - On Windows machines, `healpy` and `cesium` raise errors upon installation.
 
@@ -93,7 +111,7 @@
 If the installation continues to raise errors, update the conda environment and try again.
 
-## How to contribute
+### How to contribute
 
 Contributions to `scope` are made through [GitHub Pull Requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests), a set of proposed commits (or patches):
 
@@ -144,7 +162,7 @@ Developers may merge `main` into their branch as many times as they want to.
 
 1. Once the pull request has been reviewed and approved by at least one team member, it will be merged into `scope`.
 
-## Contributing Field Guide sections
+### Contributing Field Guide sections
 
 If you would like to contribute a Field Guide section, please follow the steps below.
````
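The (partly collapsed) contribution steps above follow the standard fork-and-branch pattern. A minimal sketch, assuming the `upstream` remote configured earlier; the branch name is hypothetical:

```shell script
git fetch upstream
git checkout -b my-feature upstream/main  # hypothetical branch name
# ... edit files, then stage and commit ...
git add -p
git commit -m "Describe the change"
git push origin my-feature                # then open a pull request on GitHub
```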
32 changes: 17 additions & 15 deletions doc/quickstart.md

````diff
@@ -1,16 +1,18 @@
 # Quick Start Guide
 
-This guide is intended to facilitate quick interactions with SCoPe code after you have completed the **Installation/Developer Guidelines** section. More detailed usage info can be found in the **Usage** section. **All of the following examples assume that SCoPe is installed in your home directory. If the `scope` directory is located elsewhere, adjust the example code as necessary.**
+This guide is intended to facilitate quick interactions with SCoPe code after you have completed the **Installation/Developer Guidelines** section. More detailed usage info can be found in the **Usage** section.
 
 ## Modify `config.yaml`
 To start out, provide SCoPe your training set's filepath using the `training:` `dataset:` field in `config.yaml`. The path should be a partial one starting within the `scope` directory. For example, if your training set `trainingSet.parquet` is within the `tools` directory (which itself is within `scope`), provide `tools/trainingSet.parquet` in the `dataset:` field.
 
+When running scripts, `scope` will by default use the `config.yaml` file in your current directory. You can specify a different config file by providing its path to any installed script using the `--config-path` argument.
+
 ## Training
 
 Train an XGBoost binary classifier using the following code:
 
 ```
-./scope.py train --tag=vnv --algorithm=xgb --group=ss23 --period_suffix=ELS_ECE_EAOV --epochs=30 --verbose --save --plot --skip_cv
+scope-train --tag vnv --algorithm xgb --group ss23 --period-suffix ELS_ECE_EAOV --epochs 30 --verbose --save --plot --skip-cv
 ```
 
 ### Arguments:
````
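A DNN counterpart of the command above is a useful contrast. A hedged sketch, assuming `dnn` is the accepted `--algorithm` value (as the config's new `dnn:` training section suggests) and pointing at an explicit config path for illustration:

```shell script
scope-train --tag vnv --algorithm dnn --group ss23 --period-suffix ELS_ECE_EAOV --epochs 30 --verbose --save --plot --config-path ./config.yaml
```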
````diff
@@ -20,34 +22,34 @@ Train an XGBoost binary classifier using the following code:
 
 `--group`: if `--save` is passed, training results are saved to the group/directory named here.
 
-`--period_suffix`: SCoPe determines light curve periods using GPU-accelerated algorithms. These algorithms include a Lomb-Scargle approach (ELS), Conditional Entropy (ECE), Analysis of Variance (AOV), and an approach nesting all three (ELS_ECE_EAOV). Periodic features are stored with the suffix specified here.
+`--period-suffix`: SCoPe determines light curve periods using GPU-accelerated algorithms. These algorithms include a Lomb-Scargle approach (ELS), Conditional Entropy (ECE), Analysis of Variance (AOV), and an approach nesting all three (ELS_ECE_EAOV). Periodic features are stored with the suffix specified here.
 
-`--min_count`: requires at least min_count positive examples to run training.
+`--min-count`: requires at least min_count positive examples to run training.
 
 `--epochs`: neural network training takes an --epochs argument that is set to 30 here.
 
 ***Notes:***
-- *The above training runs the XGB algorithm by default and skips cross-validation in the interest of time. For a full run, you can remove the `--skip_cv` argument to run a cross-validated grid search of XGB hyperparameters during training.*
+- *The above training runs the XGB algorithm by default and skips cross-validation in the interest of time. For a full run, you can remove the `--skip-cv` argument to run a cross-validated grid search of XGB hyperparameters during training.*
 
-- *DNN hyperparameters are optimized using a different approach - Weights and Biases Sweeps (https://docs.wandb.ai/guides/sweeps). The results of these sweeps are the default hyperparameters in the config file. To run another round of sweeps for DNN, create a WandB account and set the `--run_sweeps` keyword in the call to `scope.py train`.*
+- *DNN hyperparameters are optimized using a different approach - Weights and Biases Sweeps (https://docs.wandb.ai/guides/sweeps). The results of these sweeps are the default hyperparameters in the config file. To run another round of sweeps for DNN, create a WandB account and set the `--run-sweeps` keyword in the call to `scope-train`.*
 
 - *SCoPe DNN training does not provide feature importance information (due to the hidden layers of the network). Feature importance is possible to estimate for neural networks, but it is more computationally expensive compared to this "free" information from XGB.*
 
 ### Train multiple classifiers with one script
 
-Create a shell script that contains multiple calls to `scope.py train`:
+Create a shell script that contains multiple calls to `scope-train`:
 ```
-./scope.py create_training_script --filename=train_xgb.sh --min_count=1000 --algorithm=xgb --period_suffix=ELS_ECE_EAOV --add_keywords="--save --plot --group=ss23 --epochs=30 --skip_cv"
+create-training-script --filename train_xgb.sh --min-count 1000 --algorithm xgb --period-suffix ELS_ECE_EAOV --add-keywords "--save --plot --group ss23 --epochs 30 --skip-cv"
 ```
 
-Modify the permissions of this script by running `chmod +x train_xgb.sh`. Run the generated training script in a terminal window (using e.g. `./train_xgb.sh`) to train multiple label sequentially.
+Modify the permissions of this script by running `chmod +x train_xgb.sh`. Run the generated training script in a terminal window (using e.g. `./train_xgb.sh`) to train multiple classifiers sequentially.
 
 ***Note:***
-- *The code will throw an error if the training script filename already exists.*
+- *The code will raise an error if the training script filename already exists.*
 
 ### Running training on HPC resources
 
-`train_algorithm_slurm.py` and `train_algorithm_job_submission.py` can be used generate and submit `slurm` scripts to train all classifiers in parallel using HPC resources.
+`train-algorithm-slurm` and `train-algorithm-job-submission` can be used to generate and submit `slurm` scripts to train all classifiers in parallel using HPC resources.
 
 ## Plotting Classifier Performance
 SCoPe saves diagnostic plots and json files to report each classifier's performance. The below code shows the location of the validation set results for one classifier.
````
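The generated `train_xgb.sh` is simply a sequence of `scope-train` calls, one per label. A hypothetical sketch (the `pnp` tag is illustrative; actual tags come from the `classes:` section of your config):

```shell script
#!/bin/bash
# Hypothetical contents of a generated train_xgb.sh
scope-train --tag vnv --algorithm xgb --min-count 1000 --period-suffix ELS_ECE_EAOV --save --plot --group ss23 --epochs 30 --skip-cv
scope-train --tag pnp --algorithm xgb --min-count 1000 --period-suffix ELS_ECE_EAOV --save --plot --group ss23 --epochs 30 --skip-cv
```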
````diff
@@ -82,10 +84,10 @@
 
 ## Inference
 
-Use `tools/inference.py` to run inference on a field (297) of features (within a directory called `generated_features`). The classifiers used for this inference are within the `ss23` directory/group specified during training.
+Use `run-inference` to run inference on a field (297) of features (in this example, located in a directory called `generated_features`). The classifiers used for this inference are within the `ss23` directory/group specified during training.
 
 ```
-./scope.py create_inference_script --filename=get_all_preds_xgb.sh --group_name=ss23 --algorithm=xgb --period_suffix=ELS_ECE_EAOV --feature_directory=generated_features
+create-inference-script --filename get_all_preds_xgb.sh --group-name ss23 --algorithm xgb --period-suffix ELS_ECE_EAOV --feature-directory generated_features
 ```
 
 Modify the permissions of this script using `chmod +x get_all_preds_xgb.sh`, then run on the desired field:
@@ -94,12 +96,12 @@
 ```
 
 ***Notes:***
-- *`scope.py create_inference_script` will throw an error if the inference script filename already exists.*
+- *`create-inference-script` will raise an error if the inference script filename already exists.*
 - *Inference begins by imputing missing features using the strategies specified in the `features:` section of the config file.*
 
 ### Running inference on HPC resources
 
-`run_inference_slurm.py` and `run_inference_job_submission.py` can be used generate and submit `slurm` scripts to run inference for all classifiers in parallel using HPC resources.*
+`run-inference-slurm` and `run-inference-job-submission` can be used to generate and submit `slurm` scripts to run inference for all classifiers in parallel using HPC resources.
 
 ## Examining predictions
````
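The run command itself sits in the collapsed hunk above. A hedged sketch, assuming the generated script takes the field number as a positional argument (the exact interface is not shown in this diff):

```shell script
chmod +x get_all_preds_xgb.sh
./get_all_preds_xgb.sh 297  # field number from the example above
```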