From 9bc50c09df0ffb3a87b930ead6bbe107e81439b0 Mon Sep 17 00:00:00 2001
From: Brian Healy
Date: Wed, 10 Apr 2024 09:58:33 -0500
Subject: [PATCH] Add empty directories with README files, update docs

---
 doc/scripts.md                                             | 6 +++---
 hpc_files/combine_preds/logs/README                        | 1 +
 hpc_files/dnn_inference/logs/README                        | 1 +
 hpc_files/dnn_training/logs/README                         | 1 +
 hpc_files/fritzDownload/README                             | 1 +
 hpc_files/generated_features_GCN_sources/fg_sources/README | 1 +
 hpc_files/generated_features_GCN_sources/logs/README       | 1 +
 hpc_files/generated_features_delta/logs/README             | 1 +
 hpc_files/generated_features_new/logs/README               | 1 +
 hpc_files/generated_features_underMS/logs/README           | 1 +
 .../generated_features_underMS/underMS_ids_DR20/README     | 1 +
 hpc_files/models_dnn/README                                | 1 +
 hpc_files/models_xgb/README                                | 1 +
 hpc_files/xgb_inference/logs/README                        | 1 +
 hpc_files/xgb_training/logs/README                         | 1 +
 15 files changed, 17 insertions(+), 3 deletions(-)
 create mode 100644 hpc_files/combine_preds/logs/README
 create mode 100644 hpc_files/dnn_inference/logs/README
 create mode 100644 hpc_files/dnn_training/logs/README
 create mode 100644 hpc_files/fritzDownload/README
 create mode 100644 hpc_files/generated_features_GCN_sources/fg_sources/README
 create mode 100644 hpc_files/generated_features_GCN_sources/logs/README
 create mode 100644 hpc_files/generated_features_delta/logs/README
 create mode 100644 hpc_files/generated_features_new/logs/README
 create mode 100644 hpc_files/generated_features_underMS/logs/README
 create mode 100644 hpc_files/generated_features_underMS/underMS_ids_DR20/README
 create mode 100644 hpc_files/models_dnn/README
 create mode 100644 hpc_files/models_xgb/README
 create mode 100644 hpc_files/xgb_inference/logs/README
 create mode 100644 hpc_files/xgb_training/logs/README

diff --git a/doc/scripts.md b/doc/scripts.md
index 9fd4fc8..e8f8c58 100644
--- a/doc/scripts.md
+++ b/doc/scripts.md
@@ -4,9 +4,9 @@ The `hpc_files` directory in the `scope-ml` repository contains scripts, files a
 
 Note that data files are not included in the `hpc_files` directory. The main files necessary to run the scripts detailed below are listed here and available on [Zenodo](https://zenodo.org/doi/10.5281/zenodo.8410825):
 - `trained_models_dnn` and `trained_models_xgb`: download on Zenodo, unzip, and place directories into `models_dnn` and `models_xgb` directories, respectively
-- `training_set.parquet`: download on Zenodo and place into a directory called `fritzDownload`
+- `training_set.parquet`: download on Zenodo and place into the directory called `fritzDownload`
 
-Note also that most included scripts and directories can also be generated from scratch using the following SCoPe scripts: `train-algorithm-slurm`, `generate-features-slurm`, `run-inference-slurm`, and `combine-preds-slurm`. The directories generated by these scripts generally are populated with two subdirectories: `logs` to contain slurm logs, and `slurm` to contain slurm scripts. **Since GitHub does not track empty directories, `logs` will have to be created (on the same level as `slurm`) in each of the example HPC directories.**
+Note also that most included scripts and directories can also be generated from scratch using the following SCoPe scripts: `train-algorithm-slurm`, `generate-features-slurm`, `run-inference-slurm`, and `combine-preds-slurm`. The directories generated by these scripts generally are populated with two subdirectories: `logs` to contain slurm logs, and `slurm` to contain slurm scripts.
 
 ## Configuration
 
@@ -30,7 +30,7 @@ These two directories are generated when running `train-algorithm-slurm`. The `s
 ### Output: trained models in `models_dnn` and `models_xgb`
 Trained models are saved in these two directories. The `--group` name passed to the training code will determine the subdirectory where the models are saved. Within this, each classifier gets its own subdirectory that includes the model files, diagnostic plots, and feature importance data (XGB only).
 
-**To run inference with the latest trained models, download `trained_dnn_models.zip` and `trained_xgb_models.zip` from Zenodo and unzip them within a corresponding `models_dnn` or `models_xgb` directory.**
+**To run inference with the latest trained models, download `trained_dnn_models.zip` and `trained_xgb_models.zip` from Zenodo and unzip them within the corresponding `models_dnn` or `models_xgb` directory.**
 
 ## Generating Features
 
diff --git a/hpc_files/combine_preds/logs/README b/hpc_files/combine_preds/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/combine_preds/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/dnn_inference/logs/README b/hpc_files/dnn_inference/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/dnn_inference/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/dnn_training/logs/README b/hpc_files/dnn_training/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/dnn_training/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/fritzDownload/README b/hpc_files/fritzDownload/README
new file mode 100644
index 0000000..f1b7171
--- /dev/null
+++ b/hpc_files/fritzDownload/README
@@ -0,0 +1 @@
+Place training_set.parquet (from Zenodo) here
diff --git a/hpc_files/generated_features_GCN_sources/fg_sources/README b/hpc_files/generated_features_GCN_sources/fg_sources/README
new file mode 100644
index 0000000..d86b669
--- /dev/null
+++ b/hpc_files/generated_features_GCN_sources/fg_sources/README
@@ -0,0 +1 @@
+ZTF ID lists from gcn_cronjob.py go here
diff --git a/hpc_files/generated_features_GCN_sources/logs/README b/hpc_files/generated_features_GCN_sources/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/generated_features_GCN_sources/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/generated_features_delta/logs/README b/hpc_files/generated_features_delta/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/generated_features_delta/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/generated_features_new/logs/README b/hpc_files/generated_features_new/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/generated_features_new/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/generated_features_underMS/logs/README b/hpc_files/generated_features_underMS/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/generated_features_underMS/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/generated_features_underMS/underMS_ids_DR20/README b/hpc_files/generated_features_underMS/underMS_ids_DR20/README
new file mode 100644
index 0000000..16d272c
--- /dev/null
+++ b/hpc_files/generated_features_underMS/underMS_ids_DR20/README
@@ -0,0 +1 @@
+Batched files containing ZTF IDs (from data wrangling notebook) go here
diff --git a/hpc_files/models_dnn/README b/hpc_files/models_dnn/README
new file mode 100644
index 0000000..f7414da
--- /dev/null
+++ b/hpc_files/models_dnn/README
@@ -0,0 +1 @@
+Place unzipped trained_dnn_models directory (from Zenodo) here
diff --git a/hpc_files/models_xgb/README b/hpc_files/models_xgb/README
new file mode 100644
index 0000000..cc7c908
--- /dev/null
+++ b/hpc_files/models_xgb/README
@@ -0,0 +1 @@
+Place unzipped trained_xgb_models directory (from Zenodo) here
diff --git a/hpc_files/xgb_inference/logs/README b/hpc_files/xgb_inference/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/xgb_inference/logs/README
@@ -0,0 +1 @@
+Slurm logs go here
diff --git a/hpc_files/xgb_training/logs/README b/hpc_files/xgb_training/logs/README
new file mode 100644
index 0000000..c7f380a
--- /dev/null
+++ b/hpc_files/xgb_training/logs/README
@@ -0,0 +1 @@
+Slurm logs go here