Add an example output from 'job_report'; add a vignette section showing how to use 'job_report'
LieberInstitute · Oct 3, 2023 · bfd5c0e
1 parent af59890
commit bfd5c0e
Showing 4 changed files with 77 additions and 0 deletions.
diff --git a/NAMESPACE b/NAMESPACE
@@ -2,6 +2,7 @@
 
 export(job_info)
 export(job_loop)
+export(job_report)
 export(job_single)
 export(with_wd)
 import(dplyr)

diff --git a/inst/extdata/job_report_df.rds b/inst/extdata/job_report_df.rds
diff --git a/man/job_report.Rd b/man/job_report.Rd
diff --git a/vignettes/slurmjobs.Rmd b/vignettes/slurmjobs.Rmd
@@ -135,6 +135,47 @@ job_df |>
     print()
 ```
 
+# Analyzing Finished Jobs
+
+The `job_report()` function returns in-depth information about a single queued, running, or finished job. Note that SLURM lets you reference an array job either as a whole or by one of its tasks.
+
+Suppose you have a workflow that runs as an array job, and you'd like to profile memory usage across its many tasks. Suppose also that you've done an initial trial, setting memory relatively high just to get the jobs running without issues. One use of `job_report` is to determine a better memory request in a data-driven way; the improved settings can then be used on the larger dataset after the initial test.
+
+On an actual system with SLURM installed, you'd normally run something like `job_df = job_report(slurm_job_id)`, where `slurm_job_id` (character or integer) identifies the small test. For convenience, we'll start from the example output of `job_report` that is available in the `slurmjobs` package.
+
+```{r "job_report_quick_look"}
+job_df = readRDS(
+    system.file("extdata", "job_report_df.rds", package = "slurmjobs")
+)
+print(job_df)
+```
+
+Now let's choose a better memory request:
+
+```{r "job_report_adjust_mem"}
+stat_df = job_df |>
+    #   This example includes tasks that failed. We're only interested in
+    #   memory usage for tasks that did not fail
+    filter(status != 'FAILED') |>
+    summarize(
+        mean_mem = mean(max_vmem_gb),
+        std_mem = sd(max_vmem_gb),
+        max_mem = max(max_vmem_gb)
+    )
+
+#   We could choose a new memory request as 3 standard deviations above the mean
+#   of actual memory usage
+new_limit = stat_df$mean_mem + 3 * stat_df$std_mem
+
+print(
+    sprintf(
+        "%.02fG is a better memory request than %.02fG, which was used before",
+        new_limit,
+        job_df$requested_mem_gb[1]
+    )
+)
+```
+
 # Reproducibility
 
 The `r Biocpkg("slurmjobs")` package `r Citep(bib[["slurmjobs"]])` was made possible thanks to: