Add an example output from 'job_report'; add a vignette section showing how to use 'job_report'
Nick-Eagles committed Oct 3, 2023
1 parent af59890 commit bfd5c0e
Showing 4 changed files with 77 additions and 0 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -2,6 +2,7 @@

export(job_info)
export(job_loop)
export(job_report)
export(job_single)
export(with_wd)
import(dplyr)
Binary file added inst/extdata/job_report_df.rds
Binary file not shown.
35 changes: 35 additions & 0 deletions man/job_report.Rd


41 changes: 41 additions & 0 deletions vignettes/slurmjobs.Rmd
@@ -135,6 +135,47 @@ job_df |>
print()
```

# Analyzing Finished Jobs

The `job_report()` function returns in-depth information about a single queued, running, or finished job. Note that in SLURM, an array job can be referenced as a whole or by one of its individual tasks.
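
To make that distinction concrete: SLURM refers to one task of an array job as `<array job ID>_<task ID>` (for example `12345_7`), while the bare array job ID refers to the whole array. The helper below is a hypothetical sketch, not part of `slurmjobs` and not necessarily how `job_report()` parses its input; it simply splits such an ID into its two parts. The IDs used are made up.

```r
# Hypothetical helper (not part of slurmjobs): split a SLURM job ID such as
# "12345_7" into the array job ID and the task ID. A bare ID like "12345"
# refers to the whole job, so its task ID is returned as NA.
split_slurm_id = function(slurm_job_id) {
    # format() avoids scientific notation for numeric IDs such as 2000000
    id = format(slurm_job_id, scientific = FALSE, trim = TRUE)
    parts = strsplit(id, "_", fixed = TRUE)[[1]]
    list(
        job_id = parts[1],
        task_id = if (length(parts) > 1) as.integer(parts[2]) else NA_integer_
    )
}

split_slurm_id("12345_7") # one task of an array job
split_slurm_id(12345) # the array job as a whole
```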

Suppose you have a workflow that runs as an array job, and you'd like to profile memory usage across its many tasks. Suppose also that we've done an initial trial, setting memory relatively high just to get the tasks running without issues. One use of `job_report()` is to choose a better memory request in a data-driven way; the improved settings can then be applied to the larger dataset after the initial test.

On an actual system with SLURM installed, you would normally run something like `job_df = job_report(slurm_job_id)`, where `slurm_job_id` (a character or integer) is the ID of the small test job. For convenience, we'll instead start from an example `job_report()` output that ships with the `slurmjobs` package.

```{r "job_report_quick_look"}
job_df = readRDS(
    system.file("extdata", "job_report_df.rds", package = "slurmjobs")
)
print(job_df)
```

Now let's choose a better memory request:

```{r "job_report_adjust_mem"}
stat_df = job_df |>
    # This example includes tasks that fail. We're only interested in memory
    # usage for tasks that did not fail, so drop the failed ones
    filter(status != 'FAILED') |>
    summarize(
        mean_mem = mean(max_vmem_gb),
        std_mem = sd(max_vmem_gb),
        max_mem = max(max_vmem_gb)
    )
# We could choose a new memory request as 3 standard deviations above the mean
# of actual memory usage
new_limit = stat_df$mean_mem + 3 * stat_df$std_mem
print(
    sprintf(
        "%.02fG is a better memory request than %.02fG, which was used before",
        new_limit,
        job_df$requested_mem_gb[1]
    )
)
```
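
As a sketch of how these statistics might become an actual request, the hypothetical helper below (not part of `slurmjobs`) applies the same mean-plus-three-standard-deviations rule. It also never goes below the largest peak observed among the remaining tasks, which is the `max_mem` value the chunk above computes but does not use. Finally, it rounds up to whole gigabytes, assuming the result will be passed to SLURM's `--mem` option, which expects a whole-number size with an optional unit suffix such as `G`. The memory values are made up for illustration.

```r
# Hypothetical helper (not part of slurmjobs): turn peak memory usage (GB) of
# completed tasks into a whole-gigabyte SLURM memory request such as "5G"
suggest_mem_request = function(max_vmem_gb, n_sd = 3) {
    # sd() is NA for a single task, so treat the spread as zero in that case
    spread = if (length(max_vmem_gb) > 1) sd(max_vmem_gb) else 0
    # Same rule as above: mean plus n_sd standard deviations...
    threshold = mean(max_vmem_gb) + n_sd * spread
    # ...but never below the largest peak actually observed
    threshold = max(threshold, max(max_vmem_gb))
    # Assumes the value feeds SLURM's --mem option, which takes a whole number
    paste0(ceiling(threshold), "G")
}

# Made-up peak memory values (GB) for five completed tasks
example_mem = c(3.1, 3.4, 2.9, 3.8, 3.3)
suggest_mem_request(example_mem)
# [1] "5G"
```

Rounding up trades a little unused memory for a request SLURM will accept as-is.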

# Reproducibility

The `r Biocpkg("slurmjobs")` package `r Citep(bib[["slurmjobs"]])` was made possible thanks to: