Add run_reps.jl helper to run RME #26

Open
wants to merge 4 commits into
base: main
Changes from 3 commits
2 changes: 0 additions & 2 deletions .github/workflows/Documenter.yml
@@ -35,8 +35,6 @@ jobs:
uses: actions/checkout@v4
- name: Setup Julia
uses: julia-actions/setup-julia@v2
with:
version: '1.10'
- name: Pull Julia cache
uses: julia-actions/cache@v2
- name: Build and deploy docs
5 changes: 4 additions & 1 deletion Project.toml
@@ -8,15 +8,18 @@ CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
NetCDF = "30363a11-5582-574a-97bb-aa9a979735b9"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
YAXArrays = "c21b50f5-aa40-41ea-b809-c0f5e47bfa5c"

[compat]
CSV = "0.10"
DataFrames = "1"
julia = "1"
Dates = "1.11.0"
NetCDF = "0.11.8"
Random = "1"
Statistics = "1.10.0"
StatsBase = "0.33, 0.34"
YAXArrays = "0.5.5"
julia = "1"
4 changes: 4 additions & 0 deletions src/ReefModEngine.jl
@@ -41,6 +41,7 @@ include("interface.jl")
include("deployment.jl")
include("io.jl")
include("ResultStore.jl")
include("run_reps.jl")

# Set up and initialization
export
@@ -55,4 +56,7 @@ export
export
ResultStore, concat_results!, save_result_store

# Run reps
export run_rme

end
134 changes: 134 additions & 0 deletions src/run_reps.jl
@@ -0,0 +1,134 @@
using Dates
using Random

"""
run_rme(rme_path::String, n_threads::Int64, reps::Int64, result_path::String; start_year::Int64=2022, end_year::Int64=2099, batch_size::Int64=10, RCP_scen::String="SSP 2.45", gcm::String="CNRM_ESM2_1", rnd_seed::Int64=1234)::Nothing
Run counterfactual scenarios with ReefModEngine.jl and save result set to desired dir.
Zapiano marked this conversation as resolved.
# Arguments
- `rme_path` : Path to REM folder.
Collaborator

If we move the RME option setting step outside this function as suggested below, we can remove this (and other) argument lines, but if we decide to keep it:

Suggested change
- `rme_path` : Path to REM folder.
- `rme_path` : Path to RME folder.

- `n_threads` : Number of threads to be used with RME.
- `reps` : Total number of repetitions to be run.
- `result_path` : Path to folder where resultset should be placed.
- `start_year` : RME run start year.
- `end_year` : RME run end year.
- `batch_size` : Number of repetitions to be run in each batch.
- `RCP_scen` : RCP scenario to be used for RME runs.
- `gcm` : GCM to be used for RME runs.
- `rnd_seed` : Random seed.
Collaborator

Prefer to accept an RNG instead:

Suggested change
- `rnd_seed` : Random seed.
- `rng` : Random number generator (default: Globally set RNG).

rng::AbstractRNG=Random.GLOBAL_RNG
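
A minimal sketch of what accepting an RNG could look like, assuming seeds are drawn with `rand` from the supplied generator. `draw_seeds` is a hypothetical name used only for illustration and is not part of this PR. The sketch uses `Random.default_rng()` as the default rather than `Random.GLOBAL_RNG`; that substitution is an assumption about the targeted Julia versions.

```julia
using Random

# Hypothetical helper (not in the PR): draw `n` integer seeds from a
# caller-supplied RNG instead of reseeding the global generator.
function draw_seeds(n::Int; rng::AbstractRNG=Random.default_rng())::Vector{Int}
    return rand(rng, 1:1_000_000, n)
end

# Two generators seeded identically yield identical seed vectors,
# so callers keep reproducibility without touching global state.
a = draw_seeds(3; rng=MersenneTwister(1234))
b = draw_seeds(3; rng=MersenneTwister(1234))
println(a == b)
```

Passing the generator in keeps the caller in control of reproducibility and avoids the side effect of `Random.seed!` on the global stream.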

"""
function run_rme(
rme_path::String,
n_threads::Int64,
reps::Int64,
result_path::String;
start_year::Int64=2022,
end_year::Int64=2099,
batch_size::Int64=10,
RCP_scen::String="SSP 2.45",
gcm::String="CNRM_ESM2_1",
rnd_seed::Int64=1234
)::Nothing
init_rme(rme_path)
set_option("thread_count", n_threads) # Set number of threads
set_option("use_fixed_seed", 1) # Turn on use of a fixed seed value
Comment on lines +33 to +35
Collaborator

I feel these should be set outside this function by the user and the settings reused across runs if possible.

Currently, RME is re-initialized every time run_rme() is called. In recent versions this takes milliseconds, but in the past (and potentially in the future) it could take tens of seconds to minutes.


# Reset RME to clear any previous runs
reset_rme()

# Initialize result store
result_store = ResultStore(start_year, end_year)
rme_results_dir = _resultset_dir_name()

# Use user selected seed to generate an array of seeds for each batch run
rnd_seeds::Vector{Int64} = _rnd_seeds(rnd_seed, batch_size, reps)

@info "Starting runs"
@info "Batch sizes: $batch_size"
for (batch_idx, batch_start) in enumerate(1:(batch_size):reps)
_run_batch(
batch_idx,
batch_start,
batch_size,
reps,
rnd_seeds[batch_idx],
rme_results_dir,
start_year,
end_year,
RCP_scen,
gcm,
result_store
)
end
@info "Finished running all reps."

result_path = joinpath(result_path, rme_results_dir)
save_result_store(result_path, result_store)

return nothing
end

"""
_run_batch(batch_idx::Int64, batch_start::Int64, batch_size::Int64, reps::Int64, batch_seed::Int64, rme_results_dir::String, start_year::Int64, end_year::Int64, RCP_scen::String, gcm::String, result_store)::Nothing
Run one batch of repetitions using RME.
"""
function _run_batch(
batch_idx::Int64,
batch_start::Int64,
batch_size::Int64,
reps::Int64,
batch_seed::Int64,
rme_results_dir::String,
start_year::Int64,
end_year::Int64,
RCP_scen::String,
gcm::String,
result_store
)::Nothing
batch_end = batch_start - 1 + batch_size
batch_reps = if batch_end > reps
reps - (batch_start - 1)
else
batch_size
end

@info "Starting batch $batch_idx"

# Set distinct seed for each run
set_option("fixed_seed", batch_seed) # Set the fixed seed value

# Note: if Julia runtime crashes, check that specified data file location is correct
@RME runCreate(
rme_results_dir::Cstring,
start_year::Cint,
end_year::Cint,
RCP_scen::Cstring,
gcm::Cstring,
batch_reps::Cint
)::Cint

# Initialize RME runs as defined above
run_init()

# Run all years and all reps
@time @RME runProcess()::Cint

# Collect and store results
@info "Concatenating results of batch $batch_idx..."
concat_results!(result_store, start_year, end_year, batch_reps)

return nothing
end

function _rnd_seeds(rnd_seed::Int64, batch_size::Int64, reps::Int64)::Vector{Int64}
Random.seed!(rnd_seed)
n_seeds = Int(ceil(reps / batch_size))
return Int.(floor.(rand(n_seeds) .* 1e6))
end
Comment on lines +125 to +129
Collaborator

I think we should use StatsBase and sample without replacement.
Right now there is a very (very) small possibility of a clash (multiple scenarios with the same seed).

function _rnd_seeds(rnd_seed::Int64, batch_size::Int64, reps::Int64; rng::AbstractRNG=Random.GLOBAL_RNG, max_seed::Int64=1_000_000)::Vector{Int64}    
    n_seeds = Int(ceil(reps / batch_size))
    if n_seeds > max_seed
        # Guard against the number of seeds exceeding the available seed range
        max_seed = max_seed + (n_seeds * 2)
    end

    return sample(rng, 1:max_seed, n_seeds; replace=false)
end
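
To put a number on the "very small possibility", here is a rough sketch of the clash probability when seeds are drawn independently and uniformly from a fixed range, as the original `floor.(rand(n_seeds) .* 1e6)` does. It is the standard birthday-problem calculation. `clash_probability` is a hypothetical helper, not part of the PR.

```julia
# Probability that at least two of `n` seeds drawn uniformly and independently
# from `m` possible values coincide (the birthday problem).
function clash_probability(n::Int, m::Int)::Float64
    p_unique = 1.0
    for k in 0:(n - 1)
        # The (k+1)-th draw must avoid the k values already taken.
        p_unique *= (m - k) / m
    end
    return 1.0 - p_unique
end

# Example: 100 reps in batches of 10 gives 10 seeds from a range of 1e6 values.
p = clash_probability(10, 1_000_000)
println(p)  # about 4.5e-5
```

Sampling without replacement removes this probability entirely, at the cost of the StatsBase dependency that Project.toml already lists.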


function _resultset_dir_name()::String
timestamp = Dates.format(now(), "yyyymmdd_HHMMSS")
Collaborator

I prefer YYYY-mm-dd format
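
For comparison, a small sketch of the two formats using the `Dates` standard library. It assumes the suggested `YYYY-mm-dd` corresponds to the lowercase `yyyy-mm-dd` format codes. How the time part should be separated is not specified in the comment, so the `_HHMMSS` suffix kept here is only one possible choice.

```julia
using Dates

# Fixed timestamp so the output does not depend on when this runs.
t = DateTime(2024, 3, 5, 14, 7, 9)

current = Dates.format(t, "yyyymmdd_HHMMSS")    # format used in the PR
dashed = Dates.format(t, "yyyy-mm-dd_HHMMSS")   # dashed date, same time suffix

println(current)  # 20240305_140709
println(dashed)   # 2024-03-05_140709
```

Both variants sort chronologically as plain strings, so the choice mainly affects readability of the result directory names.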

return "rme_results_$(timestamp)"
end
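
The batch partitioning done by the loop in `run_rme` and the `batch_reps` calculation in `_run_batch` can be checked in isolation. The sketch below reproduces that arithmetic only. `batch_sizes` is a hypothetical helper that does not exist in the PR, and nothing here calls RME.

```julia
# Number of repetitions in each batch when `reps` is split into chunks of
# `batch_size`; the last batch takes whatever remains.
function batch_sizes(reps::Int, batch_size::Int)::Vector{Int}
    return [min(batch_size, reps - (start - 1)) for start in 1:batch_size:reps]
end

sizes = batch_sizes(25, 10)
println(sizes)  # [10, 10, 5]

# The number of batches matches the seed count computed in `_rnd_seeds`;
# `cld` is integer ceiling division, equivalent to Int(ceil(reps / batch_size)).
println(length(sizes) == cld(25, 10))
```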