
How to add new model

Hoyoung Doh edited this page Sep 8, 2022 · 5 revisions

Before you start, we recommend reading the How to contribute page to set up your environment.

Outline

  1. Write a YAML file for model information
    hBayesDM/commons/models/
  2. Write a Stan file for the model
    hBayesDM/commons/stan_files/
  3. (optional) Provide example data for the task
    hBayesDM/commons/extdata/
  4. Generate R and Python codes
    hBayesDM/commons/generate-codes.sh
  5. Document auto-generated R codes
    hBayesDM/R/NAMESPACE (for R)
  6. Implement preprocess_funcs to specify how to preprocess data
    hBayesDM/R/R/preprocess_funcs.R (for R)
    hBayesDM/Python/hbayesdm/preprocess_funcs.py (for Python)
  7. Install R and Python packages
    hBayesDM/R/ (for R)
    hBayesDM/Python (for Python)

Step 1) Write a YAML file for model information

The first thing to do is to specify the model information in a YAML file in hBayesDM/commons/models/. It must contain the information you intend to use in the code to come:

  • task information (task_name),
  • model information (model_name),
  • model type (model_type),
  • data columns in data (data_columns),
  • model parameters (parameters),
  • model regressors (regressors, optional),
  • variables for posterior predictive checks (postpreds, optional),
  • additional arguments to the function (additional_args, optional),
  • special notes for the model (notes, optional), and
  • a list of contributors who actually wrote the code (contributors, optional).
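
As a quick sanity check before moving on, the list above can be turned into a small validation helper. The following is only a sketch: it assumes the YAML file has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`), it treats the entries not marked optional above as required, and the function name `missing_required_keys` and the placeholder values are hypothetical.

```python
# Top-level keys that the list above does not mark as optional.
REQUIRED_KEYS = ("task_name", "model_name", "model_type",
                 "data_columns", "parameters")


def missing_required_keys(spec: dict) -> list:
    """Return the required top-level keys that are absent from `spec`."""
    return [key for key in REQUIRED_KEYS if key not in spec]


# Placeholder content only; real files carry more detail (see example.yml).
spec = {
    "task_name": "placeholder",
    "model_name": "placeholder",
    "data_columns": "placeholder",
}
print(missing_required_keys(spec))
```

Here the helper reports `model_type` and `parameters`, the two required keys left out of the placeholder dict.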

The name of the new YAML file should follow the convention below:

# Given `task_code`, `model_code`, and `model_type` (optional),
# the filename should be defined as below:
  <task_code>_<model_code>[_<model_type>].yml

# Example 1:
  ra_prospect.yml
# ^^ ^^^^^^^^
# (task_code = 'ra', model_code = 'prospect')

# Example 2:
  choiceRT_ddm_single.yml
# ^^^^^^^^ ^^^ ^^^^^^
# (task_code = 'choiceRT', model_code = 'ddm', model_type = 'single')
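
The naming rule can also be written as a tiny helper, which may be handy for checking a filename before committing it. This is an illustrative sketch; `model_filename` is a hypothetical name and not part of hBayesDM.

```python
from typing import Optional


def model_filename(task_code: str, model_code: str,
                   model_type: Optional[str] = None,
                   ext: str = "yml") -> str:
    """Build <task_code>_<model_code>[_<model_type>].<ext>."""
    parts = [task_code, model_code]
    if model_type:  # model_type is optional; skip it when empty or None
        parts.append(model_type)
    return "_".join(parts) + "." + ext


print(model_filename("ra", "prospect"))
print(model_filename("choiceRT", "ddm", "single"))
```

Passing `ext="stan"` gives the matching Stan filename used in Step 2.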

You can start by copying example.yml from hBayesDM/commons/. It contains detailed comments for each value; if you run into a problem with it, please let us know by opening a GitHub issue.

These identifiers (variable names) must be used consistently throughout the Stan, R, and Python code so that the code stays readable and works correctly. Think of the YAML file as a quick summary or specification for your model. The R and Python code is auto-generated from these YAML files in Step 4.

For details of how to use YAML, you can find further information from links below:

Step 2) Write a Stan file for the model

Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.

Stan official website

Stan is the workhorse for sampling and model fitting in hBayesDM. If you're not familiar with Stan, we recommend learning it from its documentation and reference manuals.

Your (newly written) Stan file should be stored in hBayesDM/commons/stan_files/, following the same convention as above. The only difference is that it uses .stan as its extension.

# Given `task_code`, `model_code`, and `model_type` (optional),
# the filename should be defined as below:
  <task_code>_<model_code>[_<model_type>].stan

# Example 1:
  ra_prospect.stan
# ^^ ^^^^^^^^
# (task_code = 'ra', model_code = 'prospect')

# Example 2:
  choiceRT_ddm_single.stan
# ^^^^^^^^ ^^^ ^^^^^^
# (task_code = 'choiceRT', model_code = 'ddm', model_type = 'single')

Details on the Stan model

For hierarchical models, the Stan code must use a non-centered parameterization (see more on this link). In the parameters block, mu_pr, sigma, and one <parameter>_pr vector per parameter should be defined. For example, if there are three parameters (alpha, beta, gamma):

parameters {
    // Group-level parameters
    vector[3] mu_pr;
    vector<lower=0>[3] sigma;  // standard deviations must be non-negative

    // Subject-level raw parameters (for Matt trick)
    vector[N] alpha_pr;
    vector[N] beta_pr;
    vector[N] gamma_pr;
}
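
To make the role of the raw parameters concrete, here is a minimal numerical sketch of the non-centered transform, written in Python rather than Stan. It assumes a parameter bounded to (0, 1) that is squashed with the standard normal CDF; the actual transform in your Stan file depends on the bounds of each parameter, and the names `phi` and `subject_params` are hypothetical.

```python
import math
import random


def phi(x: float) -> float:
    """Standard normal CDF, mapping the real line into (0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def subject_params(mu_pr: float, sigma: float, raw: list) -> list:
    """Non-centered transform: shift and scale raw draws, then squash."""
    return [phi(mu_pr + sigma * z) for z in raw]


random.seed(0)
N = 5  # number of subjects
# Raw subject-level values, standing in for alpha_pr ~ Normal(0, 1)
alpha_pr = [random.gauss(0.0, 1.0) for _ in range(N)]
alpha = subject_params(mu_pr=0.3, sigma=0.5, raw=alpha_pr)
print(alpha)
```

The sampler only ever sees the standard-normal raw values; the group-level location and scale enter through the deterministic transform, which is what the `_pr` suffix signals.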

Step 3) Provide example data for the task (optional)

Now you should provide a small example data file in hBayesDM/commons/extdata/. You only have to do this once for each new task, whereas YAML and Stan files need to be written for each new model.

Start off by isolating the data columns you need for the modeling. Remove all the other columns in the example file, and change the names of the columns you will use to something representative but short.

Of course, because this is a hierarchical Bayesian modeling package that fits model-parameters of multiple subjects, you will always need a reserved column to specify the subject's ID for each row of data. Make sure to name this data column subjID. (This is the convention we use in hBayesDM.) Currently, hBayesDM requires that the example file (and user data) follow a tab-separated format.
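
A quick way to confirm that an example file meets these two requirements (tab-separated, with a subjID column) is to read its header. The sketch below uses only the Python standard library; the function name `check_example_data` and the column names `choice` and `outcome` are hypothetical placeholders for your task's columns.

```python
import csv
import io


def check_example_data(text: str, required_columns: list) -> list:
    """Return the columns missing from a tab-separated file's header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    # subjID is always required, in addition to the task's own columns
    return [col for col in ["subjID", *required_columns] if col not in header]


sample = "subjID\tchoice\toutcome\n1\t1\t10\n1\t0\t-5\n2\t1\t10\n"
print(check_example_data(sample, ["choice", "outcome"]))
```

An empty list means every expected column was found. For a file on disk, pass the result of `open(path).read()` as `text`.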

The name of the example data file should follow the convention below:

# Given `task_code` and `model_type` (optional),
# the filename should be defined as below:
  <task_code>[_<model_type>]_exampleData.txt

# Example 1:
  ra_exampleData.txt
# ^^
# (task_code = 'ra')

# Example 2:
  choiceRT_single_exampleData.txt
# ^^^^^^^^ ^^^^^^
# (task_code = 'choiceRT', model_type = 'single')

Refer to the other example files in directory hBayesDM/commons/extdata/ for help.

Step 4) Generate R and Python codes

Run generate-codes.sh in hBayesDM/commons/ to automatically generate R and Python codes based on model information in hBayesDM/commons/models.

cd $HBAYESDM_GIT_DIRECTORY/commons
./generate-codes.sh

where $HBAYESDM_GIT_DIRECTORY is where you cloned the hBayesDM repository.

Note that the script requires Python 3.6 or higher with PyYAML installed.
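
If you are unsure whether your environment meets these requirements, a short check like the following can be run first. It is a sketch using only the standard library; `check_prerequisites` is a hypothetical helper, and it relies on the fact that PyYAML is imported under the module name `yaml`.

```python
import importlib.util
import sys


def check_prerequisites() -> list:
    """Return a list of unmet requirements (empty if all are met)."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or higher is required")
    if importlib.util.find_spec("yaml") is None:  # PyYAML's module name
        problems.append("PyYAML is not installed (try: pip install pyyaml)")
    return problems


print(check_prerequisites() or "ready to run generate-codes.sh")
```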

Step 5) Document auto-generated R codes

roxygen2 helps developers with the petty details of managing an R package.
Once you've completed all the steps up to now, run roxygenize() using one of the following methods:

Using R Studio (recommended)

Open the hBayesDM project via the `hBayesDM.Rproj` file in the repo directory.

roxygen2::roxygenize()

Using R console on the terminal

cd $HBAYESDM_GIT_DIRECTORY/R
R  # this opens R console

where $HBAYESDM_GIT_DIRECTORY is where you cloned the hBayesDM repository; then run:

> roxygen2::roxygenize()

Directly from terminal

cd $HBAYESDM_GIT_DIRECTORY/R
Rscript -e 'roxygen2::roxygenize()'
# opens & executes in one go

where $HBAYESDM_GIT_DIRECTORY is where you cloned the hBayesDM repository.

If it gives an error like the following:

Error in getDLLRegisteredRoutines.DLLInfo(dll, addNames = FALSE) : 
  must specify DLL via a “DLLInfo” object. See getLoadedDLLs()

Run this command first, then try again:

pkgbuild::compile_dll()

After running roxygenize() make sure that:

  • roxygenize() has not returned any errors.
  • The DESCRIPTION file has been updated to include your new model.
  • The NAMESPACE file has been updated to include your new model.
  • A new file man/<your-model-name>.Rd has been created.
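
Two of these checks are easy to automate. The sketch below assumes that roxygen2 writes an `export(<your-model-name>)` line into NAMESPACE for an exported function and creates man/<your-model-name>.Rd; the helper name `check_roxygen_output` is hypothetical, and the demo runs against a throwaway directory rather than the real hBayesDM/R/ folder.

```python
import tempfile
from pathlib import Path


def check_roxygen_output(r_pkg_dir: str, model_name: str) -> list:
    """Return a list of expected roxygenize() outputs that are missing."""
    pkg = Path(r_pkg_dir)
    problems = []
    namespace = pkg / "NAMESPACE"
    if not namespace.is_file() or \
            f"export({model_name})" not in namespace.read_text():
        problems.append(f"NAMESPACE has no export({model_name})")
    if not (pkg / "man" / f"{model_name}.Rd").is_file():
        problems.append(f"man/{model_name}.Rd not found")
    return problems


# Demo against a temporary directory that mimics the package layout.
with tempfile.TemporaryDirectory() as tmp:
    before = check_roxygen_output(tmp, "ra_prospect")
    Path(tmp, "NAMESPACE").write_text("export(ra_prospect)\n")
    Path(tmp, "man").mkdir()
    Path(tmp, "man", "ra_prospect.Rd").write_text("% placeholder\n")
    after = check_roxygen_output(tmp, "ra_prospect")

print(before)
print(after)
```

For a real check, point `r_pkg_dir` at the R/ directory of your clone. The DESCRIPTION file and any errors reported by roxygenize() still need to be inspected by hand.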

Step 6) Implement preprocess_funcs to specify how to preprocess data

Now, you should define the functions that preprocess data in hBayesDM/R/R/preprocess_funcs.R (for R) and hBayesDM/Python/hbayesdm/preprocess_funcs.py (for Python).

For both R and Python, the name of the preprocessing function should follow the convention below:

# Given `task_code` and `model_type` (optional),
# the function name should be defined as below:
  <task_code>[_<model_type>]_preprocess_func

# Example 1:
  ra_preprocess_func
# ^^
# (task_code = 'ra')

# Example 2:
  choiceRT_single_preprocess_func
# ^^^^^^^^ ^^^^^^
# (task_code = 'choiceRT', model_type = 'single')
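
To give a rough idea of what such a function typically does, the sketch below groups trials by subject and pads each subject's series to a common length so the data can be handed to Stan as rectangular arrays. Everything specific here is an assumption for illustration: the task code `mytask`, the `choice` column, the argument list, and the output keys other than `N` are hypothetical, so copy the actual signature from an existing function in preprocess_funcs.py.

```python
from collections import defaultdict


def mytask_preprocess_func(rows: list) -> dict:
    """Group trials by subject and pad to the longest subject's length."""
    by_subj = defaultdict(list)
    for row in rows:
        by_subj[row["subjID"]].append(row)

    subjs = sorted(by_subj)                     # stable subject order
    t_subjs = [len(by_subj[s]) for s in subjs]  # trials per subject
    t_max = max(t_subjs, default=0)             # longest series

    # -1 marks padded (non-existent) trials
    choice = [[-1] * t_max for _ in subjs]
    for i, s in enumerate(subjs):
        for t, row in enumerate(by_subj[s]):
            choice[i][t] = int(row["choice"])

    return {"N": len(subjs), "T": t_max, "Tsubj": t_subjs, "choice": choice}


rows = [
    {"subjID": "1", "choice": "1"},
    {"subjID": "1", "choice": "0"},
    {"subjID": "2", "choice": "1"},
]
print(mytask_preprocess_func(rows))
```

The returned dict plays the role of the data passed to the Stan model, so its keys would need to match the names declared in the model's data block.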

Step 7) Install R and Python packages

Now you can install hBayesDM for R or Python. For R:

cd $HBAYESDM_GIT_DIRECTORY/R
Rscript -e 'devtools::install()'

and for Python:

cd $HBAYESDM_GIT_DIRECTORY/Python
pip install .