
AllenNeuralDynamics/aind-foraging-behavior-bonsai-trigger-pipeline


Foraging Behavior Pipeline for Bonsai

(Han Hou @ Aug 2023)

This is still a temporary workaround until the AIND behavior pipeline is implemented.

Pipeline structure


1. (On Han's PC) Upload raw behavior data to the cloud (github)

  • From all behavior rigs, fetch raw behavior files (.json) generated by the foraging-bonsai GUI
  • Turn .json files into .nwb files, which contain both data and metadata
  • Upload all .nwb files to a single S3 bucket s3://aind-behavior-data/foraging_nwb_bonsai/

2. (In Code Ocean, this repo) Trigger computation (CO capsule: foraging_behavior_bonsai_pipeline_trigger, github)

3. (In Code Ocean) Visualization by Streamlit app (CO capsule: foraging-behavior-browser, github)

The Streamlit app fetches data from the S3 bucket above and generates data visualizations. You can run the app either on Code Ocean (recommended) or on Streamlit public cloud.
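The fetch part of step 1 (collecting .json files from every rig) can be sketched as copying any file that is not cached yet from each rig's shared folder. This is a minimal sketch, not the actual upload script: the helper name `fetch_new_json_files`, the rig-name-to-folder mapping, and the one-cache-subfolder-per-rig layout are all assumptions for illustration.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def fetch_new_json_files(remote_folders: dict[str, Path], cache_root: Path) -> list[Path]:
    """Copy .json files that are not cached yet from each rig's shared folder.

    Hypothetical helper: `remote_folders` maps a rig name to the rig's shared
    data folder; files are cached under `cache_root / <rig name>`.
    Returns the newly copied files.
    """
    copied: list[Path] = []
    for rig, remote in remote_folders.items():
        rig_cache = cache_root / rig
        rig_cache.mkdir(parents=True, exist_ok=True)
        # Search subfolders too, in case the GUI nests sessions (assumption).
        for src in sorted(Path(remote).rglob("*.json")):
            dst = rig_cache / src.name
            if dst.exists():
                continue  # Already fetched in an earlier run; skip it.
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

Because already-cached files are skipped, re-running the fetch only copies new sessions, which keeps the downstream .json-to-.nwb conversion incremental.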

Automatic training

See this repo

How to add more rigs

  • On the rig PC, share the data folder on the Windows network.

  • Make sure the data folder is accessible from another PC by typing its network address (e.g., \\W10DT714033\behavior_data\447-1-D) into Windows Explorer.

  • Share with Han the PC name (e.g., "W10DT714033"), the data folder (e.g., "C:\behavior_data"), and the Windows username & passcode (we'll typically need these if the data folder is not on VAST).

    (Han will take care of the remaining steps)

  • Add the new rig info here.

    • If it is a standard training rig, simply add it to rig_mapper.json.
    • Otherwise, add it here by providing the local cache folder, the remote folder, and the username (if a username and passcode are required to access the data).
  • SSH to the server machine (currently W10DT714033), and:

    • pull the above changes;
    • if a passcode is required, add it to passcode.json in the local repo.
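The network address in the steps above follows from the PC name and the name of the shared folder. Below is a small sketch of that mapping, plus a passcode lookup. The function names and the flat passcode.json layout are hypothetical, since the real rig_mapper.json and passcode.json schemas are not described here; it also assumes the folder is shared under its own name, as in the W10DT714033 example.

```python
from __future__ import annotations

import json
from pathlib import Path, PureWindowsPath


def remote_rig_folder(pc_name: str, local_data_folder: str, *subfolders: str) -> str:
    r"""Build the network path of a rig's shared data folder.

    Assumes the folder is shared under its own name, so that C:\behavior_data
    on W10DT714033 becomes \\W10DT714033\behavior_data.
    """
    share_name = PureWindowsPath(local_data_folder).name
    return "\\\\" + "\\".join([pc_name, share_name, *subfolders])


def load_passcode(passcode_file: Path, key: str) -> str | None:
    """Return the passcode stored under `key`, or None if there is none.

    Hypothetical schema for passcode.json: a flat object {"<key>": "<passcode>"}.
    """
    if not passcode_file.exists():
        return None
    return json.loads(passcode_file.read_text()).get(key)
```

For example, `remote_rig_folder("W10DT714033", r"C:\behavior_data", "447-1-D")` gives the same address you would type into Windows Explorer in the second step.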

How to add more analyses

The pipeline is still a prototype. As you can see in the Streamlit app, I have implemented only two basic analyses so far:

  • compute essential session-wise stats
  • generate a simple plot of choice-reward history

To add more analyses to the pipeline, just plug in your own function here. Your function should take an nwb as input and save plots or any other results to files whose names start with the session_id.
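The plug-in contract (an nwb goes in, files named after the session_id come out) can be illustrated with a small registry. This is a hypothetical sketch, not the pipeline's actual hook: the names `ANALYSES`, `count_trials`, and `run_analyses` are made up, and it assumes the nwb object exposes `session_id` and a `trials` table, as a pynwb NWBFile with trials does.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable, List

# Hypothetical plug-in signature: (nwb, result_folder) -> list of files written.
AnalysisFunc = Callable[[Any, Path], List[Path]]


def count_trials(nwb: Any, result_folder: Path) -> List[Path]:
    """Toy analysis: save the number of trials to <session_id>_n_trials.json.

    Assumes `nwb.session_id` and `nwb.trials` exist, as on a pynwb NWBFile
    that has a trials table.
    """
    session_id = nwb.session_id
    out_file = result_folder / f"{session_id}_n_trials.json"
    out_file.write_text(json.dumps({"session_id": session_id, "n_trials": len(nwb.trials)}))
    return [out_file]


# Register new analyses by appending them to this list.
ANALYSES: List[AnalysisFunc] = [count_trials]


def run_analyses(nwb: Any, result_folder: Path) -> List[Path]:
    """Run every registered analysis on one session and collect its outputs."""
    result_folder.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    for analysis in ANALYSES:
        outputs.extend(analysis(nwb, result_folder))
    return outputs
```

Prefixing every output with the session_id keeps results from different sessions from overwriting each other when they are collected into one folder.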

If you would like to access the .nwb files directly or do analyses outside Code Ocean (not recommended), check out the bucket s3://aind-behavior-data/foraging_nwb_bonsai/. For details, see below.

Pipeline-ready checklist

Checklist before the pipeline is ready to run:

  1. CO pipeline Han_pipeline_foraging_behavior_bonsai:

    • No yellow warning sign (otherwise, do a Reproducible Run of that capsule first)

    • Check the argument of foraging_behavior_bonsai_pipeline_assign_job that controls the number of capsule instances

    • Check the argument of foraging_behavior_bonsai_nwb that controls the number of multiprocessing cores of each instance.

      • This number should match the number of cores set in "Adjust Resources for capsule in pipeline"
    • Make sure the pipeline is set to use "Spot instances" (otherwise it takes too long to start) and "without cache" (otherwise the input S3 bucket will not be updated)

  2. Make sure these capsules are not running (status shows four gray dots; VS Code sessions are held or terminated):

    • foraging_behavior_bonsai_pipeline_assign_job
    • foraging_behavior_bonsai_nwb
    • foraging_behavior_bonsai_pipeline_collect_and_upload_results
  3. Make sure one and only one instance of foraging_behavior_bonsai_pipeline_trigger is running.

  4. Make sure one and only one instance of foraging-behavior-bonsai-automatic-training is running.
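To make the two arguments in item 1 concrete: the assign_job argument sets how many capsule instances share the nwb files, and the nwb-capsule argument sets how many processes each instance runs, so total parallelism is their product (e.g., 10 instances with 16 cores each give 160 cores). The sketch below is hypothetical: round-robin splitting and the names `assign_jobs` and `check_core_setting` are illustrations, not the actual code of foraging_behavior_bonsai_pipeline_assign_job or foraging_behavior_bonsai_nwb.

```python
from __future__ import annotations

from typing import List, Sequence


def assign_jobs(nwb_files: Sequence[str], n_instances: int) -> List[List[str]]:
    """Split nwb files round-robin across capsule instances (hypothetical scheme)."""
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    jobs: List[List[str]] = [[] for _ in range(n_instances)]
    for i, nwb_file in enumerate(sorted(nwb_files)):
        jobs[i % n_instances].append(nwb_file)
    return jobs


def check_core_setting(n_processes: int, n_cores_allocated: int) -> None:
    """Fail early if the multiprocessing argument differs from the allocated cores."""
    if n_processes != n_cores_allocated:
        raise ValueError(
            f"multiprocessing argument ({n_processes}) != allocated cores ({n_cores_allocated})"
        )
```

A mismatch either leaves allocated cores idle (argument too small) or oversubscribes the machine (argument too large), which is why the checklist asks for the two numbers to match.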

Notes on manually re-processing all nwbs and overwriting the S3 database (and thus the Streamlit app)

Important

I should do this after work hours, as it disrupts the AutoTrain system (see this issue).

  1. Stop the triggering capsule and the AutoTraining capsule.
  2. (optional) Re-generate all nwbs
    • Back up the nwb folder on my PC and on S3
    • On S3, move the old /foraging_nwb_bonsai to a backup folder and create a new /foraging_nwb_bonsai
    • Re-generate nwbs from jsons on my PC
  3. Back up and clear the /foraging_nwb_bonsai_processed folder
    • On S3, copy the folder to a backup folder
    • Clear the old folder
      • If you don't clear it, you should at least delete df_sessions.pkl, error_files.json, and pipeline.log (they are appended to, not overwritten)
      • Troubleshooting: when attaching an S3 folder to a capsule, the folder must not be empty (otherwise you get a "permission denied" error)
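Deleting the three appended files from step 3 can be scripted with the boto3 S3 client's `delete_objects` call. This is a hedged sketch: it assumes the files sit at the top level of s3://aind-behavior-data/foraging_nwb_bonsai_processed/ and that your credentials have write access; the helper names are hypothetical, and `dry_run` defaults to True so nothing is deleted by accident.

```python
from __future__ import annotations

from typing import Any, List

BUCKET = "aind-behavior-data"
PROCESSED_PREFIX = "foraging_nwb_bonsai_processed"
# These files are appended to on each run, not overwritten.
APPENDED_FILES = ("df_sessions.pkl", "error_files.json", "pipeline.log")


def appended_file_keys(prefix: str = PROCESSED_PREFIX) -> List[str]:
    """S3 keys of the appended files, assuming they sit at the top of `prefix`."""
    prefix = prefix.strip("/")
    return [f"{prefix}/{name}" for name in APPENDED_FILES]


def delete_appended_files(
    s3_client: Any, bucket: str = BUCKET, prefix: str = PROCESSED_PREFIX, dry_run: bool = True
) -> List[str]:
    """Delete the appended files with a boto3 S3 client; only list them if dry_run."""
    keys = appended_file_keys(prefix)
    if not dry_run:
        s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in keys], "Quiet": True},
        )
    return keys
```

Usage: `import boto3`, then `delete_appended_files(boto3.client("s3"), dry_run=False)` after checking the dry-run output.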

Case A: still use the pipeline (recommended)

  1. Make sure to assign 10 or more workers and set the CPU number to 16 (for spot machines) and the argument to 16. This gives you at least 10 * 16 = 160 cores in total!

  2. Trigger the pipeline as usual. Only the difference between nwb and nwb_processed (nwbs that have not been processed yet) will be processed. This works well if you have already cleared the processed folder.
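The "only the diff is processed" rule amounts to a set difference between the sessions in the nwb folder and the sessions already present in the processed folder, which is also why clearing the processed folder makes every nwb eligible again. A minimal sketch, assuming the session_id is the nwb filename without ".nwb"; `nwbs_to_process` is a hypothetical name, not the trigger capsule's code.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Iterable, List


def nwbs_to_process(nwb_files: Iterable[str], processed_session_ids: Iterable[str]) -> List[str]:
    """Return the nwb files whose session has no processed results yet.

    Assumes the session_id is the nwb filename without ".nwb" and that the
    processed folder can be summarized as a set of session_ids.
    """
    done = set(processed_session_ids)
    return sorted(f for f in nwb_files if PurePosixPath(f).stem not in done)
```

With an empty processed folder, `processed_session_ids` is empty and every nwb is returned, i.e. a full re-process.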

Case B: manually run each capsule (obsolete)

  1. Manually trigger the batch computation in capsule foraging_behavior_bonsai_nwb:
    • Make sure the CPU number of the environment is 16 or more :)
    • Run processing_nwb.py manually in parallel (with LOCAL_MANUAL_OVERRIDE = True)
  2. Manually trigger the collect_and_upload capsule:
    • Manually register a data asset:
      • Use any name, but mount must be data/foraging_behavior_bonsai_pipeline_results
      • The data asset apparently cannot be registered from within VS Code: as of 2024-03-03, I can only create a data asset outside VS Code.
    • In the capsule collect_and_upload_results, manually attach the data asset just created, and press Reproducible Run.
      • I have adapted collect_and_upload_results so that it also accepts data that are not organized into /1, /2, ... subfolders (the layout produced by a pipeline run).
  3. To restore the pipeline, follow the "Pipeline-ready checklist" above.
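The adaptation of collect_and_upload_results described in step 2 amounts to treating a leading numbered worker folder as optional. The sketch below illustrates that idea under assumptions: `collect_result_files` is a hypothetical name, and the real capsule may handle the two layouts differently.

```python
from __future__ import annotations

from pathlib import Path
from typing import Dict


def collect_result_files(results_root: Path) -> Dict[Path, Path]:
    """Map each result file to its upload path, with or without worker folders.

    Files under numbered worker folders (results_root/1/..., results_root/2/...)
    and files placed directly under results_root map to the same relative layout.
    """
    mapping: Dict[Path, Path] = {}
    for src in sorted(results_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(results_root)
        # Drop a leading worker folder such as "1" or "2" (pipeline-run layout).
        if len(rel.parts) > 1 and rel.parts[0].isdigit():
            rel = Path(*rel.parts[1:])
        mapping[src] = rel
    return mapping
```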

Accessing foraging .nwbs for off-pipeline analysis

| .nwb datasets | Dataset 1 | Dataset 2 (old) |
|---|---|---|
| Where are the data collected? | AIND | Janelia and AIND |
| Behavior hardware | Bonsai-Harp | Bpod |
| Size | 1423 sessions / 92 mice | 4327 sessions / 157 mice |
| Modality | Behavior only | 3803 sessions / 157 mice: pure behavior; 35 sessions / 8 mice: ephys + DLC outputs |
| Still growing? | Yes; updated daily (by the current repo) | No longer updated |
| NWB format | New bonsai nwb format | Compatible with the new bonsai nwb format |
| Raw NWBs | S3 bucket: s3://aind-behavior-data/foraging_nwb_bonsai/; CO data asset: foraging_nwb_bonsai (id=f908dd4d-d7ed-4d52-97cf-ccd0e167c659) | S3 bucket: s3://aind-behavior-data/foraging_nwb_bpod/; CO data asset: foraging_nwb_bpod (id=4ba57838-f4b7-4215-a9d9-11bccaaf2e1c) |
| Processed results | S3 bucket: s3://aind-behavior-data/foraging_nwb_bonsai_processed/; CO data asset: foraging_nwb_bonsai_processed (id=4ad1364f-6f67-494c-a943-1e957ab574bb) | S3 bucket: s3://aind-behavior-data/foraging_nwb_bpod_processed/; CO data asset: foraging_nwb_bpod_processed (id=7f869b24-9132-43d3-8313-4b481effeead) |
| Code Ocean example capsule | foraging_behavior_bonsai_nwb | foraging_behavior_bonsai_nwb |
| Streamlit visualization | The Streamlit behavior browser | Click "include old Bpod sessions" in the app |
| How to access the master table shown in the app? | The df_sessions.pkl file in the "Processed results" path above | Same as left, but in the Bpod "Processed results" path |
| Notes | Some sessions have fiber photometry or ephys data collected at the same time, but these have not been integrated into the .nwbs yet. | Some sessions have fiber photometry data collected at the same time, but these have not been integrated into the .nwbs yet. |
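For off-pipeline analysis, the paths in the table can be wrapped in a few helpers. This is a sketch under assumptions: the helper names are hypothetical; reading s3:// paths with pandas requires the s3fs package and AWS credentials with read access (whether the bucket allows anonymous reads is not stated here); and .nwb files are assumed to have been downloaded locally before opening them with pynwb's `NWBHDF5IO`.

```python
from __future__ import annotations

from typing import Any, Tuple

BUCKET = "aind-behavior-data"
# Dataset -> (raw nwb folder, processed results folder), taken from the table above.
DATASETS = {
    "bonsai": ("foraging_nwb_bonsai", "foraging_nwb_bonsai_processed"),
    "bpod": ("foraging_nwb_bpod", "foraging_nwb_bpod_processed"),
}


def s3_uri(dataset: str, processed: bool = False, filename: str = "") -> str:
    """Build an s3:// URI for a raw or processed file of a dataset."""
    raw_folder, processed_folder = DATASETS[dataset]
    folder = processed_folder if processed else raw_folder
    return f"s3://{BUCKET}/{folder}/{filename}"


def load_master_table(dataset: str = "bonsai") -> Any:
    """Load df_sessions.pkl; needs pandas, s3fs, and read access to the bucket."""
    import pandas as pd  # Imported lazily so s3_uri works without pandas.

    return pd.read_pickle(s3_uri(dataset, processed=True, filename="df_sessions.pkl"))


def open_nwb(local_path: str) -> Tuple[Any, Any]:
    """Open a downloaded .nwb file with pynwb; call io.close() when done."""
    from pynwb import NWBHDF5IO

    io = NWBHDF5IO(local_path, mode="r")
    return io, io.read()
```

`open_nwb` returns the IO object together with the NWBFile because pynwb loads datasets lazily: closing the file before you finish reading would make the data inaccessible.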

What's next

We will likely be refactoring the pipeline after we figure out the AIND behavior metadata schema, but the core ideas and data analysis code developed here will remain. Stay tuned.
