(Han Hou @ Aug 2023)
This is still a temporary workaround until the AIND behavior pipeline is implemented.
1. (On Han's PC) Upload raw behavior data to the cloud (github)
   - From all behavior rigs, fetch raw behavior files (.json) generated by the foraging-bonsai GUI
   - Turn the .json files into .nwb files, which contain both data and metadata
   - Upload all .nwb files to a single S3 bucket: `s3://aind-behavior-data/foraging_nwb_bonsai/`
2. (In Code Ocean, this repo) Trigger computation (CO capsule: `foraging_behavior_bonsai_pipeline_trigger`, github)
   - Identify unprocessed .nwb files (github)
   - Send unprocessed .nwb files to the CO pipeline `Han_pipeline_foraging_behavior_bonsai`.
   In the CO pipeline:
   - Distribute .nwb files to parallel workers (CO capsule: `foraging_behavior_bonsai_pipeline_assign_job`, github)
   - Do the real analysis on each .nwb file (CO capsule: `foraging_behavior_bonsai_nwb`, github), where arbitrary dataframes and figures are generated
   - Collect and combine results from the workers (CO capsule: `foraging_behavior_bonsai_pipeline_collect_and_upload_results`, github)
   - Upload results to this S3 bucket: `s3://aind-behavior-data/foraging_nwb_bonsai_processed/`
3. (In Code Ocean) Visualization by a Streamlit app (CO capsule: `foraging-behavior-browser`, github)
   - The Streamlit app fetches data from the above S3 bucket and generates the data visualizations. You can run the app either on Code Ocean (recommended) or on the Streamlit public cloud.
   - See this repo.
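The trigger capsule's "identify unprocessed .nwb files" step is essentially a set difference between the raw and processed buckets. Below is a minimal sketch of that idea, not the capsule's actual code: `find_unprocessed`, the assumed key layout (processed results stored under a prefix equal to the raw file's stem), and the boto3 listing (which needs AWS credentials) are all illustrative assumptions.

```python
def find_unprocessed(raw_keys, processed_keys):
    """Return raw .nwb keys with no entry in the processed bucket.

    Assumes processed results live under a prefix equal to the raw
    file's stem (hypothetical layout; adjust to the real pipeline).
    """
    processed_stems = {k.split("/")[0] for k in processed_keys}
    return sorted(
        k for k in raw_keys
        if k.endswith(".nwb") and k.rsplit(".nwb", 1)[0] not in processed_stems
    )


def list_bucket_keys(bucket, prefix=""):
    """List all object keys in an S3 bucket (requires boto3 + AWS access)."""
    import boto3  # imported lazily so the pure logic above has no dependencies
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys += [obj["Key"] for obj in page.get("Contents", [])]
    return keys


if __name__ == "__main__":
    raw = list_bucket_keys("aind-behavior-data", "foraging_nwb_bonsai/")
    done = list_bucket_keys("aind-behavior-data", "foraging_nwb_bonsai_processed/")
    print(f"{len(find_unprocessed(raw, done))} sessions still to process")
```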
- On the rig PC, share the data folder to the Windows Network.
- Make sure the data folder is accessible by typing a network address like `\\W10DT714033\behavior_data\447-1-D` in Windows Explorer on another PC.
- Share with Han the PC name (e.g., "W10DT714033"), the data folder (e.g., "C:\behavior_data"), and the Windows username & passcode (we'll typically need this if the data folder is not on VAST).

(Han will take care of the remaining steps)
- Add the new rig info here.
  - If it is a standard training rig, simply add it to `rig_mapper.json`.
  - Otherwise, add it here by providing the local cache folder, the remote folder, and the username (if a username and passcode are needed to access the data).
- SSH to the server machine (currently W10DT714033), and pull the above changes.
  - If a passcode is required, add it to `passcode.json` in the local repo.
The pipeline is still a prototype at this moment. As you can see in the Streamlit app, so far I only implemented two basic analyses:
- compute essential session-wise stats
- generate a simple plot of choice-reward history
To add more analyses to the pipeline, just plug in your own function here. Your function should take `nwb` as an input and generate plots or any other results, with filenames starting with `session_id`.
If you would like to access the .nwb files directly or do analysis outside Code Ocean (not recommended though), check out this bucket: `s3://aind-behavior-data/foraging_nwb_bonsai/`. For details, see below.
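A plug-in analysis could look like the sketch below. The function name, the `session_id` argument, and the output directory are assumptions for illustration; the only contract stated above is that the function takes `nwb` and writes results whose filenames start with the session id.

```python
import os


def result_filename(session_id, result_name, ext):
    """Build an output filename that starts with the session id,
    as the pipeline's naming convention requires."""
    return f"{session_id}_{result_name}.{ext}"


def my_new_analysis(nwb, session_id, output_dir="."):
    """Hypothetical plug-in analysis: count trials and save the number.

    `nwb` is assumed to be an already-loaded pynwb NWBFile with a
    trials table; swap in your real computation and plotting here.
    """
    n_trials = 0 if nwb.trials is None else len(nwb.trials)
    out_path = os.path.join(output_dir, result_filename(session_id, "n_trials", "txt"))
    with open(out_path, "w") as f:
        f.write(str(n_trials))
    return out_path
```

Anything the function writes with that naming convention (pickled dataframes, figures, etc.) will then be collected and uploaded by the pipeline like the existing results.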
Checklist before the pipeline is ready to run:
- CO pipeline `Han_pipeline_foraging_behavior_bonsai`:
  - No yellow warning sign (otherwise, do a Reproducible Run of that capsule first)
  - Check the argument of `foraging_behavior_bonsai_pipeline_assign_job` that controls the number of capsule instances
  - Check the argument of `foraging_behavior_bonsai_nwb` that controls the number of multiprocessing cores of each instance.
    - This number should match the core number in "Adjust Resources for capsule in pipeline"
  - Make sure the pipeline is set to use "Spot instances" (otherwise it takes too long to start) and "without cache" (otherwise the input S3 bucket will not be updated)
- Make sure these capsules are not running (Status is four gray dots; VSCode sessions are held or terminated):
  - `foraging_behavior_bonsai_pipeline_assign_job`
  - `foraging_behavior_bonsai_nwb`
  - `foraging_behavior_bonsai_pipeline_collect_and_upload_results`
- Make sure one and only one instance of `foraging_behavior_bonsai_pipeline_trigger` is running.
- Make sure one and only one instance of `foraging-behavior-bonsai-automatic-training` is running.
**Important**: I should do this after work hours, as it will be disruptive to the AutoTrain system (see this issue).
- Stop the triggering capsule and the AutoTraining capsule.
- (Optional) Re-generate all nwbs:
  - Back up the nwb folder on my PC and S3
  - On S3, move the old `/foraging_nwb_bonsai` to a backup folder and create a new `/foraging_nwb_bonsai`
  - Re-generate nwbs from jsons on my PC
- Back up and clear the `/foraging_nwb_bonsai_processed` bucket:
  - On S3, copy the folder to a backup folder
  - Clear the old folder
    - If you don't clear it, at least delete `df_sessions.pkl`, `error_files.json`, and `pipeline.log` (they will be appended to, not overwritten)
  - Troubleshooting: when attaching an S3 folder to a capsule, the folder must not be empty (otherwise you get a "permission denied" error)
- Make sure to assign 10 or more workers, and set `CPU number = 16` (for spot machines) and `argument = 16`. In this case, you'll have > 10 * 16 = 160 total cores!
- Trigger the pipeline as usual. In this case, only the diff of `nwb` and `nwb_processed` will be processed (this works well if you have already cleaned up the `processed` folder).
- Manually trigger the batch computation in capsule `foraging_behavior_bonsai_nwb`:
  - Make sure the CPU number of the environment is 16 or more :)
  - Run `processing_nwb.py` manually in parallel (with `LOCAL_MANUAL_OVERRIDE = True`)
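Conceptually, the manual parallel run fans the .nwb files out to a pool of worker processes, one per core, as in the sketch below. The real logic lives in `processing_nwb.py`; `process_one` and the file list here are placeholders.

```python
from multiprocessing import Pool


def process_one(nwb_file):
    """Placeholder for the per-file analysis done in processing_nwb.py."""
    # ... load the nwb, run the analyses, write results to /results ...
    return f"done: {nwb_file}"


def run_all(nwb_files, n_cores=16):
    """Fan the files out to n_cores worker processes and collect results."""
    with Pool(n_cores) as pool:
        return pool.map(process_one, nwb_files)


if __name__ == "__main__":
    print(run_all(["a.nwb", "b.nwb"], n_cores=2))
```

This is why the environment's CPU count and the multiprocessing argument above should agree: with fewer cores than workers, processes just queue instead of running in parallel.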
- Manually trigger the collect_and_upload capsule:
  - Manually register a data asset:
    - Use any name, but `mount` must be `data/foraging_behavior_bonsai_pipeline_results`
    - Note (@20240303): the data asset apparently cannot be registered in VSCode; I can only create data assets outside VSCode.
  - In the capsule `collect_and_upload_results`, manually attach the data asset just created, and press Reproducible Run.
    - I have adapted `collect_and_upload_results` so that it can also accept data that are not in /1, /2, ... like those from the pipeline run.
- To restore the pipeline, follow the "Pipeline-ready checklist" above.
| .nwb datasets | Dataset 1 | Dataset 2 (old) |
|---|---|---|
| Where were the data collected? | AIND | Janelia and AIND |
| Behavior hardware | Bonsai-Harp | Bpod |
| Size | 1423 sessions / 92 mice | 4327 sessions / 157 mice |
| Modality | behavior only | 3803 sessions / 157 mice: pure behavior; 35 sessions / 8 mice: ephys + DLC outputs |
| Still growing? | Yes; updating daily (by the current repo) | No longer updating |
| NWB format | New bonsai nwb format | Compatible with the new bonsai nwb format |
| Raw NWBs | S3 bucket: `s3://aind-behavior-data/foraging_nwb_bonsai/`; CO data asset: foraging_nwb_bonsai (id=f908dd4d-d7ed-4d52-97cf-ccd0e167c659) | S3 bucket: `s3://aind-behavior-data/foraging_nwb_bpod/`; CO data asset: foraging_nwb_bpod (id=4ba57838-f4b7-4215-a9d9-11bccaaf2e1c) |
| Processed results | S3 bucket: `s3://aind-behavior-data/foraging_nwb_bonsai_processed/`; CO data asset: foraging_nwb_bonsai_processed (id=4ad1364f-6f67-494c-a943-1e957ab574bb) | S3 bucket: `s3://aind-behavior-data/foraging_nwb_bpod_processed/`; CO data asset: foraging_nwb_bpod_processed (id=7f869b24-9132-43d3-8313-4b481effeead) |
| Code Ocean example capsule | foraging_behavior_bonsai_nwb | foraging_behavior_bonsai_nwb |
| Streamlit visualization | The Streamlit behavior browser | Click "include old Bpod sessions" in the app |
| How to access the master table shown in the app? | The `df_sessions.pkl` file in the "Processed results" path above | Same as left (except use the "Processed results" path for bpod) |
| Notes | Some sessions have fiber photometry or ephys data collected at the same time, but these have not been integrated into the .nwbs yet. | Some sessions have fiber photometry data collected at the same time, but it has not been integrated into the .nwbs yet. |
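For example, the master table can be pulled locally with pandas, something like the sketch below. Reading `s3://` paths with pandas assumes the `s3fs` package is installed and AWS access is configured; a downloaded local copy of the pickle works the same way.

```python
import pandas as pd

# Default path is the "Processed results" bucket from the table above.
MASTER_TABLE = "s3://aind-behavior-data/foraging_nwb_bonsai_processed/df_sessions.pkl"


def load_master_table(path=MASTER_TABLE):
    """Load the session master table shown in the Streamlit app.

    `path` may be an s3:// URL (requires s3fs + credentials) or a
    local file path to a downloaded copy of df_sessions.pkl.
    """
    return pd.read_pickle(path)


if __name__ == "__main__":
    df = load_master_table()
    print(df.shape)  # one row per session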
We will likely be refactoring the pipeline after we figure out the AIND behavior metadata schema, but the core ideas and data analysis code developed here will remain. Stay tuned.