The python data pipeline defined with DataJoint for U19 projects

datajoint-company/U19-pipeline_python

 
 

U19 Python Pipeline

The U19-pipeline_python repository defines the DataJoint tables for the U19 projects. A companion MATLAB DataJoint pipeline exists, and many table definitions are mirrored between the two.

Installation

  • The instructions below cover two installation methods.
    1. User installation to access and fetch data from the database.
    2. Developer installation to set up the pipeline for running analysis and fetching data.

Recommended prerequisites

  • The following prerequisites are recommended for both installation methods.

    Install an integrated development environment

    • DataJoint development and use can be done with a plain text editor in the terminal. However, an integrated development environment (IDE) can improve your experience. Several IDEs are available.

    • In this setup example, we will use Microsoft's Visual Studio Code. Installation instructions here.

    • Install the Jupyter extension for VS Code.

    Connect to PNI resources

    • The Princeton Neuroscience Institute (PNI) provides computing resources. You can optionally use these resources or set up the pipeline on your local machine.

    • Spock is the high performance computational cluster

    • Scotty is used for interactive sessions

    Install a virtual environment

    • A virtual environment allows you to install the packages required for a specific project within an isolated environment on your computer.

    • It is highly recommended to create a virtual environment to run the workflow.

    • Conda and virtualenv are virtual environment managers and you can use either option. Below are the commands for Conda.

    • If you are setting up the pipeline on your local machine, follow the instructions below for Conda. If you are using spock.pni.princeton.edu or scotty.pni.princeton.edu, Conda is preinstalled and you can access it by running module load anacondapy/2021.11.

    • We will install Miniconda which is a minimal installer for conda.

      • Select the Miniconda installer link for your operating system and follow the instructions.

      • You may need to add the Miniconda directory to the PATH environment variable

        • First locate the Miniconda directory

        • Then modify and run the following command

          export PATH="<absolute-path-to-miniconda-directory>/bin:$PATH"
    • Create a new conda environment

      • Type the following command into a terminal window

        conda create -n <environment_name> python=<version>
      • Example command to create a conda environment

        conda create -n U19-pipeline_python python=3.8.11
    • Activate the conda environment

      conda activate <environment_name>
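
To confirm that the environment is active before going further, a small check can be run from Python. This is a minimal sketch, not part of the repository: the function name `describe_environment` is made up for illustration, and it relies only on the `CONDA_DEFAULT_ENV` variable that conda sets on activation and on whether a `conda` executable is found on `PATH`.

```python
import os
import shutil
import sys


def describe_environment():
    """Collect basic facts about the Python environment currently in use."""
    return {
        # Full path of the conda executable, or None if it is not on PATH.
        "conda_executable": shutil.which("conda"),
        # Name of the active conda environment, or None if none is active.
        "active_conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Interpreter version as a dotted string (major.minor.micro).
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for name, value in describe_environment().items():
        print(f"{name}: {value}")
```

If `conda_executable` is reported as `None`, the Miniconda directory is probably not on `PATH`; if `active_conda_env` is not the name you chose, the environment has not been activated in this shell.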

    Install git

    • Git usually comes preinstalled on Linux and Mac operating systems. If running in Windows, get Git.

    Install graphviz

User installation

  • The following instructions will allow a user to access and fetch data from the database.

    Install DataJoint

    • Activate the conda environment

      conda activate <environment_name>
    • Install DataJoint

      pip install datajoint

    Access the database

    • In a new Jupyter notebook, run the following commands.

      import getpass
      import datajoint as dj
      
      dj.config['database.host'] = 'datajoint00.pni.princeton.edu'
      dj.config['database.user'] = '<username>'
      dj.config['database.password'] = getpass.getpass() # enter the password securely
      
      scan = dj.create_virtual_module('scan', 'u19_scan_element')
      imaging = dj.create_virtual_module('imaging', 'u19_imaging_element')
      
      probe = dj.create_virtual_module('probe', 'u19_probe_element')
      ephys = dj.create_virtual_module('ephys', 'u19_ephys_element')
      
    • Now that the virtual modules are created to access the tables in the database, you can query and fetch from the database.
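
As an illustration of querying and fetching, the helper below previews a few rows of any table or query. It is a minimal sketch: `preview` is a hypothetical name, and the table `Scan` in the usage comment is an assumption about what the `u19_scan_element` schema contains, so check the schema for the actual table names. The helper uses only the `&` restriction operator and `fetch(as_dict=True, limit=...)`.

```python
def preview(table, restriction=None, limit=5):
    """Return up to `limit` rows of `table` as a list of dicts.

    `table` is any DataJoint table or query expression. `restriction` is an
    optional dict or SQL-style condition string applied with the `&` operator.
    """
    query = table if restriction is None else table & restriction
    return query.fetch(as_dict=True, limit=limit)


# Hypothetical usage, assuming the virtual module `scan` created above
# exposes a table named `Scan`:
#   rows = preview(scan.Scan(), limit=3)
#   rows = preview(scan.Scan(), {"<attribute_name>": "<value>"})
```

Setting a limit keeps the preview cheap on large tables; drop it only when you actually need every matching row.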

Developer installation

  • The following instructions will allow a user to set up the pipeline for running analysis and fetching data.

    Fork and clone the repository

    • In a browser, navigate to the BrainCOGS/U19-pipeline_python repository and fork this repository.

    • In a terminal window, clone your fork of the repository to your local machine.

      git clone https://github.com/<GitHub username>/U19-pipeline_python.git
      
    • If you cannot clone repositories over SSH, set up SSH keys.

    Install the repository

    • Activate the conda environment

      conda activate <environment_name>
    • Change directory to this repository

      cd U19-pipeline_python
    • Install this repository in editable mode

      pip install -e .

    Configure the DataJoint connection to the database

    • See the following Jupyter notebook to configure DataJoint. notebooks/00-datajoint-configuration.ipynb

    • Ephys element and imaging element require root paths for ephys and imaging data. Here are the notebooks showing how to set up the configurations properly.
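
One way to persist such settings is a `dj_local_conf.json` file in the working directory, which DataJoint reads when it is imported. The sketch below is an illustration only and does not replace the notebooks: the function name `write_local_config` and the two key names under `"custom"` are assumptions, so use the key names given in the configuration notebooks. The password is deliberately left out of the file.

```python
import json
from pathlib import Path


def write_local_config(directory, user, ephys_root, imaging_root,
                       host="datajoint00.pni.princeton.edu"):
    """Write a dj_local_conf.json file (without a password) into `directory`."""
    config = {
        "database.host": host,
        "database.user": user,
        # ASSUMPTION: these key names are placeholders for illustration;
        # the configuration notebooks define the names the pipeline expects.
        "custom": {
            "ephys_root_data_dir": str(ephys_root),
            "imaging_root_data_dir": str(imaging_root),
        },
    }
    path = Path(directory) / "dj_local_conf.json"
    path.write_text(json.dumps(config, indent=4))
    return path


# Hypothetical usage from the repository root:
#   write_local_config(".", "<username>", "<ephys root path>", "<imaging root path>")
```

Keeping the password out of the file means it still has to be supplied at connection time, for example with `getpass`.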

Tutorials

We have created some tutorial notebooks to help you start working with DataJoint.

  1. Querying data

    • jupyter notebook "notebooks/tutorials/1-Explore U19 data pipeline with DataJoint.ipynb"
  2. Building analysis pipelines

    • Recommended if you are going to create new databases or tables for analysis.
    • jupyter notebook "notebooks/tutorials/2-Analyze data with U19 pipeline and save results.ipynb"
    • jupyter notebook "notebooks/tutorials/3-Build a simple data pipeline.ipynb"

Accessing data files on your system

Major schemas in the pipeline

lab

Lab Diagram

reference

Reference Diagram

subject

Subject Diagram

action

Action Diagram

acquisition

Acquisition Diagram

task

Task Diagram

behavior

Behavior data for Towers task.

Behavior Diagram

ephys_element

  • Ephys-related tables were created with DataJoint Element Array Ephys, which processes ephys data acquired with SpikeGLX and pre-processed by Kilosort2. For this pipeline we are using the (acute) ephys module from element-array-ephys.

Ephys Diagram

imaging

  • The imaging pipeline uses a customized algorithm for motion correction and CNMF for cell segmentation, in MATLAB.

Imaging Diagram

scan_element and imaging_element

Scan element and imaging element Diagram

DataJoint features

Import DataJoint as follows:

import datajoint as dj

Update a table entry

dj.Table._update(schema.Table & key, 'column_name', 'new_data')
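
In DataJoint 0.13 and later, the public `update1` method serves the same purpose. It takes a single dict holding the full primary key of an existing row plus the attributes to overwrite. The wrapper below is a minimal sketch with a hypothetical name, `update_attribute`, and assumes `key` identifies exactly one row.

```python
def update_attribute(table, key, column_name, new_value):
    """Update one attribute of the single row identified by `key`.

    `key` must be a dict holding the full primary key of an existing row.
    """
    row = dict(key)               # copy so the caller's key is not modified
    row[column_name] = new_value  # add the attribute to overwrite
    table.update1(row)            # available in DataJoint 0.13 and later
    return row


# Hypothetical usage, mirroring the call above:
#   update_attribute(schema.Table(), key, 'column_name', 'new_data')
```

Updates bypass the usual insert and delete workflow, so they are best kept for correcting individual values.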

Get list of all column names in a table (without having to issue a query or fetch)

table.heading.attributes.keys()

This also works on a query object:

schema = dj.create_virtual_module("some_schema","some_schema")
query_object = schema.Sample() & 'sample_name ="test"'
query_object.heading.attributes.keys()
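
The heading also separates primary-key attributes from the rest. The helper below is a small sketch (the name `split_columns` is made up for illustration) that uses the `primary_key` and `secondary_attributes` properties of a heading and, like the calls above, does not fetch any rows.

```python
def split_columns(table):
    """Return the column names of a table or query, grouped by role."""
    heading = table.heading
    return {
        "primary": list(heading.primary_key),
        "secondary": list(heading.secondary_attributes),
    }


# Hypothetical usage with the query object defined above:
#   split_columns(query_object)
```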
