diff --git a/tutorials/01-DataJoint Basics.ipynb b/tutorials/01-DataJoint Basics.ipynb index 8523a9f..2ea6198 100644 --- a/tutorials/01-DataJoint Basics.ipynb +++ b/tutorials/01-DataJoint Basics.ipynb @@ -11,21 +11,44 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that you have successfully connected to DataJoint (if not, please visit [Connecting to DataBase](00-ConnectingToDatabase.ipynb) first), let's dive into using DataJoint! In this notebook, we will:\n", + "Congratulations! If you are reading this, you have successfully opened the first DataJoint tutorial notebook: `01-DataJoint Basics`. \n", "\n", - "1. learn what a data pipeline is\n", - "2. create our first simple data pipeline in DataJoint\n", - "3. insert some data into the pipeline\n", - "4. basic queries to flexibly explore the data pipeline\n", - "5. fetch the data from the pipeline\n", - "6. delete entries from tables" + "This tutorial will walk you through the major concepts and steps to use DataJoint: \n", "\n", + "- Essential concepts and setup\n", + " - Data pipelines\n", + " - Concept\n", + " - Practical examples\n", + " - Schemas and tables\n", + " - Concept\n", + " - Practical examples\n", + " - Basic relational operators\n", + " - Create tables with dependencies\n", + " - Querying data\n", + "- Summary" ] }, { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Essential concepts and setup\n" ] }, { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both interactive and local environments come with the latest DataJoint Python package pre-installed, along with many other popular [Python](https://www.python.org/) packages for scientific computations such as [NumPy](http://www.numpy.org/), [SciPy](https://www.scipy.org/), and [Matplotlib](https://matplotlib.org/). \n", + "\n", + "Like any other package, to start using [DataJoint](https://datajoint.com/docs/), you must first import the package `datajoint`. 
The convention is to alias the package to `dj`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "As always, let's start by importing the `datajoint` library." + "*NOTE: Run a code cell by clicking the run button at its top-left corner, or by using the Ctrl+Enter shortcut.*" ] }, { @@ -41,43 +64,55 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# So... What is a data pipeline?" + "### Data pipelines\n" ] }, { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Concept" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "If you visit the [documentation for DataJoint](https://docs.datajoint.io/introduction/Data-pipelines.html), we define a data pipeline as follows:\n", - "> A data pipeline is a sequence of steps (more generally a directed acyclic graph) with integrated storage at each step. These steps may be thought of as nodes in a graph.\n", "\n", - ">* Nodes in this graph are represented as database **tables**. Examples of such tables include `Subject`, `Session`, `Implantation`, `Experimenter`, `Equipment`, but also `OptoWaveform`, `OptoStimParams`, or `NeuronalSpikes`. \n", + ">* A data pipeline is a collection of processes and steps for organizing the data and computations. Data pipelines perform complex data acquisition sequences, processing, and analysis with integrated storage at each step. These steps may be thought of as nodes in a directed graph that defines their order of execution.\n", + "\n", + ">* Nodes in this graph are represented as database **tables**. Examples of such tables include \"Subject\", \"Session\", \"Implantation\", \"Experimenter\", \"Equipment\", but also \"OptoWaveform\", \"OptoStimParams\", or \"NeuronalSpikes\". \n", "\n", ">* The data pipeline is formed by making these tables interdependent (as the nodes are connected in a network). 
A **dependency** is a situation where a step of the data pipeline is dependent on a result from a sequentially previous step before it can complete its execution. A dependency graph forms an entire cohesive data pipeline. \n", "\n", - "In order to create a data pipeline, you need to know the \"things\" in your experiments\n", - "and the relationship between them. Within the pipeline you will then:\n", + "A [DataJoint pipeline](https://datajoint.com/docs/core/datajoint-python/0.14/concepts/terminology/) contains database table definitions, dependencies, and associated computations, together with the transformations underlying a DataJoint workflow. \n", "\n", - "1. define these \"things\" as tables in which you can store the information about them.\n", - "2. define the relationships (in particular the dependencies) between the \"things\".\n", + "The following figure is an example pipeline using [DataJoint Element for Multi-photon Calcium Imaging](https://datajoint.com/docs/elements/element-calcium-imaging/):\n", "\n", - "The data pipeline can then serve as a map that describes everything that goes on in your experiment, capturing what is collected, what is processed, and what is analyzed/computed. A well designed data pipeline not only let's you organize your data well, but can bring out logical clarity to your experiment, and may even bring about new insights by making how everything in your experiment relates together obvious.\n", + "![pipeline](../images/pipeline-calcium-imaging.svg)\n", "\n", - "Let's go ahead and build together a pipeline from scratch to better understand what a data pipeline is all about." 
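The "nodes in a directed graph" idea above can be made concrete with a small plain-Python sketch (illustrative only, not DataJoint code; the table names are hypothetical examples in the spirit of this tutorial). Each table lists the tables it depends on, and a topological sort recovers a valid order of execution:

```python
# Each table (node) maps to the tables it depends on (its parents).
dependencies = {
    "Mouse": [],
    "Session": ["Mouse"],
    "Scan": ["Session"],
    "Segmentation": ["Scan"],
    "Trace": ["Segmentation"],
}

def execution_order(deps):
    """Topological sort: every table appears after all of its parents."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in deps[node]:
            visit(parent)
        order.append(node)

    for node in deps:
        visit(node)
    return order

order = execution_order(dependencies)
```

Swapping in your own table names and dependencies illustrates the same property a DataJoint pipeline enforces: a step can only run after all of the steps it depends on.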
+ "A **well-designed data pipeline**: \n", + "- Collects, organizes, and stores **every relevant piece of information during the scientific research.**\n", + "- Integrates, processes, and connects these pieces of information through **several steps**.\n", + "- Analyzes and transforms the input data into **valuable insights for the research**, bringing together logical clarity to the experiments.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Practical examples" + "##### Practical examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's build a pipeline to collect, store and process data and analysis for our hypothetical single electrode recording or calcium imaging recording in mice. To help us understand the project better, here is a brief description:" + "The practical examples that will be used in the next tutorials will allow you to design and compute a data pipeline for a scientific project of two experiments on rodents: \n", + "- Single-electrode recording\n", + "- Calcium imaging recording\n", + "\n", + "Let's start with a brief description of this project's context:" ] }, { @@ -85,31 +120,32 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> * Your lab houses many mice, and each mouse is identified by a unique ID. You also want to keep track of information about each mouse such as their date of birth, and gender.\n", - "> * As a hard working neuroscientist, you perform experiments every day, sometimes working with more than one mouse in a day! However, on any given day, a mouse undergoes at most one recording session.\n", - "> * For each experimental session, you would like to record what mouse you worked with and when you performed the experiment. You would also like to keep track of other helpful information such as the experimental setup you worked on. \n", + "> * Your lab houses many mice, and a unique ID identifies each mouse. 
You also want to keep track of other information about each mouse, such as their date of birth and gender.\n", "> * As a hard-working neuroscientist, you perform experiments every day, sometimes working with more than one mouse daily. However, a mouse undergoes at most one recording session on any given day.\n", "> * For each experimental session, you want to record what mouse you worked with and when you performed the experiment. You also want to keep track of other helpful information, such as the experimental setup you used. \n", "\n", "> * In a session of electrophysiology:\n", ">> * You record electrical activity from a single neuron. You use recording equipment that produces separate data files for each neuron you record.\n", ">> * Neuron's activities are recorded as raw traces. Neuron's spikes need to be detected for further analysis to be performed.\n", "\n", "> * In a session of calcium imaging:\n", ">> * You scan a brain region containing several neurons. 
You use recording equipment that produces separate data files for each scan you performed.\n", + ">> * You need to segment the frames and get the regions of interest (ROIs), and save a mask for each ROI\n", + ">> * In addition, you need to extract the trace from each segmented ROI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Pipeline design starts by identifying **things** or **entities** in your project. Common entities includes experimental subjects (e.g. mouse), recording sessions, and two-photon scans." + "The design of a data pipeline starts by identifying the **entities** or **tables** in your research project. Common entities include experimental subjects (e.g. mouse), recording sessions, and two-photon scans." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's revisit the project description, this time paying special attention to **what** (e.g. nouns) about your experiment. Here I have highlighted some nouns in particular." + "Let's revisit the project description, this time paying special attention to **what** (e.g. nouns or entities) about your experiment. Here, some particular entities are highlighted." ] }, { @@ -117,45 +153,46 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> * Your lab houses many **mice**, and each mouse is identified by a unique ID. You also want to keep track of information about each mouse such as their date of birth, and gender.\n", - "> * As a hard working neuroscientist, you perform experiments every day, sometimes working with more than one mouse in a day! However, on an any given day, a mouse undergoes at most one recording session.\n", - "> * For each **experimental session**, you would like to record what mouse you worked with and when you performed the experiment. You would also like to keep track of other helpful information such as the experimental setup you worked on. \n", + "> * Your lab houses many **mice**, and a unique ID identifies each mouse. 
You also want to keep track of other information about each mouse, such as their date of birth and gender.\n", "> * As a hard-working neuroscientist, you perform experiments every day, sometimes working with more than one mouse daily. However, a mouse undergoes at most one recording session on any given day.\n", "> * For each **experimental session**, you want to record what mouse you worked with and when you performed the experiment. You also want to keep track of other helpful information, such as the experimental setup you worked on. \n", "\n", "> * In a session of electrophysiology:\n", ">> * You record electrical activity from a **single neuron**. You use recording equipment that produces separate data files for each neuron you record.\n", ">> * Neuron's activities are recorded as raw traces. **Neuron's spikes** need to be detected for further analysis to be performed.\n", "\n", - "> * In a session of electrophysiology\n", ">> * you record electrical activity from a **single neuron**. You use recording equipment that produces separate data files for each neuron you recorded.\n", ">> * Neuron's activities are recorded as raw traces. **Neuron's spikes** needs to be detected for further analysis to be performed.\n", "> * In a session of calcium imaging\n", ">> * you scan a brain region containing a number of neurons. You use recording equipment that produces separate data files for each **scan** you performed.\n", ">> * you would like to segment the frames and get the **regions of interest (ROIs)**, and save a mask for each ROI\n", ">> * finally you would like to extract the **trace** from each segmented ROI" + "> * In a session of calcium imaging:\n", ">> * You **scan** a brain region containing several neurons. 
You use recording equipment that produces separate data files for each scan you performed.\n", ">> * You need to segment the frames and get the **regions of interest (ROIs)**, and save a mask for each ROI\n", ">> * In addition, you need to extract the **trace** from each segmented ROI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Just by going through the description, we can start to identify **entities** that need to be stored and represented in our data pipeline:\n", + "Just by going through the description, we can start to identify **entities** that need to be stored and represented in our data pipeline:\n", "\n", - "* mouse\n", - "* experimental session\n", + ">* Mouse\n", + ">* Experimental session\n", "\n", - "For ephys:\n", + "For Ephys:\n", "\n", - ">* neuron\n", - ">* spikes\n", + ">* Neuron\n", + ">* Spikes\n", "\n", - "For calcium imaging:\n", + "For Calcium Imaging:\n", "\n", - ">* scan\n", - ">* regions of interest\n", - ">* trace" + ">* Scan\n", + ">* Regions of interest (ROI)\n", + ">* Trace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In the current notebook, we will design the tables for mouse and experimental sessions, the rest of the pipeline will be designed in the subdirectory `electrophysiology` and `calcium_imaging`" + "In the next section, you will learn to design the tables and manipulate the data for `Mouse` and `Experimental sessions`. The rest of the pipeline (`Ephys` and `Calcium Imaging`) will be addressed in the subsequent tutorials." ] }, { @@ -182,12 +219,9 @@ "\n", "It is essential to think about what information will **uniquely identify** each entry. \n", "\n", - "In this case, the information that uniquely identifies the `Mouse` table is their\n", **mouse ID** - a unique ID number assigned to each animal in the lab. This attribute is\n", named the **primary key** of the table. 
By convention, table attributes are lower case\n", and do not contain spaces.\n", + "In this case, the information that uniquely identifies the `Mouse` table is their **mouse ID** - a unique ID number assigned to each animal in the lab. This attribute is named the **primary key** of the table.\n", "\n", + "| Mouse_ID (*Primary key attribute*)|\n", "|:--------: | \n", "| 11234 |\n", "| 11432 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "After some thought, we might conclude that each mouse can be uniquely identified by knowing its **mouse ID** - a unique ID number assigned to each mouse in the lab. The mouse ID is then a column in the table or an **attribute** that can be used to **uniquely identify** each mouse. Such attribute is called the **primary key** of the table.\n", + "After some thought, we might conclude that each mouse can be uniquely identified by knowing its **mouse ID** - a unique ID number assigned to each mouse in the lab. \n", "\n", "The mouse ID is then a column in the table or an **attribute** that can be used to **uniquely identify** each mouse. \n", "\n", "Such an attribute is called the **primary key** of the table: the subset of table attributes uniquely identifying each entity in the table. A **secondary attribute** is any field in the table that is not part of the primary key.\n", "\n", + "| Mouse_ID (*Primary key attribute*) |\n", "|:--------:| \n", "| 11234 |\n", "| 11432 |" ] }, { @@ -220,14 +254,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For the case of mouse, what other information about the mouse you might want to store? Based on the project description, we would probably want to store information such as the mouse's **date of birth** and **gender**." + "For the case of `Mouse`, what other information about the mouse might you want to store? 
\n", + "\n", + "Based on the project description, we would probably want to store information such as the mouse's **date of birth** (DOB) and **sex**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "| `mouse_id*` | `dob` | `sex` |\n", + "| Mouse_ID | DOB | sex |\n", "|:--------:|------------|--------|\n", "| 11234 | 2017-11-17 | M |\n", "| 11432 | 2018-03-04 | F |" @@ -258,7 +294,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Every table lives inside a schema - a logical collection of one or more tables in your pipeline. Your final pipeline may consists of many tables spread across one or more schemas. Let's go ahead and create the first schema to house our table." + "Every table lives inside a schema - a logical collection of one or more tables in your pipeline. Your final pipeline will consist of many tables spread across one or more schemas. Let's go ahead and create the first schema to house our `Mouse` table using DataJoint." ] }, { @@ -266,7 +302,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We create the schema using `dj.schema()` function, passing in the name of the schema. For this workshop, you are given the database privilege to use any schema name. Let's create a schema called `tutorial`." + "We create the schema using `dj.schema()` function, passing in the schema's name. For this tutorial, we create a schema called `tutorial`." ] }, { @@ -282,7 +318,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that we have a schema to place our table into, let's go ahead and define our first table!" + "Now that we have a schema to place our table into let's go ahead and define our first table. " ] }, { @@ -296,7 +332,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In DataJoint, you define each table as a `class`, and provide the table definition (e.g. attribute definitions) as the `definition` static string property. 
The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)." + "In DataJoint, you define each table as a `class`, and provide the table definition (e.g., attribute definitions) as the `definition` static string property. The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)." ] }, { @@ -320,7 +356,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's take a look at our brand new table" + "Let's take a look at our brand-new table" ] }, { @@ -350,14 +386,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The table was successfully defined, but without any content, the table is not too interesting. Let's go ahead and insert some **mouse** into the table, one at a time using the `insert1` method." + "The table was successfully defined, but it will be more interesting once it has some content. Let's go ahead and **insert some mouse information** into the table, one at a time, using the `insert1` method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's insert a mouse with the following information:\n", + "Let's use the `insert1` method to enter the following information into the table:\n", "* mouse_id: 0\n", "* date of birth: 2017-03-01\n", "* sex: male" ] }, { @@ -385,7 +421,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You could also insert1 as a dictionary" + "You can also `insert1` as a dictionary:" ] }, { @@ -423,7 +459,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can also insert multiple **mice** together using the `insert` method, passing in a list of data." 
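As a sketch of the two row formats that `insert1` and `insert` accept (plain Python only; the first row's values come from this notebook, while the extra mouse IDs and dates in the batch are made-up examples, and the `Mouse.insert…` calls are left as comments because they require a live database connection):

```python
# A single row for the Mouse table (mouse_id, dob, sex) can be given either
# as an ordered tuple or as a dictionary keyed by attribute name.
row_as_tuple = (0, "2017-03-01", "M")
row_as_dict = {"mouse_id": 0, "dob": "2017-03-01", "sex": "M"}

# A batch insert takes a list of such rows (values here are hypothetical).
batch = [
    {"mouse_id": 1, "dob": "2016-11-19", "sex": "M"},
    {"mouse_id": 2, "dob": "2016-11-20", "sex": "F"},
    {"mouse_id": 5, "dob": "2016-12-25", "sex": "F"},
]

# With a connected schema, these would be passed to the table:
#   Mouse.insert1(row_as_tuple)   # or Mouse.insert1(row_as_dict)
#   Mouse.insert(batch)
```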
+ "We can also insert multiple **mice** together using the `insert` method, passing in a list of data:" ] }, { @@ -452,7 +488,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Of course, you can insert a list of dictionaries" + "Of course, you can `insert` a list of dictionaries:" ] }, { @@ -490,7 +526,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "DataJoint checks for data integrity, and ensures that you don't insert a duplicate by mistake. Let's try inserting another mouse with `mouse_id: 0` and see what happens!" + "DataJoint checks for data integrity and ensures you don't insert a duplicate by mistake. Let's try inserting another mouse with `mouse_id: 0` and see what happens!\n", + "\n", + "*Note that the following code cell is intended to raise an error.* " ] }, { @@ -510,7 +548,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Go ahead and insert a few more mice into your table before moving on." + "Let's insert a few more mice into the table before moving on:" ] }, { @@ -553,33 +591,33 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Create tables with dependencies" + "### Create tables with dependencies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Congratulations! We have successfully created your first table! We are now ready to tackle and include other **entities** in the project into our data pipeline. \n", + "Congratulations! You have successfully created your first table! We are ready to tackle and include the project's other **entities** in our data pipeline. \n", "\n", - "Let's now take a look at representing an **experimental session**." + "Let's now have a look at representing an `experimental session`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "As with `mouse`, we should consider **what information (i.e. attributes) is needed to identify an experimental `session`** uniquely. 
Here is the relevant section of the project description:\n", + "As with `mouse`, we should consider **what information (i.e., attributes) is needed to identify an `experimental session`** uniquely. Here is the relevant section of the project description:\n", "\n", - "> * As a hard working neuroscientist, you perform experiments every day, sometimes working with **more than one mouse in a day**! However, on an any given day, **a mouse undergoes at most one recording session**.\n", - "> * For each experimental session, you would like to record **what mouse you worked with** and **when you performed the experiment**. You would also like to keep track of other helpful information such as the **experimental setup** you worked on." + "> * As a hard-working neuroscientist, you perform experiments daily, sometimes working with **more than one mouse in a day**. However, on any given day, **a mouse undergoes at most one recording session**.\n", + "> * For each **experimental session**, you want to record **what mouse you worked with** and **when you performed the experiment**. You also want to keep track of other helpful information, such as the **experimental setup** you worked on. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Based on the above, it seems that you need to know the following data to uniquely identify a single experimental session:\n", + "Based on the above, it seems that you need to know these two pieces of information to uniquely identify a single experimental session:\n", "\n", "* the date of the session\n", "* the mouse you recorded from in that session" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Note that, to uniquely identify an experimental session (or simply a `Session`), we need to know the mouse that the session was about. In other words, a session cannot exist without a corresponding mouse! 
\n", + "Note that, to uniquely identify an experimental session (or simply a `Session`), we need to know the mouse that the session was about. In other words, a session cannot existing without a corresponding mouse! \n", "\n", "With **mouse** already represented as a table in our pipeline, we say that the session **depends on** the mouse! We could graphically represent this in an **entity relationship diagram (ERD)** by drawing the line between two tables, with the one below (**session**) depending on the one above (**mouse**)." ] @@ -598,7 +636,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Thus we will need both **mouse** and a new attribute **session_date** to uniquely identify a single session. \n", + "Thus, we will need both the **mouse** and the new attribute **session_date** to identify a single `session` uniquely. \n", "\n", "Remember that a **mouse** is uniquely identified by its primary key - **mouse_id**. In DataJoint, you can declare that **session** depends on the mouse, and DataJoint will automatically include the mouse's primary key (`mouse_id`) as part of the session's primary key, alongside any additional attribute(s) you specify." ] @@ -626,7 +664,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can actually generate something similar to an entity relationship diagram (ERD) on the fly by calling `dj.Diagram` with the schema object. Many of the symbols and features are the same as the ERD standard." + "You can generate something similar to an entity relationship diagram (ERD) on the fly by calling `dj.Diagram` with the schema object. Many of the symbols and features are the same as the ERD standard." ] }, { @@ -642,7 +680,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's try inserting a few sessions manually." 
+ "Let's insert a few sessions manually:" ] }, { @@ -674,7 +712,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's insert another session for `mouse_id = 0` but on a different date." + "Let's insert another session for `mouse_id = 0` but on a different date:" ] }, { @@ -699,7 +737,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "And another session done on the same date but on a different mouse" + "And another session done on the same date but on a different mouse:" ] }, { @@ -732,7 +770,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "What happens if we try to insert a session for a mouse that doesn't exist?" + "What happens if we try to insert a session for a mouse that doesn't exist?" ] }, { @@ -749,6 +787,13 @@ "}" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Note: the following code cell is intended to raise an error:*" + ] + }, + { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ ... ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Querying data" + "### Querying data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Often times, you don't want all data but rather work with **a subset of entities** matching specific criteria. Rather than fetching the whole data and writing your own parser, it is far more efficient to narrow your data to the subset before fetching.\n", + "Oftentimes, you don't need to use all of the data but rather work with **a subset of entities** matching specific criteria. 
Rather than fetching the whole data and writing your own parser, narrowing your data to the subset before fetching is far more efficient.\n", "\n", - "For this, DataJoint offers very powerful yet intuitive **querying** syntax that let's you select exactly the data you want before you fetch it.\n", + "For this, DataJoint offers a very powerful yet intuitive **querying syntax** that lets you select the data you want before you fetch it.\n", "\n", "It is also critical to note that the result of any DataJoint query represents a valid entity." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We will introduce the major types of queries used in DataJoint:\n", - "1. Restriction (`&`) and negative restriction (`-`): filter the data with certain conditions\n", - "2. Join (`*`): bring fields from different tables together\n", - "3. Projection (`.proj()`): focus on a subset of attributes\n", - "\n", - "Following the query operations, you might work with one or more of the following\n", - "data manipulation operations supported by DataJoint:\n", - " \n", - "1. Fetch (`.fetch()`): pull the data from the database\n", - "2. Deletion (`.delete()`): delete entries and their dependencies\n", - "3. Drop (`.drop()`): drop the table from the schema" + "We will introduce the main types of queries and operations used in DataJoint:\n", + "1. Restriction (`&`) and negative restriction (`-`): filter the data with certain conditions\n", + "2. Join (`*`): bring fields from different tables together\n", + "3. Projection (`.proj()`): focus on a subset of attributes\n", + "4. Fetch (`.fetch()`): pull the data from the database\n", + "5. Deletion (`.delete()`): delete entries and their dependencies\n", + "6. Drop (`.drop()`): drop the table from the schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Restrictions (`&`) - filter data with certain conditions" + "### 1. 
Restrictions (`&` or `-`): filter the data with certain conditions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The **restriction** operation, `&`, let's you specify the criteria to narrow down the table on the left." + "The **restriction** operation, `&`, allows you to specify the criteria to narrow down the table on the left." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Exact match" + "##### Exact match" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Mouse with id 0" + "Mouse with `ID = 0`:" ] }, { @@ -836,7 +877,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "All male mice (`'sex = \"M\"'`)" + "All the male (`M`) mice:" ] }, { @@ -852,7 +893,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "All female mice (`'sex=\"F\"'`)" + "All the female (`F`) mice:" ] }, { @@ -868,7 +909,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can also use as a dictionary as a restrictor, with one field or multiple fields" + "We can also use a dictionary as a restrictor, with one field or multiple fields:" ] }, { @@ -884,7 +925,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Inequality" + "##### Inequality" ] }, { @@ -898,7 +939,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Mouse that is born **after 2017-01-01**" + "Mouse that is born `after 2017-01-01`:" ] }, { @@ -914,7 +955,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Mouse that is born within a range of dates" + "Mouse that is born within a range of dates:" ] }, { @@ -930,7 +971,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Mouse that is **not** male" + "Mice that are `not male`:" ] }, { @@ -946,14 +987,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can easily combine multiple restrictions to narrow down the entities based on multiple attributes." + "You can easily combine multiple restrictions to narrow the entities based on various attributes." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's find all mice that **are not male** AND **born after 2017-01-01**." + "Let's find all mice that `are not male` and born `after 2017-01-01`:" ] }, { @@ -995,7 +1036,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "and among these mice, find ones with **mouse_id > 10**" + "It's your turn! Find and store the mice with a `mouse_id > 10`:" ] }, { @@ -1011,51 +1052,44 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In computer science/math lingo, DataJoint operations are said to **satisfy closure property**. Practically speaking, this means that the result of a query can immediately be used in another query, allowing you to build more complex queries from simpler ones. " + "In computer science and math lingo, DataJoint operations are said to **satisfy closure property**. Practically speaking, this means that the result of a query can immediately be used in another query, allowing you to build more complex queries from simpler ones. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Restrict one table with another" + "### Restriction operator (`&`): all entities from one table for which there exists a matching entity in the other table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "All mice that have a session" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Mouse & Session " + "Note that when restricting, for example, table A with table B (written A & B), the two tables must have common attributes (join-compatible). 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Combine restrictions" + "To select all the `mice` that have a `session`:" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "All the above queries could be combined " + "Mouse & Session " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Male mice that had a session" + "All the above queries can be combined; for example, to find the `male mice` that have a `session`:" ] }, { @@ -1071,7 +1105,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Give me all mice that have had an experimental session done on or before 2017-05-19" + "Another example of how to select the `mice` that participated in an `experimental session` done `on or before 2017-05-19`:" ] }, { @@ -1087,14 +1121,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Negative restriction: with the `-` operator" + "### Negative restriction (`-`): subset of entities from one table for which there are no matching entities in the other table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "All mice that do not have any session" + "All the `mice` that do `not have any session`:" ] }, { @@ -1110,7 +1144,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Male mice that do not have any session" + "It's your turn! Find and store the male mice that do not have any session:" ] }, { @@ -1126,25 +1160,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Joining (*) - bring fields from different tables together" + "### 2. Join (`*`): bring fields from different tables together" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Sometimes you want to see information from multiple tables combined together to be viewed (and queried!) simultaneously. You can do this using the join `*` operator." + "Sometimes you want to view and query information simultaneously from multiple tables combined. 
You can do this using the join `*` operator." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Behavior of join:\n", + "The Join operator works as follows:\n", "\n", - "1. Match the common field(s) of the primary keys in the two tables.\n", - "2. Do a combination of the non-matched part of the primary key.\n", - "3. Listing out the secondary attributes for each combination.\n", + "1. Match the common field(s) of the primary keys in the two tables\n", + "2. Do a combination of the non-matched part of the primary key\n", + "3. List out the secondary attributes for each combination\n", "4. If two tables have secondary attributes that share a same name, it will throw an error. To join, we need to rename that attribute for at least one of the tables." ] }, @@ -1162,7 +1196,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Here each row represents a unique (and valid!) combination of a mouse and a session." + "Each row represents a unique (and valid!) combination of a mouse and a session." ] }, { @@ -1195,15 +1229,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Projection .proj(): focus on attributes of interest\n", - "Beside restriction (`&`) and join (`*`) operations, DataJoint offers another type of operation: projection (`.proj()`). Projection is used to select attributes (columns) from a table, to rename them, or to create new calculated attributes. " + "### 3. Projection (`.proj()`): focus on attributes of interest\n", + "Besides restriction (`&`) and join (`*`) operations, DataJoint offers another type of operation: projection (`.proj()`). Projection is used to select attributes (columns) from a table, rename them, or create new calculated attributes. 
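The four-step behavior described above can be pictured in plain Python (an intuition aid under stated assumptions — the toy rows are hypothetical and this is not DataJoint's implementation): match on the shared primary-key attribute, then merge the attributes of every matching pair of rows.

```python
mice = [
    {'mouse_id': 0, 'sex': 'M'},
    {'mouse_id': 5, 'sex': 'F'},
]
sessions = [
    {'mouse_id': 0, 'session_date': '2017-05-15', 'experimenter': 'Edgar Y. Walker'},
    {'mouse_id': 0, 'session_date': '2017-05-19', 'experimenter': 'Edgar Y. Walker'},
    {'mouse_id': 5, 'session_date': '2017-01-05', 'experimenter': 'Fabian Sinz'},
]

# Like `Mouse * Session`: keep only combinations whose shared primary-key
# attribute (mouse_id) matches, then merge the attributes from both rows.
joined = [{**m, **s} for m in mice for s in sessions
          if m['mouse_id'] == s['mouse_id']]

print(len(joined))  # 3 valid mouse/session combinations
print(joined[0])
```

If both sides had a secondary attribute with the same name, the dictionary merge would silently overwrite one value — which is exactly why DataJoint refuses such joins until the attribute is renamed.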
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "From the ***Mouse*** table, suppose we want to focus only on the `sex` attribute and ignore the others, this can be done as:" + "From the **Mouse** table, suppose we want to focus only on the `sex` attribute and ignore the others. This can be done as:" ] }, { @@ -1219,7 +1253,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note that `.proj()` will always retain all attributes that are part of the primary key" + "Note that `.proj()` will always retain all attributes that are part of the primary key." ] }, { @@ -1228,7 +1262,7 @@ "metadata": {}, "source": [ "### Rename attribute with proj()\n", - "Say we want to rename the existing attribute `dob` of the `Mouse` table to `date_of_birth`, this can be done using `.proj()`" + "Say we want to rename the existing attribute `dob` of the `Mouse` table to `date_of_birth`. This can be done using `.proj()`:" ] }, { @@ -1245,7 +1279,7 @@ "metadata": {}, "source": [ "### Perform simple computations with proj()\n", - "Projection is perhaps most useful to perform simple computations on the attributes, especially on attributes from multiple tables by using in conjunction with the join (`*`) operation" + "Projection is perhaps most useful to perform simple computations on the attributes, especially on attributes from multiple tables, by using it in conjunction with the join (`*`) operation." ] }, { @@ -1262,7 +1296,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note: as you can see, the projection results keep the primary attributes from the `Mouse * Session` joining operation, while removing all other non-primary attributes. To Keep all other attributes, you can use the `...` syntax" + "Note: As you can see, the projection results keep the primary attributes from the `Mouse * Session` joining operation while removing all other non-primary attributes. To keep all the other attributes, you can use the `...` syntax." 
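The three uses of projection above — selecting, renaming, and computing attributes — can be sketched with a single plain-Python row (a hypothetical stand-in, not the DataJoint API):

```python
from datetime import date

# One joined Mouse * Session row; mouse_id and session_date form the primary key.
row = {'mouse_id': 0, 'session_date': date(2017, 5, 15),
       'dob': date(2017, 3, 1), 'sex': 'M'}
primary_key = ('mouse_id', 'session_date')

# Like .proj(): the primary-key attributes always survive.
projected = {k: row[k] for k in primary_key}

# Like .proj(date_of_birth='dob'): renaming simply re-labels a column.
renamed = dict(projected, date_of_birth=row['dob'])

# Like computing a new attribute from existing ones (e.g. an age in days):
computed = dict(projected, age=(row['session_date'] - row['dob']).days)
print(computed['age'])  # 75 days old at the time of the session
```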
] }, { @@ -1278,28 +1312,28 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Fetch data" + "### 4. Fetch (`.fetch()`): pull the data from the database" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Once you have successfully narrowed down to the entities you want, you can fetch the query results just by calling fetch on it!" + "Once you have narrowed down to the entities you want, you can fetch the query results just by calling fetch on it!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Fetch one or multiple entries: `fetch()`" + "### Fetch one or multiple entries: `fetch()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "All male mouse" + "All the `male mice`:" ] }, { @@ -1332,7 +1366,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "or all in one step" + "Or all in one step:" ] }, { @@ -1348,7 +1382,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Fetch as a list of dictionaries" + "Fetch as a list of dictionaries:" ] }, { @@ -1364,7 +1398,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Fetch as a pandas dataframe" + "Fetch as a [Pandas dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) if needed:" ] }, { @@ -1380,7 +1414,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Fetch the primary key" + "Fetch the primary key:" ] }, { @@ -1396,7 +1430,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Fetch specific fields" + "Fetch specific fields:" ] }, { @@ -1430,7 +1464,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Or fetch them together as a list of dictionaries" + "Or fetch them together as a list of dictionaries:" ] }, { @@ -1447,14 +1481,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Fetch data from only one entry: `fetch1()`" + "### Fetch data from only one entry: `fetch1()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "When knowing there's only 1 result to 
be fetched back, we can use `.fetch1()`. `fetch1` will always return the fetched result in a dictionary format" + "When there is only one result to be fetched back, we can use `.fetch1()`. `fetch1` will always return the fetched result in a dictionary format:" ] }, { @@ -1471,7 +1505,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "`fetch1()` could also fetch the primary key" + "`fetch1()` can also fetch the primary key:" ] }, { @@ -1487,7 +1521,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "or fetch specific fields:" + "Or fetch specific fields:" ] }, { @@ -1521,21 +1555,22 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Deletion (`.delete()`) - deleting entries and their dependencies" + "### 5. Deletion (`.delete()`): delete entries in the table " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now we have a good idea on how to restrict table entries, this is a good time to introduce how to **delete** entries from a table." + "Now that we have a good idea of how to restrict table entries, this is an excellent time to introduce how to **delete** entries from a table." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "To delete a specific entry, you restrict the table down to the target entry, and call `delete` method." + "To delete a specific entry, you restrict the table to the target entry, and call the `delete` method. Note that after running the following code line, you will have to confirm the deletion before it is committed. \n", + "\n" ] }, { @@ -1551,7 +1586,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Calling `delete` method on an *unrestricted* table will attempt to delete the whole table!" + "Calling the `.delete()` method on an *unrestricted* table will attempt to delete the whole table!" ] }, { @@ -1649,32 +1684,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Congratulations! 
You have successfully created your first DatJoint pipeline, using dependencies to establish the link between the tables. You have also learned to query and fetch the data.\n", + "Congratulations! You have successfully created your first DataJoint pipeline, using dependencies to establish the link among the tables. You have also learned to query, fetch and delete the data.\n", "\n", - "In the next session, we are going to extend our data pipeline with tables to represent **imported data** and define new tables to **compute and hold analysis results**.\n", + "In the next session, we will extend our data pipeline with tables to represent **imported data** and define new tables to **compute and hold analysis results**.\n", "\n", - "We will use both ephys and calcium imaging as example pipelines:\n", - "+ [02-Calcium Imaging Imported Tables](./02-Calcium%20Imaging%20Imported%20Tables.ipynb)\n", - "+ [03-Calcium Imaging Computed Tables](./03-Calcium%20Imaging%20Computed%20Tables.ipynb)\n", - "+ [04-Electrophysiology Imported Tables](./04-Electrophysiology%20Imported%20Tables.ipynb)\n", - "+ [05-Electrophysiology Computed Tables](./05-Electrophysiology%20Computed%20Tables.ipynb)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Clean up" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# schema.drop()" + "\n", + "Please, continue to the next notebook `02-Calcium Imaging.ipynb`." 
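As a recap of the fetch semantics covered above, here is a plain-Python sketch (the `fetch1_like` helper and the toy rows are hypothetical, not the DataJoint API): `fetch` returns every matching entry, while `fetch1` insists on exactly one entry and returns it as a dictionary.

```python
# Imagine `Mouse & {'mouse_id': 0}` returned exactly one matching entry.
query_result = [{'mouse_id': 0, 'sex': 'M'}]

def fetch1_like(rows):
    # fetch1 semantics: exactly one entry expected, returned as a plain dict;
    # anything else raises an error rather than returning a silent surprise.
    if len(rows) != 1:
        raise ValueError(f'Expected exactly 1 entry, found {len(rows)}')
    return dict(rows[0])

entry = fetch1_like(query_result)
print(entry['sex'])  # M

try:
    fetch1_like([{'mouse_id': 0}, {'mouse_id': 5}])  # two entries -> error
except ValueError as err:
    print(err)
```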
] } ], @@ -1697,7 +1712,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.16" + "version": "3.9.6" }, "vscode": { "interpreter": { diff --git a/tutorials/02-Calcium Imaging Imported Tables.ipynb b/tutorials/02-Calcium Imaging Imported Tables.ipynb index d5004c1..e7eede2 100644 --- a/tutorials/02-Calcium Imaging Imported Tables.ipynb +++ b/tutorials/02-Calcium Imaging Imported Tables.ipynb @@ -4,21 +4,22 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Working with automated computations: Imported tables" + "# Working with automated computations: Imported tables\n", + "# Application to Calcium Imaging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Welcome back! The practical example of this session is Calcium Imaging! \n", + "Welcome back! The practical example of this session is calcium imaging! \n", "\n", - "In this session, we will learn to:\n", + "![pipeline](../images/pipeline-calcium-imaging.svg)\n", "\n", "During this session you will learn:\n", "\n", "* To import neuron imaging data from data files into an `Imported` table\n", - "* To automatically trigger data importing and computations for all the missing entries with `populate`" + "* To automatically trigger data importing and computations for all the missing entries with `Populate`" ] }, { @@ -32,7 +33,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "First thing first, let's import `datajoint` again." + "First thing first, let's import `DataJoint` again." ] }, { @@ -48,7 +49,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As we are going to perform some computations, let's go ahead and import NumPy and Matplotlib" + "As we are going to perform some computations, let's go ahead and import `NumPy` and `Matplotlib`." 
] }, { @@ -73,34 +74,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `data` folder in this repository contains a small dataset of three different calcium imaging scans: `example_scan_01.tif`, `example_scan_02.tif`and `example_scan_03.tif`.\n", + "In the `data` folder of this `DataJoint-Tutorials` repository, you can find a small dataset of three calcium imaging scans: `example_scan_01.tif`, `example_scan_02.tif`, and `example_scan_03.tif`.\n", "\n", "As you might know, calcium imaging scans (raw data) are stored as *.tif* files. \n", "\n", - "*NOTE: For this tutorial you do not need to explore this dataset thoroughly. It simply\n", - "serves as an example to populate our data pipeline with example data.*" + "*NOTE: For this tutorial there is no need to explore this small dataset in depth. Nevertheless, if you are curious about visualizing these example scans, we recommend opening the TIFF files with [ImageJ](https://imagej.nih.gov/ij/download.html).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Pipeline design: `Mouse` & `Session`" + "## First steps of the pipeline design: Schema, Mouse & Session" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can continue working with the tables we defined in the previous notebook in one of\n", - "two ways such that the classes for each table, `Mouse` and `Session`, are declared here: \n", - "* We can redefine them here. \n", - "* Import them from an existing file containing their table definitions.\n", - "\n", - "Here, for your convenience, we have included the schema and table\n", - "class definitions in a package called `tutorial_pipeline.mouse_session`, from which you\n", - "can import the classes as well as the schema object. We will use the schema object again\n", - "to define more tables." + "The DataJoint pipeline commonly starts with a `schema` and the following classes for each table: `Mouse` and `Session`. 
Let's quickly create this pipeline's first steps as we learned it in the previous session:" ] }, { @@ -109,14 +101,76 @@ "metadata": {}, "outputs": [], "source": [ - "from tutorial_pipeline.mouse_session import schema, Mouse, Session" + "schema = dj.schema('tutorial')\n", + "\n", + "@schema\n", + "class Mouse(dj.Manual):\n", + " definition = \"\"\"\n", + " # Experimental animals\n", + " mouse_id : int # Unique animal ID\n", + " ---\n", + " dob=null : date # date of birth\n", + " sex=\"unknown\" : enum('M','F','unknown') # sex\n", + " \"\"\"\n", + "\n", + "@schema\n", + "class Session(dj.Manual):\n", + " definition = \"\"\"\n", + " # Experiment session\n", + " -> Mouse\n", + " session_date : date # date\n", + " ---\n", + " experiment_setup : int # experiment setup ID\n", + " experimenter : varchar(100) # experimenter name\n", + " data_path='' : varchar(255) # relative path\n", + " \"\"\"\n", + "\n", + "mouse_data = [\n", + " {'dob': \"2017-03-01\", 'mouse_id': 0, 'sex': 'M'},\n", + " {'dob': \"2016-11-19\", 'mouse_id': 1, 'sex': 'M'},\n", + " {'dob': \"2016-11-20\", 'mouse_id': 2, 'sex': 'unknown'},\n", + " {'dob': \"2016-12-25\", 'mouse_id': 5, 'sex': 'F'},\n", + " {'dob': \"2017-01-01\", 'mouse_id': 10, 'sex': 'F'},\n", + " {'dob': \"2017-01-03\", 'mouse_id': 11, 'sex': 'F'},\n", + " {'dob': \"2017-05-12\", 'mouse_id': 100, 'sex': 'F'}\n", + "]\n", + "\n", + "session_data = [\n", + " {'experiment_setup': 0,\n", + " 'experimenter': 'Edgar Y. Walker',\n", + " 'mouse_id': 0,\n", + " 'session_date': \"2017-05-15\",\n", + " 'data_path': '../data/'\n", + " },\n", + " {'experiment_setup': 0,\n", + " 'experimenter': 'Edgar Y. 
Walker',\n", + " 'mouse_id': 0,\n", + " 'session_date': \"2017-05-19\",\n", + " 'data_path': '../data/'\n", + " },\n", + " {'experiment_setup': 1,\n", + " 'experimenter': 'Fabian Sinz',\n", + " 'mouse_id': 5,\n", + " 'session_date': \"2017-01-05\",\n", + " 'data_path': '../data/'\n", + " },\n", + " {'experiment_setup': 100,\n", + " 'experimenter': 'Jacob Reimer',\n", + " 'mouse_id': 100,\n", + " 'session_date': \"2017-05-25\",\n", + " 'data_path': '../data/'\n", + " }\n", + "]\n", + "\n", + "Mouse.insert(mouse_data, skip_duplicates=True)\n", + "Session.insert(session_data, skip_duplicates=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Take a quick look at the tables Mouse and Session" + "Take a quick look at the tables `Mouse` and `Session`:" ] }, { @@ -141,21 +195,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `mouse_session.py` also fills each table with data to make sure we are all on the same page." + "## Define the Scan table and its primary key attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Define table `Scan` for meta information of each calcium imaging scan" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's define a table `Scan` that describes a scanning in an experimental session that stores the meta information of a particular scan." + "First we define a table named `Scan` that describes a scanning in an experimental session of Calcium Imaging. This table will store the scans' metadata." ] }, { @@ -186,7 +233,7 @@ "\n", "One session might contain multiple scans - This is another example of **one-to-many** relationship. \n", "\n", - "Take a look at the `Diagram` again:" + "Take a look at the `dj.Diagram` again:" ] }, { @@ -202,26 +249,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The thin solid line connecting `Mouse`-`Session`, and `Session`-`Scan` indicates **one-to-many relationship**. 
\n", + "The thin solid line connecting [`Mouse`-`Session`] and [`Session`-`Scan`] indicates **one-to-many relationship**. \n", "\n", - "The `____` indicates **additional primary key attribute(s)** apart from the ones inherited from its parents." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here we have prepared two tif files of scanning in the `data` folder `example_scan_01.tif` and `example_scan_02.tif` " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from tutorial_pipeline import data_dir\n", - "data_dir" + "The underline `____` indicates **additional primary key attribute(s)** apart from the ones inherited from its parents." ] }, { @@ -230,15 +260,14 @@ "metadata": {}, "outputs": [], "source": [ - "for f in data_dir.glob('*.tif'):\n", - " print(f)" + "Scan()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's insert these meta information manually." + "Now we manually `insert` the metadata of the datasets and their file names in the table `Scan`:" ] }, { @@ -252,6 +281,8 @@ " 'depth': 150, 'wavelength': 920, 'laser_power': 26, 'fps': 15, 'file_name': 'example_scan_01.tif'},\n", " {'mouse_id': 0, 'session_date': '2017-05-15', 'scan_idx': 2, \n", " 'depth': 200, 'wavelength': 920, 'laser_power': 24, 'fps': 15, 'file_name': 'example_scan_02.tif'},\n", + " {'mouse_id': 0, 'session_date': '2017-05-15', 'scan_idx': 3, \n", + " 'depth': 200, 'wavelength': 920, 'laser_power': 24, 'fps': 15, 'file_name': 'example_scan_03.tif'} \n", "])" ] }, @@ -268,14 +299,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Looking at the raw data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's first load one raw data and take a look at the data:" + "### Calculation of the average frame\n", + "\n", + "Let's first load and look at the number of frames of the calcium imaging TIF files:" ] }, { @@ -286,15 +312,17 @@ "source": [ 
"import os\n", "from skimage import io\n", - "im = io.imread(data_dir / 'example_scan_01.tif')\n", - "print(im.shape)" + "im = io.imread('../data/example_scan_01.tif')\n", + "print('Number of frames = ',im.shape[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "This example contains 100 frames. Let's calculate the average of the images over the frames and plot the result." + "Particularly, this example contains 100 frames. \n", + "\n", + "Let's calculate the average of the images over the frames and plot the result.\n" ] }, { @@ -303,25 +331,32 @@ "metadata": {}, "outputs": [], "source": [ - "# ENTER YOUR CODE! - compute the avg frame with np.mean of axis=0\n", - "avg_image = np.mean(im, axis=0)\n", - "plt.imshow(avg_image, cmap=plt.cm.gray)" + "# ENTER YOUR CODE! \n", + "av_frame = np.mean(im,axis=0)\n", + "plt.imshow(av_frame)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*TIP: compute the `average frame` of the `im` image using the mean function from NumPy (np.mean) with axis = 0. Then, plot the result with `imshow`*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Defining table for average fluorescence across frames" + "## Define the table for the average fluorescence " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's create a table `AverageFrame` to compute and save the average fluorescence. \n", + "Now let's create a table `AverageFrame` to compute and save the average fluorescence across the frames. \n", "\n", - "For each scan, we have one average frame. Therefore, the table shares the exact same primary key as the table `Scan`." + "For each scan, we have one average frame. Therefore, the table shares the exact same primary key as the table `Scan`" ] }, { @@ -366,7 +401,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We defined `average_frame` as a `longblob`, which allows us to store a NumPy array. 
This NumPy array will be imported and computed from the file corresponding to each scan." + "We defined `average_frame` as a `longblob`, which allows us to store a NumPy array. This NumPy array will be imported and computed from the file corresponding to each scan." ] }, { @@ -414,7 +449,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Rather than filling out the content of the table manually using `insert1` or `insert` methods, we are going to make use of the `make` and `populate` logic that comes with `Imported` tables. These two methods automatically figure out what needs to be imported, and perform the import." + "Rather than filling out the content of the table manually using `insert1` or `insert` methods, we are going to make use of the `make` and `populate` logic that comes with `Imported` tables. These two methods automatically figure out what needs to be imported, and perform the import." ] }, { @@ -436,12 +471,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ - "# ENTER YOUR CODE! - call `populate` on the table\n", "AverageFrame.populate()" ] }, @@ -449,7 +481,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Notice that `populate` call complained that a method called `make` is not implemented. Let me show a simple `make` method that will help elucidate what this is all about." + "Notice that the `populate` call complained that a method called `make` is not implemented. Let me show you a simple `make` method that will help elucidate what this is all about." ] }, { @@ -482,7 +514,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ENTER YOUR CODE! - call `populate` on the table\n", + "# ENTER YOUR CODE! 
- call `populate` on the table AverageFrame\n", "AverageFrame.populate()" ] }, @@ -571,14 +603,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Notice that we added the missing attribute information `average_frame` into the `key` dictionary, and finally **inserted the entry** into `self` = `AverageFrame` table. The `make` method's job is to create and insert a new entry corresponding to the `key` into this table!" + "Notice that we added the missing attribute information `average_frame` into the `key` dictionary, and finally **inserted the entry** into `self` (in this case, `self` corresponds to the `AverageFrame` table). The `make` method's job is to create and insert a new entry. This new entry corresponds to the `key` in this table." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Finally, let's go ahead and call `populate` to actually populate the `AverageFrame` table, filling it with data loaded and computed from data files!" + "Finally, let's go ahead and call `populate` to actually populate the `AverageFrame` table with the new content, i.e. filling the table with the data loaded and computed from data files!" ] }, { @@ -658,7 +690,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can find all `Scan` without corresponding `AverageFrame` entry with the **negative restriction operator** `-`" + "We can find all the `Scan` entries without their corresponding `AverageFrame` entries using the **negative restriction operator** `-`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# select all Scan entries *without* a corresponding entry in AverageFrame\n", + "# select all the `Scan` entries *without* a corresponding entry in `AverageFrame`\n", "Scan - AverageFrame" ] }, @@ -700,7 +732,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now instead of loading from the raw tif file, we are able fetch the average fluorescence image from this table." 
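The populate logic described above — find the `Scan` keys that have no `AverageFrame` entry yet, call `make` on each, and let `make` insert its result — can be sketched in plain Python (an intuition aid under stated assumptions: the toy lists, the fake one-dimensional "movie", and these helper functions are hypothetical, not DataJoint internals):

```python
# Toy stand-ins: Scan entries and an initially empty AverageFrame "table".
scans = [
    {'mouse_id': 0, 'session_date': '2017-05-15', 'scan_idx': 1},
    {'mouse_id': 0, 'session_date': '2017-05-15', 'scan_idx': 2},
]
average_frames = []  # nothing populated yet

def make(key):
    # A real make() would load the .tif for this key and average over frames;
    # here the movie is faked as a list of scalar "frames".
    movie = [1.0, 2.0, 3.0]
    entry = dict(key, average_frame=sum(movie) / len(movie))
    average_frames.append(entry)  # like self.insert1(key)

def populate():
    # Like AverageFrame.populate(): only the missing keys get made.
    done = [{k: v for k, v in e.items() if k != 'average_frame'}
            for e in average_frames]
    for key in scans:
        if key not in done:
            make(key)

populate()
print(len(average_frames))  # 2
populate()                  # calling again finds nothing missing
print(len(average_frames))  # still 2
```

The second `populate()` call is a no-op, which is the essential property: computations are triggered only for entries that do not exist yet.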
+ "Now instead of loading from the raw *.tif* file, we are able to fetch the average fluorescence image from this table." ] }, { @@ -725,7 +757,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Congratulations! You have successfully extended your pipeline with a table to represent processed data (`AverageFrame` as `Imported` table), learned and implemented the `make()` and `populate()` call to load external data to your tables." + "Congratulations! You have successfully:\n", + "- Extended your pipeline with a new table to represent the processed data (`AverageFrame` as `Imported` table)\n", + "- Learned and implemented the `make()` and `populate()` calls to load the external data to your tables" ] }, { @@ -741,15 +775,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "At this point, our pipeline contains the core elements with data populated, ready for further downstream analysis.\n", + "At this point, our pipeline contains the core elements with the data populated, ready for further downstream analysis.\n", "\n", - "In the next [session](./03-Calcium%20Imaging%20Computed%20Tables.ipynb), we are going to introduce the concept of `Computed` table, and `Lookup` table, as well as learning to set up a automated computation routine." + "In the next session `03-Calcium Imaging Computed Tables`:\n", + "- We will introduce the concept of `Computed` table and `Lookup` table\n", + "- We will also set up an automated computation routine, essential to develop advanced data analyses in your experiments!" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] } ], "metadata": { @@ -768,7 +799,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.16" + "version": "3.9.17" }, "vscode": { "interpreter": {