Skip to content

Commit

Permalink
Merge pull request #4 from pykoala/a3d/tut0
Browse files Browse the repository at this point in the history
Data container tutorial
  • Loading branch information
miguelglezb authored Nov 20, 2024
2 parents 15c138f + fccb8bc commit 7ac2b01
Show file tree
Hide file tree
Showing 3 changed files with 384 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,7 @@ venv/
ENV/
env.bak/
venv.bak/
venv_koala/

# Spyder project settings
.spyderproject
Expand Down
379 changes: 379 additions & 0 deletions tutorials/0-introduction/data_containers.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,379 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyKOALA Data Containers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyKOALA uses data containers to organise, read, and manage input/output. `PyKOALA` data containers can be imported from `data_container` module:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pykoala.data_container "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, users can import explicitly the requires classes. In this tutorial, we'll demonstrate examples using this approach."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `HistoryRecord` class"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `HistoryRecord` class represents a structured record in a log. This is typically useful for logging, note-taking, or any system that keeps track of messages or events."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pykoala.data_container import HistoryRecord\n",
"\n",
"hist_record = HistoryRecord(title=\"Error log\",comments=\"This is line one.\\nThis is line two.\")\n",
"print('Title of history record: ', hist_record.title)\n",
"print('Comments in history record: ', hist_record.comments)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`HistoryRecord` can convert the record into a string with `to_str`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hist_record.to_str()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, this output is hard to read. Since it is an iterable, a better way to show the comments is: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for record in hist_record.comments:\n",
" print(record)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `DataContainerHistory` class"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `DataContainerHistory` class is designed to store and manage the history of data reduction steps by creating a log of entries, which can be used to trace the steps applied to a dataset. It can log a sequence of entries where each entry records details of a data processing step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pykoala.data_container import DataContainerHistory\n",
"initial_entries = [\n",
" (\"Step 1\", \"Loaded data\"),\n",
" (\"Step 2\", \"Filtered outliers\"),\n",
" HistoryRecord(\"Step 3\", \"Normalized data\", \"preprocessing\")\n",
"]\n",
"\n",
"history_log = DataContainerHistory(list_of_entries=initial_entries)\n",
"\n",
"# Accessing the recorded entries\n",
"for record in history_log.record_entries:\n",
" print(record.to_str())\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Functions "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** Make sure to run the following cells in order to ensure correct execution."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`DataContainerHistory` class use several methods to process data. For example, we can start explicitly a new log from the class with `initialise_record`. Let's redo the previous cell with this method: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `initialise_record` "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"initial_entries = [\n",
" (\"Step 1\", \"Loaded data\"),\n",
" (\"Step 2\", \"Filtered outliers\"),\n",
" HistoryRecord(\"Step 3\", \"Normalized data\", \"preprocessing\")\n",
"]\n",
"\n",
"history_log = DataContainerHistory()\n",
"history_log.initialise_record(list_of_entries=initial_entries)\n",
"\n",
"history_log.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the cell above we also used the `show()` method to present the content of the history log. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `log_record`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can add a new record in an existing history log by using the `log_record` method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#We use the same history log as before\n",
"initial_entries = [\n",
" (\"Step 1\", \"Loaded data\"),\n",
" (\"Step 2\", \"Filtered outliers\"),\n",
" HistoryRecord(\"Step 3\", \"Normalized data\", \"preprocessing\")\n",
"]\n",
"\n",
"history_log = DataContainerHistory()\n",
"history_log.initialise_record(list_of_entries=initial_entries)\n",
"\n",
"history_log.log_record(title='Step 4', comments='Additional data added with log_record method') \n",
"\n",
"history_log.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `is_record` "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can find if a certain record (title) is present in the history log with the `is_record` method. Following the previous cell:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(history_log.is_record(title='Step 1'))\n",
"print(history_log.is_record(title='Step 42'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the first case, the output is `True` because there is a record with the title `Step 1`. However, there is no record with the title `Step 42`, so the second line returns `False`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `find_record`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can find, save and display the content using `find_record`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"search_results = history_log.find_record(title='Step 1')\n",
"print('Number of found items:', len(search_results))\n",
"\n",
"for result in search_results:\n",
" print(result.to_str())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output of this method is given in a list. This is useful when several entries have the same `title` record:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"history_log_repeated_titles = DataContainerHistory()\n",
"history_log_repeated_titles.initialise_record(list_of_entries=initial_entries)\n",
"\n",
"history_log_repeated_titles.log_record(title='Step 1', comments='Rinse and repeat') #We add a record with the same title as an existing one.\n",
"\n",
"search_results = history_log_repeated_titles.find_record(title='Step 1')\n",
"print('Number of found items:', len(search_results))\n",
"\n",
"for result in search_results:\n",
" print(result.to_str())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `dump_to_header`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This method writes the log into a astropy.fits.Header"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from astropy.io import fits\n",
"history_log.dump_to_header()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `dump_to_text`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This method writes the log into a text file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import os\n",
"\n",
"os.system('pwd')\n",
"path_to_file = './output/history_log.txt'\n",
"history_log.dump_to_text(file=path_to_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can confirm that the file was created in the `output` directory."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv_koala",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
4 changes: 4 additions & 0 deletions tutorials/0-introduction/output/history_log.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Step 1: Loaded data
Step 2: Filtered outliers
Step 3: Normalized data
Step 4: Additional data added with log_record method

0 comments on commit 7ac2b01

Please sign in to comment.