
Commit

Data source expansion (#6)
This pull request adds support for running the swarm from local data files, addressing the challenge of running the code from an EC2 instance that cannot reach the COSMOS-UK database.

* NEW: A class for looping through a .csv file with pandas
* NEW: A class for looping through a SQLite database (assumes that it's been sorted first)
* NEW: Updated CLI to allow new data sources to be selected.
* NEW: Updated documentation
lewis-chambers authored Jun 20, 2024
1 parent d0dd449 commit ebb5185
Showing 22 changed files with 1,456 additions and 411 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
@@ -24,6 +24,9 @@ jobs:
python -m pip install --upgrade pip
pip install flake8
pip install .[test]
- name: Build test DB
run: |
python src/tests/data/build_database.py
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
6 changes: 4 additions & 2 deletions .gitignore
@@ -8,7 +8,9 @@ dist/*
config.cfg
aws-auth
*.log
*.certs
*.certs*
!.github/*
docs/_*
conf.sh
*.sh
**/*.csv
**/*.db
30 changes: 28 additions & 2 deletions docs/index.rst
@@ -14,6 +14,7 @@
:caption: Contents:

self
source/cli
source/modules
genindex
modindex
@@ -57,7 +58,7 @@ initialise a swarm with data sent every 30 minutes like so:
.. code-block:: shell
iot-swarm cosmos --dsn="xxxxxx" --user="xxxxx" --password="*****" \
mqtt "aws" LEVEL_1_SOILMET_30MIN "client_id" \
mqtt LEVEL_1_SOILMET_30MIN "client_id" \
--endpoint="xxxxxxx" \
--cert-path="C:\path\..." \
--key-path="C:\path\..." \
@@ -84,7 +85,7 @@ Then the CLI can be called more cleanly:

.. code-block:: shell
iot-swarm cosmos mqtt "aws" LEVEL_1_SOILMET_30MIN "client_id" --sleep-time=1800 --swarm-name="my-swarm"
iot-swarm cosmos mqtt LEVEL_1_SOILMET_30MIN "client_id" --sleep-time=1800 --swarm-name="my-swarm"
------------------------
Using the Python Modules
@@ -163,6 +164,31 @@ The system expects config credentials for the MQTT endpoint and the COSMOS Oracle
.. include:: example-config.cfg


-------------------------------------------
Looping Through a Local Database / CSV File
-------------------------------------------

This package now supports using a CSV file or local SQLite database as the data source.
Two classes support this: `db.LoopingCsvDB` and `db.LoopingSQLite3`. Each initializes
from a local file and loops through the data for a given site ID. The database
objects keep an in-memory cache of each site ID and its current index in the data.
Once the end is reached, the loop restarts from the beginning for that site.
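The looping-cursor behaviour described above can be sketched as follows. This is a
minimal illustration of the idea only, not the package's actual API: the class name
`LoopingReader` and its methods are hypothetical.

```python
# Minimal sketch of the looping-cursor idea (hypothetical class, not the
# package API): a reader keeps a per-site index into its rows and wraps
# back to the start once the end of the data is reached.
class LoopingReader:
    def __init__(self, rows):
        self.rows = rows  # data rows for one table
        self.cache = {}   # site_id -> next index to serve

    def query_latest(self, site_id):
        index = self.cache.get(site_id, 0)
        row = self.rows[index]
        # Advance the cursor, wrapping around at the end of the data.
        self.cache[site_id] = (index + 1) % len(self.rows)
        return row


reader = LoopingReader(["row0", "row1", "row2"])
print([reader.query_latest("SITE_A") for _ in range(5)])
# With 3 rows, 5 queries wrap around: row0, row1, row2, row0, row1
```

Each site ID gets its own cursor, so two sites querying the same reader loop
independently of one another.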

For use in FDRI, 6 months of data was downloaded to CSV from the COSMOS-UK database, but
the files are too large to be included in this repo, so they are stored in the `ukceh-fdri`
`S3` bucket on AWS. There are scripts for regenerating the `.db` file in this repo:

* `./src/iotswarm/__assets__/data/build_database.py`
* `./src/tests/data/build_database.py`

To use the 'official' data, it should be downloaded from the `S3` bucket and placed in
`./src/iotswarm/__assets__/data` before running the script. This will build a `.db` file
sorted in datetime order that the `LoopingSQLite3` class can operate with.

.. warning::
   The looping database classes assume that their data files are already sorted and
   make no attempt to sort the data themselves.
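If an export is not already ordered, it can be sorted once up front before the
looping classes read it. The sketch below shows the general pattern with pandas
and SQLite; the column names and the `cosmos-example.db` / table names are
illustrative stand-ins for the real COSMOS-UK exports, and `to_sql` is a generic
pandas call, not necessarily what `build_database_from_csv` does internally.

```python
import sqlite3

import pandas as pd

# Illustrative data standing in for a COSMOS-UK CSV export.
df = pd.DataFrame(
    {
        "SITE_ID": ["A", "A", "B"],
        "DATE_TIME": ["2024-01-02", "2024-01-01", "2024-01-01"],
        "VALUE": [1.0, 2.0, 3.0],
    }
)

# Sort in datetime order so a looping reader can step through sequentially.
df = df.sort_values("DATE_TIME").reset_index(drop=True)

# Write the sorted rows into a SQLite table (hypothetical file/table names).
with sqlite3.connect("cosmos-example.db") as conn:
    df.to_sql("LEVEL_1_SOILMET_30MIN", conn, if_exists="replace", index=False)
```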

Indices and tables
==================

6 changes: 6 additions & 0 deletions docs/source/cli.rst
@@ -0,0 +1,6 @@
CLI
===

.. click:: iotswarm.scripts.cli:main
:prog: iot-swarm
:nested: full
2 changes: 1 addition & 1 deletion docs/source/iotswarm.scripts.rst
@@ -1,5 +1,5 @@
iotswarm.scripts package
==================================
========================

Submodules
----------
22 changes: 18 additions & 4 deletions pyproject.toml
@@ -1,19 +1,21 @@
[build-system]
requires = ["setuptools >= 61.0", "autosemver"]
build-backend = "setuptools.build_meta"
# build-backend = "setuptools.build_meta"

[project]
dependencies = [
"platformdirs",
"boto3",
"autosemver",
"config",
"click",
"docutils<0.17",
"awscli",
"awscrt",
"awsiotsdk",
"oracledb",
"backoff",
"click",
"pandas",
]
name = "iot-swarm"
dynamic = ["version"]
@@ -26,14 +28,18 @@ docs = ["sphinx", "sphinx-copybutton", "sphinx-rtd-theme", "sphinx-click"]

[project.scripts]
iot-swarm = "iotswarm.scripts.cli:main"

[tool.setuptools.dynamic]
version = { attr = "iotswarm.__version__" }


[tool.setuptools.packages.find]
where = ["src"]
include = ["iotswarm*"]
exclude = ["tests*"]

[tool.setuptools.package-data]
"*" = ["*.*"]
"iotswarm.__assets__" = ["loggers.ini"]

[tool.pytest.ini_options]

@@ -49,4 +55,12 @@ markers = [
]

[tool.coverage.run]
omit = ["*example.py", "*__init__.py", "queries.py", "loggers.py", "cli.py"]
omit = [
"*example.py",
"*__init__.py",
"queries.py",
"loggers.py",
"**/scripts/*.py",
"**/build_database.py",
"utils.py",
]
41 changes: 41 additions & 0 deletions src/iotswarm/__assets__/data/build_database.py
@@ -0,0 +1,41 @@
"""This script is responsible for building an SQL data file from the CSV files used
by the cosmos network.
The files are stored in AWS S3 and should be downloaded into this directory before continuing.
They are:
* LEVEL_1_NMDB_1HOUR_DATA_TABLE.csv
* LEVEL_1_SOILMET_30MIN_DATA_TABLE.csv
* LEVEL_1_PRECIP_1MIN_DATA_TABLE.csv
* LEVEL_1_PRECIP_RAINE_1MIN_DATA_TABLE.csv
Once downloaded, run this script to generate the .db file.
"""

from iotswarm.utils import build_database_from_csv
from pathlib import Path
from glob import glob
from iotswarm.queries import CosmosTable


def main(
csv_dir: str | Path = Path(__file__).parent,
database_output: str | Path = Path(Path(__file__).parent, "cosmos.db"),
):
"""Reads exported cosmos DB files from CSV format. Assumes that the files
look like: LEVEL_1_SOILMET_30MIN_DATA_TABLE.csv
Args:
csv_dir: Directory where the csv files are stored.
database_output: Output destination of the csv_data.
"""
csv_files = glob("*.csv", root_dir=csv_dir)

tables = [CosmosTable[x.removesuffix("_DATA_TABLE.csv")] for x in csv_files]

for table, file in zip(tables, csv_files):
file = Path(csv_dir, file)
build_database_from_csv(file, database_output, table.value, sort_by="DATE_TIME")


if __name__ == "__main__":
main()
4 changes: 1 addition & 3 deletions src/iotswarm/__init__.py
@@ -1,8 +1,6 @@
import autosemver

try:
__version__ = autosemver.packaging.get_current_version(
project_name="iot-device-simulator"
)
__version__ = autosemver.packaging.get_current_version(project_name="iot-swarm")
except:
__version__ = "unknown version"
