complete merge
danielfromearth committed Sep 19, 2023
2 parents c298709 + 9c607c7 commit f0cbf61
Showing 11 changed files with 102 additions and 71 deletions.
9 changes: 5 additions & 4 deletions .github/workflows/lint_and_test_and_bump.yml
@@ -77,7 +77,7 @@ jobs:
echo "software_version=$(poetry version | awk '{print $2}')" >> $GITHUB_ENV
echo "venue=ops" >> $GITHUB_ENV
- name: Install bumblebee
- name: Install stitchee
run: poetry install

- name: Lint
@@ -87,7 +87,8 @@ jobs:
- name: Test with pytest
run: |
poetry run pytest
poetry run pytest tests/test_group_handling.py
# TODO: expand tests to include full concatenation runs, i.e., don't just run test_group_handling.py

# - name: Commit Version Bump
# # If building develop, a release branch, or main then we commit the version bump back to the repo
@@ -96,8 +97,8 @@ jobs:
# github.ref == 'refs/heads/main' ||
# startsWith(github.ref, 'refs/heads/release')
# run: |
# git config --global user.name 'bumblebee bot'
# git config --global user.email 'bumblebee@noreply.github.com'
# git config --global user.name 'stitchee bot'
# git config --global user.email 'stitchee@noreply.github.com'
# git commit -am "/version ${{ env.software_version }}"
# git push
#
5 changes: 3 additions & 2 deletions .github/workflows/lint_and_test_on_pull_request.yml
@@ -25,7 +25,7 @@ jobs:
with:
poetry-version: 1.3.2

- name: Install bumblebee
- name: Install stitchee
run: poetry install

- name: Lint
@@ -35,4 +35,5 @@ jobs:
- name: Test with pytest
run: |
poetry run pytest
poetry run pytest tests/test_group_handling.py
# TODO: expand tests to include full concatenation runs, i.e., don't just run test_group_handling.py
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,16 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- [PR #1](https://github.com/danielfromearth/stitchee/pull/1): An initial GitHub Actions workflow
### Changed
- [PR #12](https://github.com/danielfromearth/stitchee/pull/12): Changed name to "stitchee"
### Deprecated
### Removed
### Fixed
- [PR #4](https://github.com/danielfromearth/stitchee/pull/4): Error with TEMPO ozone profile data because of duplicated dimension names
45 changes: 29 additions & 16 deletions README.md
@@ -1,19 +1,21 @@
# bumblebee
[<img src="https://github.com/danielfromearth/stitchee/assets/114174502/58052dfa-b6e1-49e5-96e5-4cb1e8d14c32" width="250"/>](stitchee_9_hex)

Tool for concatenating netCDF data *along an existing dimension*,
which is designed as both a standalone utility and
for use as a service in [Harmony](https://harmony.earthdata.nasa.gov/).
# Overview
_____

_STITCHEE_ (STITCH by Extending a dimEnsion) is used for concatenating netCDF data *along an existing dimension*,
and it is designed as both a standalone utility and for use as a service in [Harmony](https://harmony.earthdata.nasa.gov/).

## Getting started, with poetry

1. Follow the instructions for installing `poetry` [here](https://python-poetry.org/docs/).
2. Install `bumblebee`, with its dependencies, by running the following from the repository directory:
2. Install `stitchee`, with its dependencies, by running the following from the repository directory:

```shell
poetry install
```

## How to test `bumblebee` locally
## How to test `stitchee` locally

```shell
poetry run pytest tests/
@@ -22,25 +24,36 @@ poetry run pytest tests/
## Usage (with poetry)

```shell
$ poetry run bumblebee --help
usage: bumblebee [-h] [--make_dir_copy] [-v] data_dir output_path
$ poetry run stitchee --help
usage: stitchee [-h] -o output_path [--concat_dim concat_dim] [--make_dir_copy] [--keep_tmp_files] [-O] [-v]
path/directory or path list [path/directory or path list ...]

Run the along-existing-dimension concatenator.

positional arguments:
data_dir The directory containing the files to be merged.
output_path The output filename for the merged output.

options:
-h, --help show this help message and exit
--make_dir_copy Make a duplicate of the input directory to avoid modification of input files. This is useful for testing, but uses more disk space.
-v, --verbose Enable verbose output to stdout; useful for debugging
-h, --help show this help message and exit
--concat_dim concat_dim
Dimension to concatenate along, if possible.
--make_dir_copy Make a duplicate of the input directory to avoid modification of input files. This is useful for testing, but
uses more disk space.
--keep_tmp_files Prevents removal, after successful execution, of (1) the flattened concatenated file and (2) the input
directory copy if created by '--make_dir_copy'.
-O, --overwrite Overwrite output file if it already exists.
-v, --verbose Enable verbose output to stdout; useful for debugging

Required:
path/directory or path list
Files to be concatenated, specified via a (1) single directory containing the files to be concatenated, (2)
single text file containing linebreak-separated paths of the files to be concatenated, or (3) multiple
filepaths of the files to be concatenated.
-o output_path, --output_path output_path
The output filename for the merged output.
```
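As a rough illustration, an interface like the one printed above could be built with `argparse` along the following lines. This is a sketch, not the project's actual `parse_args`: the function name `build_parser`, the destination name `inputs`, and the default values are assumptions made for the example.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    """Build a parser with options like those in the help text (illustrative only)."""
    parser = ArgumentParser(
        prog='stitchee',
        description='Run the along-existing-dimension concatenator.')

    # A named group makes the mandatory arguments appear under "Required".
    required = parser.add_argument_group('Required')
    required.add_argument(
        'inputs', nargs='+',  # hypothetical destination name
        help='A directory, a text file of paths, or multiple filepaths.')
    required.add_argument(
        '-o', '--output_path', metavar='output_path', required=True,
        help='The output filename for the merged output.')

    # Optional switches; the defaults here are assumptions.
    parser.add_argument('--concat_dim', metavar='concat_dim', default='',
                        help='Dimension to concatenate along, if possible.')
    parser.add_argument('--make_dir_copy', action='store_true')
    parser.add_argument('--keep_tmp_files', action='store_true')
    parser.add_argument('-O', '--overwrite', action='store_true')
    parser.add_argument('-v', '--verbose', action='store_true')
    return parser
```

Because `-o` is declared with `required=True`, a call without it exits with a usage error instead of treating the last positional argument as the output path.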
For example:
```shell
poetry run bumblebee /path/to/netcdf/directory/ /path/to/output.nc
poetry run stitchee /path/to/netcdf/directory/ -o /path/to/output.nc
```
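The help text lists three accepted input forms: a single directory, a single text file of linebreak-separated paths, or multiple filepaths. One way such arguments might be expanded into a flat file list is sketched below. The function name `resolve_input_files` and the rule that a path-list file is recognised by a `.txt` suffix are assumptions for illustration; the repository's own logic may differ.

```python
from pathlib import Path


def resolve_input_files(inputs: list[str]) -> list[str]:
    """Expand positional arguments into a flat list of file paths (hypothetical helper)."""
    if len(inputs) == 1:
        candidate = Path(inputs[0])
        if candidate.is_dir():
            # Form (1): a directory containing the files to concatenate.
            return sorted(str(p) for p in candidate.iterdir() if p.is_file())
        if candidate.is_file() and candidate.suffix == '.txt':
            # Form (2): a text file with one path per line; blank lines are skipped.
            # Treating only '.txt' files as path lists is an assumption.
            lines = candidate.read_text().splitlines()
            return [line.strip() for line in lines if line.strip()]
    # Form (3): the arguments are already the file paths.
    return list(inputs)
```

Sorting the directory listing keeps the concatenation order deterministic across platforms, which matters when the files are joined along a record dimension.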
## Usage (without poetry)
2 changes: 1 addition & 1 deletion concatenator/concat_with_nco.py
@@ -4,7 +4,7 @@
import netCDF4 as nc # type: ignore
from nco import Nco # type: ignore

from concatenator.bumblebee import _validate_workable_files
from concatenator.stitchee import _validate_workable_files

default_logger = getLogger(__name__)

2 changes: 1 addition & 1 deletion concatenator/concat_with_nco_cli.py
@@ -7,7 +7,7 @@
import sys

from concatenator.concat_with_nco import concat_netcdf_files
from concatenator.run_bumblebee import parse_args
from concatenator.run_stitchee import parse_args


def run_nco_concat(args: list) -> None:
20 changes: 10 additions & 10 deletions concatenator/run_bumblebee.py → concatenator/run_stitchee.py
@@ -8,8 +8,8 @@
from pathlib import Path
from typing import Tuple, Union

from concatenator.bumblebee import bumblebee
from concatenator.file_ops import add_label_to_path
from concatenator.stitchee import stitchee


def parse_args(args: list) -> Tuple[list[str], str, str, bool, Union[str, None]]:
@@ -21,7 +21,7 @@ def parse_args(args: list) -> Tuple[list[str], str, str, bool, Union[str, None]]
tuple
"""
parser = ArgumentParser(
prog='bumblebee',
prog='stitchee',
description='Run the along-existing-dimension concatenator.')

# Required arguments
@@ -132,19 +132,19 @@ def _get_list_of_filepaths_from_dir(data_dir: Path):
return input_files


def run_bumblebee(args: list) -> None:
def run_stitchee(args: list) -> None:
"""
Parse arguments and run the concatenator on the specified input files
"""
input_files, output_path, concat_dim, keep_tmp_files, temporary_dir_to_remove = parse_args(args)
num_inputs = len(input_files)

logging.info('Executing bumblebee concatenation on %d files...', num_inputs)
bumblebee(input_files, output_path,
write_tmp_flat_concatenated=keep_tmp_files,
keep_tmp_files=keep_tmp_files,
concat_dim=concat_dim)
logging.info('BUMBLEBEE complete. Result in %s', output_path)
logging.info('Executing stitchee concatenation on %d files...', num_inputs)
stitchee(input_files, output_path,
write_tmp_flat_concatenated=keep_tmp_files,
keep_tmp_files=keep_tmp_files,
concat_dim=concat_dim)
logging.info('STITCHEE complete. Result in %s', output_path)

if not keep_tmp_files and temporary_dir_to_remove:
shutil.rmtree(temporary_dir_to_remove)
@@ -157,7 +157,7 @@ def main() -> None:
format='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
level=logging.DEBUG
)
run_bumblebee(sys.argv[1:])
run_stitchee(sys.argv[1:])


if __name__ == '__main__':
12 changes: 6 additions & 6 deletions concatenator/bumblebee.py → concatenator/stitchee.py
@@ -18,12 +18,12 @@
default_logger = logging.getLogger(__name__)


def bumblebee(files_to_concat: list[str],
output_file: str,
write_tmp_flat_concatenated: bool = False,
keep_tmp_files: bool = True,
concat_dim: str = "",
logger: Logger = default_logger) -> str:
def stitchee(files_to_concat: list[str],
output_file: str,
write_tmp_flat_concatenated: bool = False,
keep_tmp_files: bool = True,
concat_dim: str = "",
logger: Logger = default_logger) -> str:
"""Concatenate netCDF data files along an existing dimension.
Parameters
4 changes: 2 additions & 2 deletions entry.py
@@ -2,7 +2,7 @@
import logging
import sys

from concatenator.run_bumblebee import run_bumblebee
from concatenator.run_stitchee import run_stitchee


def main() -> None:
@@ -12,7 +12,7 @@ def main() -> None:
format='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
level=logging.DEBUG
)
run_bumblebee(sys.argv[1:])
run_stitchee(sys.argv[1:])


if __name__ == '__main__':
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,5 +1,5 @@
[tool.poetry]
name = "bumblebee"
name = "stitchee"
version = "0.1.0"
description = "NetCDF4 Along-existing-dimension Concatenation Service"
authors = ["Daniel Kaufman <[email protected]>"]
@@ -26,7 +26,7 @@ black = "^23.9.1"
nco = "^1.1.0"

[tool.poetry.scripts]
bumblebee = 'concatenator.run_bumblebee:main'
stitchee = 'concatenator.run_stitchee:main'

[build-system]
requires = ["poetry-core"]
54 changes: 27 additions & 27 deletions tests/test_concat.py
@@ -12,7 +12,7 @@
import pytest

from concatenator import concat_with_nco
from concatenator.bumblebee import bumblebee
from concatenator.stitchee import stitchee


@pytest.mark.usefixtures("pass_options")
@@ -30,10 +30,10 @@ def tearDownClass(cls):
if not cls.KEEP_TMP: # pylint: disable=no-member
rmtree(cls.__output_path)

def run_verification_with_bumblebee(self,
data_dir,
output_name,
record_dim_name: str = 'mirror_step'):
def run_verification_with_stitchee(self,
data_dir,
output_name,
record_dim_name: str = 'mirror_step'):
output_path = str(self.__output_path.joinpath(output_name)) # type: ignore
data_path = self.__test_data_path.joinpath(data_dir) # type: ignore

@@ -44,11 +44,11 @@ def run_verification_with_bumblebee(self,
shutil.copyfile(filepath, copied_input_new_path)
input_files.append(str(copied_input_new_path))

output_path = bumblebee(files_to_concat=input_files,
output_file=output_path,
write_tmp_flat_concatenated=True,
keep_tmp_files=True,
concat_dim=record_dim_name)
output_path = stitchee(files_to_concat=input_files,
output_file=output_path,
write_tmp_flat_concatenated=True,
keep_tmp_files=True,
concat_dim=record_dim_name)

merged_dataset = nc.Dataset(output_path)

@@ -83,34 +83,34 @@ def run_verification_with_nco(self, data_dir, output_name, record_dim_name='mirr
length_sum += len(nc.Dataset(file).variables[record_dim_name])
assert length_sum == len(merged_dataset.variables[record_dim_name])

# def test_tempo_no2_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('tempo/no2', 'tempo_no2_bee_concatenated.nc')
def test_tempo_no2_concat_with_stitchee(self):
self.run_verification_with_stitchee('tempo/no2', 'tempo_no2_bee_concatenated.nc')

# def test_tempo_hcho_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('tempo/hcho', 'tempo_hcho_bee_concatenated.nc')
def test_tempo_hcho_concat_with_stitchee(self):
self.run_verification_with_stitchee('tempo/hcho', 'tempo_hcho_bee_concatenated.nc')

# def test_tempo_cld04_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('tempo/cld04', 'tempo_cld04_bee_concatenated.nc')
def test_tempo_cld04_concat_with_stitchee(self):
self.run_verification_with_stitchee('tempo/cld04', 'tempo_cld04_bee_concatenated.nc')

# def test_tempo_o3prof_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('tempo/o3prof', 'tempo_o3prof_bee_concatenated.nc')
def test_tempo_o3prof_concat_with_stitchee(self):
self.run_verification_with_stitchee('tempo/o3prof', 'tempo_o3prof_bee_concatenated.nc')

# def test_icesat_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('icesat', 'icesat_concat_with_bumblebee.nc')
# def test_icesat_concat_with_stitchee(self):
# self.run_verification_with_stitchee('icesat', 'icesat_concat_with_stitchee.nc')
#
# def test_ceres_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('ceres-subsetter-output',
# def test_ceres_concat_with_stitchee(self):
# self.run_verification_with_stitchee('ceres-subsetter-output',
# 'ceres_bee_concatenated.nc',
# record_dim_name='time')
#
# def test_ceres_flash_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('ceres_flash-subsetter-output',
# def test_ceres_flash_concat_with_stitchee(self):
# self.run_verification_with_stitchee('ceres_flash-subsetter-output',
# 'ceres_flash_bee_concatenated.nc',
# record_dim_name='time')
#
# def test_ceres_flash_concat_with_bumblebee(self):
# self.run_verification_with_bumblebee('ceres_flash-subsetter-output',
# 'ceres_flash_concat_with_bumblebee.nc',
# def test_ceres_flash_concat_with_stitchee(self):
# self.run_verification_with_stitchee('ceres_flash-subsetter-output',
# 'ceres_flash_concat_with_stitchee.nc',
# record_dim_name='time')

# def test_tempo_no2_concat_with_nco(self):