Stactools-Pipelines is a large-scale, turnkey processing framework for creating STAC metadata and cloud-optimized formats for data stored on S3, using stactools packages.
- Python>=3.9
- Docker
- tox
- awscli
- An IAM role with sufficient permissions for creating, destroying and modifying the relevant stack resources.
A template pipeline structure for Sentinel-1 is included in the repo.
To develop a new pipeline, create a directory in `pipelines` using a simple name for your pipeline dataset.
At a minimum, include:

- `requirements.txt` With your application's dependencies.
- `config.yaml` Your pipeline's configuration settings.
- `app.py` A Lambda application with a `handler` function defined which consumes an `SQSEvent`, creates a STAC Item and posts it to the `ingestor` (a minimal sketch follows this list).
- `test_app.py` A `pytest` based unit test file which exercises `app.py`.
- `collection.py` A Lambda application with a `handler` function which creates a STAC Collection and posts it to the `ingestor`.
- `test_collection.py` A `pytest` based unit test file which exercises `collection.py`.
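As a very rough sketch of what an `app.py` might contain (the stactools import, the message body shape, the `/ingestions` endpoint, and the `INGESTOR_URL` environment variable are all illustrative assumptions, not part of the template):

```python
# app.py -- illustrative sketch only; the stactools import, message shape,
# and environment variables are assumptions, not part of the template.
import json
import os

import requests
from stactools.sentinel1.grd.stac import create_item  # assumed stactools package

INGESTOR_URL = os.environ.get("INGESTOR_URL", "")  # assumed to be injected at deploy time


def handler(event, context):
    """Consume an SQS event, create a STAC Item per record and post it to the ingestor."""
    for record in event["Records"]:
        message = json.loads(record["body"])
        href = message["href"]  # assumed message body shape
        item = create_item(href)
        response = requests.post(
            f"{INGESTOR_URL}/ingestions",  # assumed ingestor endpoint
            json=item.to_dict(),
        )
        response.raise_for_status()
```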
- `id` Required. The pipeline name. This should be the same as the pipeline's parent folder and should use `_`s for separators to support Python module discovery.
- `compute` Required. Currently only the `awslambda` value is supported.
- `ingestor_url` Required. The ingestor API's root path with the stage value included.
- `secret_arn` Required. The Secrets Manager ARN for using a Cognito JWKS implementation with the ingestor API.
- `sns` Optional. The SNS topic to listen to for new granule notifications.
- `inventory_location` Optional. The location of an S3 inventory that can be used by the pipeline to process and ingest existing granules. Include a `historic.py` file (and a `test_historic.py`) in your pipeline which implements a `query_inventory`, `row_to_message_body` and `handler` method to query the inventory and send the results to the processing queue (a sketch follows the configuration example below). If provided, an Athena table is created. Default is `None`. If provided, `file_list` can't be provided.
- `historic_frequency` Optional. If an `inventory_location` is included, the `historic_frequency` (how often, in hours, `historic.py` is run) must also be included. A value of `0` indicates that the `historic.py` function will run a single time on deployment and process the entire inventory. If a value greater than `0` is specified, then an `initial_chunk` must also be specified. The pipeline will build a stack which uses these values to incrementally chunk through the inventory file with `cron` executions until the entire inventory has been processed.
- `file_list` Optional. The location of a non-standard AWS inventory file. Default is `None`. If provided, no Athena table is created and the historic Lambda processes that list. If provided, `inventory_location` can't be provided.
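For illustration, a `config.yaml` for a hypothetical SNS-driven pipeline might look like the following; every value here is a placeholder:

```yaml
id: sentinel_1
compute: awslambda
ingestor_url: https://example.execute-api.us-west-2.amazonaws.com/prod
secret_arn: arn:aws:secretsmanager:us-west-2:123456789012:secret:ingestor-jwks-example
sns: arn:aws:sns:us-west-2:123456789012:new-granule-notifications
# Optionally, backfill existing granules from an S3 inventory
# (mutually exclusive with file_list):
# inventory_location: s3://example-inventory-bucket/inventory/
# historic_frequency: 0  # run historic.py once on deployment
```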
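The `historic.py` contract described under `inventory_location` could be sketched roughly as below; the Athena table and column names, environment variables, bucket, and message body shape are assumptions for illustration:

```python
# historic.py -- illustrative sketch only; table name, env vars and message shape are assumptions.
import json
import os
import time

import boto3

athena = boto3.client("athena")
sqs = boto3.client("sqs")

DATABASE = os.environ.get("ATHENA_DATABASE")       # assumed: database holding the inventory table
OUTPUT_LOCATION = os.environ.get("ATHENA_OUTPUT")  # assumed: s3:// location for Athena query results
QUEUE_URL = os.environ.get("QUEUE_URL")            # assumed: the pipeline's processing queue


def query_inventory():
    """Run an Athena query against the inventory table and return the result rows."""
    execution = athena.start_query_execution(
        QueryString="SELECT key FROM inventory",  # assumed table/column names
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
    )
    query_id = execution["QueryExecutionId"]
    while (
        athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        in ("QUEUED", "RUNNING")
    ):
        time.sleep(1)
    # The first row is the column header, so skip it.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"][1:]


def row_to_message_body(row):
    """Convert an Athena result row into the message body the processing Lambda expects."""
    key = row["Data"][0]["VarCharValue"]
    return json.dumps({"href": f"s3://example-bucket/{key}"})  # assumed bucket and shape


def handler(event, context):
    """Query the inventory and push each row onto the processing queue."""
    for row in query_inventory():
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=row_to_message_body(row))
```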
Create an environment setting using your pipeline name.
$ export PIPELINE=<Your pipeline name>
To run your pipeline unit tests
$ tox
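For reference, a unit test for the handler sketched earlier might look roughly like this; it mocks out both the stactools `create_item` call and the HTTP post, and assumes the same message shape as the `app.py` sketch:

```python
# test_app.py -- illustrative sketch; mirrors the assumptions made in the app.py sketch above.
import json
from unittest.mock import MagicMock, patch

import app


@patch("app.requests.post")
@patch("app.create_item")
def test_handler_posts_item(create_item, post):
    # Stand in for the stactools-created Item.
    create_item.return_value = MagicMock(to_dict=lambda: {"id": "test-item"})
    event = {"Records": [{"body": json.dumps({"href": "s3://example-bucket/granule"})}]}

    app.handler(event, None)

    create_item.assert_called_once_with("s3://example-bucket/granule")
    post.assert_called_once()
```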
Deploying a pipeline uses the pipeline's `config.yaml` to deploy all the resources needed to run it, including the STAC Collection and Item creation Lambdas and any queues or Athena tables that are required. If an `sns` topic was specified, the pipeline will begin processing notifications as soon as deployment is complete.
Create a development virtual environment with
$ tox -e dev
$ source devenv/bin/activate
Create environment settings for your pipeline deployment
$ export PROJECT=<The project name for resource cost tracking>
$ export PIPELINE=<Your pipeline name>
$ export STAGE=<Environment stage>
With an AWS profile enabled with sufficient permissions, build and push your pipeline image with
$ python image_builder.py
Deploy the infrastructure for your pipeline with
$ cdk deploy
To create a development virtual environment for core repository development use
$ tox -e dev
$ source devenv/bin/activate