Skip to content

MAAP-Project/maap-eoapi

Repository files navigation

MAAP eoapi

Tests Status

Overview

This repository contains the AWS CDK code (written in typescript) used to deploy the MAAP project eoapi infrastructure. It is based on the eoapi-template example. For the MAAP use case, we use a subset of the eoapi CDK constructs to define a database, an ingestion API, a STAC API, a raster API (i.e a tiling API) and a bastion host for direct connections to the database. Here, we deploy all these components into a custom VPC.

Automated Deployment

Deployment happens through a github workflow manually triggered and defined in .github/workflows/deploy.yaml.

Networking and accessibility of the database.

Because of security requirements, the networking set up imposes the following constraints :

  • For security reasons, the database is in a private subnet of the VPC. As such, only instances running inside of the same VPC can access the database. This means that, for example, even if a user has the password and her IP is allowed inbound connections to the database, access will not be allowed.

This has three consequences :

  1. The APIs that need access to the database (the STAC API, the tiling API, the ingestion API) need to be deployed in that same VPC.
  2. In addition, because these APIs also sometimes need access to the internet, a NAT gateway must in addition be deployed in that VPC.
  3. For direct, administrative connections to the database, one must go through an instance placed in the same VPC as the database.

We approach (3) by re-using the 'bastion host' eoAPI construct, which deploys an EC2 instance that can connect to the database, and can be used to create a tunnel from a user's machine to the database. See the eoAPI docs for more information.

Ingestion

The term "ingestion" refers to the process of cataloging data in the STAC catalog associated with this deployment.

Direct ingestion

For a small record ingestion (for example a collection record or just one item), one can directly connect to the database and perform loading. This can be done using the pypgstac library. For example, to load an item stored locally in test_item.json, with pypgstac installed, you can run the following command :

pypgstac load --table items test_item.json

or for a collection

pypgstac load --table collections test_collection.json

This requires

  1. that you are allowed to connect to the database. Because of the security requirements mentioned above, you must go through an SSH tunnel using the bastion host EC2 instance. For this to work, you must be allowed to SSH into this EC@ instance See these docs for more details
  2. the configuration for the database connection is present in your environment.

Indirect ingestion through the ingestion pipeline deployment

For larger scale ingestions, in MAAP we rely on a fork of the stactools-pipelines repository. If you want to ingest a collection in MAAP using this tool, you should develop a 'pipeline'. Details of this procedure can be found in the linked repository. You can follow an example that was developed for maap here.