From bb4f8ee8241990302f5985f80135169fd9305790 Mon Sep 17 00:00:00 2001
From: "raphael.wcosta@gmail.com"
Date: Mon, 19 Sep 2022 15:52:09 -0300
Subject: [PATCH] :books: Review docs and setup (close #163)

---
 DEPLOY.rst  | 123 +++++++++++++++++++++++++++++++++++++++++-----------
 INSTALL.rst |  59 +++++++++----------------
 setup.py    |   6 +--
 3 files changed, 120 insertions(+), 68 deletions(-)

diff --git a/DEPLOY.rst b/DEPLOY.rst
index 33d7b8a..4fb884a 100644
--- a/DEPLOY.rst
+++ b/DEPLOY.rst
@@ -6,57 +6,128 @@
 under the terms of the MIT License; see LICENSE file for more details.


-Deploying
-=========
+Deploy
+======
+This section explains how to get ``Cube-Builder-AWS`` up and running on `Amazon Web Services `_.
+If you have not read :doc:`installation` yet, take a look at that tutorial on how to install it on your system in
+development mode and get familiar with the Python module.

-Create infrastructure
----------------------
-.. code-block:: shell
+.. warning::
+
+    Make sure to identify the region in which the dataset is available.
+    For example, most `GEO Earth datasets `_, such as ``Sentinel-2`` and ``Landsat-8``, are
+    stored in ``Oregon`` (``us-west-2``). In this tutorial, we are going to use ``us-west-2``.
+
+    If you generate a data cube in a region different from the one used by the BDC services, you may face high charges in your billing.
+
+
+
+.. _requirements:
+
+Requirements
+------------
+
+- `RDS PostgreSQL `_: A minimal PostgreSQL database instance with PostGIS support.
+  The ``instance_type`` depends essentially on how many parallel processing ``Lambdas`` are running. For this example,
+  we can use the minimal instance ``db.t2.micro``. For the Brazilian territory, consider a more robust instance such as
+  ``db.t2.large``, which supports around ``600`` concurrent connections.
+
+  Once the instance is up and running, you must initialize the `BDC-Catalog `_.
+  Please refer to the ``Compatibility`` table in :doc:`installation` for the supported versions.
+
+- `S3 - Simple Storage Service `_: A bucket to store the ``Lambda code`` and another bucket for ``data storage``.
+
+- `Kinesis `_: A Kinesis stream used to transfer data cube step metadata between the ``Lambdas`` and ``DynamoDB``.
+  For this example, a minimal instance supporting ``1000`` records (the default number of parallel Lambda executions) is enough.
+
+- `DynamoDB `_: A set of DynamoDB tables to store data cube metadata.
+
+
+Prepare environment
+-------------------
+
+The ``Cube-Builder-AWS`` command utilities use the `NodeJS `_ module `serverless `_
+to deploy the data cube stack on Amazon Web Services.
+First you need to install ``NodeJS``. We recommend `nvm `_, which can be installed with a
+single command line and supports keeping multiple versions of NodeJS installed. You can install it with the command::
+
+    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash

-    $ cd deploy/step_1/
-    $ sh start.sh

-1. access https://console.aws.amazon.com/rds/home by browser
+Add the following entry to ``~/.bashrc``::

-2. select region used to create RDS
+    export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")"
+    [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm

-3. select databases
-
-4. Wait until the created database has a status of 'Available' (~10min)
+Install ``NodeJS 12+``::
+
+    nvm install 12
+    nvm use 12  # Activate the version as current
+
+
+After that, use the following command to install ``serverless`` and its dependencies::
+
+    npm install -g serverless
+
+
+The second part is to have `AWS Identity and Access Management (IAM) `_ credentials with the right permissions to deploy
+the resources listed in the `requirements`_ section; one way to make them available locally is sketched below.
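+
+A minimal sketch of making these IAM credentials available on your host, assuming you use the
+AWS CLI (the profile name and key values below are placeholders)::
+
+    # Register the IAM user credentials under a named profile
+    pip install awscli
+    aws configure --profile cube-builder-deploy
+
+    # Alternatively, export the keys so that serverless and boto3 can pick them up
+    export AWS_ACCESS_KEY_ID=<your-access-key-id>
+    export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>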
+
+
+Prepare the infrastructure
+--------------------------
+
+We have prepared a script to get an RDS PostgreSQL instance up and running. Use the following commands::
+
+    cd deploy/step_1/
+    sh start.sh
+
+
+The AWS RDS database takes around 10 minutes to launch. You can monitor its status at
+https://console.aws.amazon.com/rds/home.
+
+.. note::
+
+    Make sure you are in the region ``us-west-2 (Oregon)``.

-5. click on database

 Create database structure
 -------------------------
-Create initial database structure to catalog the cubes to be generated
+Once the RDS database is up and running, we need to create the ``BDC-Catalog`` model::

-.. code-block:: shell
-
-    $ cd ../../deploy/step_2/
-    $ sh start.sh
+    cd ../../deploy/step_2/
+    sh start.sh


 Deploy Lambda service
 ---------------------
-** create file *.env* based on *example.env* in cube-builder-aws folder. Then set the environment variables with your information in *.env*
+Before proceeding with the ``Cube-Builder`` service, we need to create a ``cube-builder-aws/.env`` file.
+We have prepared a minimal example in ``cube-builder-aws/example.env``, and the following variables are available (a placeholder sketch follows this list):

-then:
+- ``PROJECT_NAME``: A name for the given project setup. This name will be used as the ``prefix`` of the Lambda functions.
+- ``STAGE``: The service environment context. Use ``dev`` or ``prod``.
+- ``REGION``: The AWS region in which to launch the services.
+- ``KEY_ID``: AWS Access Key.
+- ``SECRET_KEY``: AWS Access Secret Key.
+- ``SQLALCHEMY_DATABASE_URI``: URI of the PostgreSQL instance. It has the following structure: ``postgresql://USER:PASSWD@HOST/DB_NAME``
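+
+A placeholder sketch of such a ``.env`` file (an illustration only, not the actual contents of
+``example.env``; replace every value with your own information)::
+
+    PROJECT_NAME=my-data-cubes
+    STAGE=dev
+    REGION=us-west-2
+    KEY_ID=<your-aws-access-key-id>
+    SECRET_KEY=<your-aws-secret-access-key>
+    SQLALCHEMY_DATABASE_URI=postgresql://USER:PASSWD@HOST/DB_NAME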

-.. code-block:: shell
+Once ``cube-builder-aws/.env`` is set, you can run the following script to launch the Lambdas into AWS::

-    $ cd ../../deploy/step_3/
-    $ sh deploy.sh
+    cd ../../deploy/step_3/
+    sh deploy.sh


-Get service status
----------------------
+The helper script will generate a URI for the Lambda location.
+You can access this resource and check if everything is running.

-.. code-block:: shell

-    $ curl {your-lambda-endpoint}/
+Next steps
+----------
+After the ``Cube-Builder-AWS`` backend is up and running, we recommend you install the `Data Cube Manager GUI `_.
diff --git a/INSTALL.rst b/INSTALL.rst
index 1b53a13..e9c9647 100644
--- a/INSTALL.rst
+++ b/INSTALL.rst
@@ -32,6 +32,23 @@ The ``Cube Builder AWS`` depends essentially on:

 - `Rio-cogeo `_

+
+Compatibility
++++++++++++++
+
++------------------+-------------+
+| Cube-Builder-AWS | BDC-Catalog |
++==================+=============+
+| 0.8.2            | 0.8.2       |
++------------------+-------------+
+| 0.8.0 ~ 0.8.1    | 0.8.1       |
++------------------+-------------+
+| 0.6.x            | 0.8.1       |
++------------------+-------------+
+| 0.4.x            | 0.8.1       |
++------------------+-------------+
+
+
 Clone the software repository
 +++++++++++++++++++++++++++++

@@ -51,16 +68,16 @@ Go to the source code folder::

 Install in development mode::

-    $ pip3 install -e .[all]
+    $ pip3 install -e .[docs,tests]


 .. note::

     If you want to create a new *Python Virtual Environment*, please, follow this instruction:

-    *1.* Create a new virtual environment linked to Python 3.7::
+    *1.* Create a new virtual environment linked to Python 3.8::

-        python3.7 -m venv venv
+        python3.8 -m venv venv

     **2.** Activate the new environment::

@@ -94,39 +111,3 @@ You can open the above documentation in your favorite browser, as::

     firefox docs/sphinx/_build/html/index.html


-Prepare environment to deploy
-+++++++++++++++++++++++++++++
-
-Prepare your AWS account and HOST to deploy application.
-
-
-1) in AWS Console
------------------
-
-  - create AWS account
-
-  - Login with AWS account created
-
-  - create IAM user
-
-  - set full permissions (fullAccess) to IAM user created
-
-  - generate credentals to IAM user
-
-
-2) in your HOST (command line):
--------------------------------
-
-  - install *AWS CLI*
-
-  - configure credentials
-    e.g: aws configure --profile *iam-user-name*
-
-  - install *Node.js* (global)
-    `Download Nodejs `_
-
-  - install *serverless*
-
-  .. code-block:: shell
-
-    $ npm install -g serverless
\ No newline at end of file
diff --git a/setup.py b/setup.py
index eccb3d0..4bb9648 100644
--- a/setup.py
+++ b/setup.py
@@ -52,7 +52,7 @@
     'bdc-catalog @ git+https://github.com/brazil-data-cube/bdc-catalog.git@v0.8.2#egg=bdc-catalog',
     'Flask>=1.1.1,<2',
     'Flask-SQLAlchemy==2.4.1',
-    'psycopg2-binary==2.8.5',
+    'psycopg2-binary>=2.8,<3',
     'boto3==1.14.49',
     'botocore==1.17.49',
     'marshmallow-sqlalchemy==0.25.0',
@@ -62,7 +62,7 @@
     'rasterio==1.2.1',
     'requests>=2.23.0',
     'rio-cogeo==3.0.2',
-    'shapely==1.7.0',
+    'shapely>=1.7,<2',
     'stac.py==0.9.0.post5',
     'cloudpathlib[s3]==0.4.0',
 ]
@@ -108,4 +108,4 @@
     'Topic :: Scientific/Engineering :: GIS',
     'Topic :: Software Development :: Libraries :: Python Modules',
     ],
-)
\ No newline at end of file
+)
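As a quick sanity check of the relaxed dependency pins above, you can confirm which versions were
actually resolved after installation (a minimal sketch, assuming the package is installed in the
active virtual environment)::

    pip show psycopg2-binary shapely | grep -E "^(Name|Version)"
    python -c "import shapely, psycopg2; print(shapely.__version__, psycopg2.__version__)"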