"Fetch" Data Migration / Backfill

Fetch Migration provides an easy-to-use tool that simplifies moving indices and their data from a "source" cluster (either Elasticsearch or OpenSearch) to a "target" OpenSearch cluster. It automates the comparison of indices between the two clusters and creates index metadata (settings and mappings) only for indices that do not already exist on the target cluster. Internally, the tool uses Data Prepper to migrate data for these created indices.
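As a rough illustration (not the exact output of the tool), a Data Prepper pipeline for this kind of migration pairs an opensearch source pointing at the source cluster with an opensearch sink pointing at the target cluster. The host names, credentials, and index pattern below are placeholders; refer to the Data Prepper documentation for the full set of source and sink options:

fetch-migration-pipeline:
  source:
    opensearch:
      hosts: ["https://source-cluster:9200"]
      username: "admin"
      password: "admin"
      indices:
        include:
          - index_name_regex: "my-index.*"
  sink:
    - opensearch:
        hosts: ["https://target-cluster:9200"]
        username: "admin"
        password: "admin"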

The Fetch Migration tool is implemented in Python. A Docker image can be built using the included Dockerfile.

Components

The tool consists of three components:

  • A "metadata migration" module that handles metadata comparison between the source and target clusters. It can output a human-readable report as well as a Data Prepper pipeline yaml file.
  • A "migration monitor" module that monitors the progress of the migration and shuts down the Data Prepper pipeline once the target document count has been reached (a minimal sketch of this behavior follows this list).
  • An "orchestrator" module that sequences these steps as a workflow and manages the kick-off of the Data Prepper process between them.

The orchestrator module is the Docker entrypoint for the tool, though each component can be executed separately via Python. Help text for each module can be printed by supplying the -h / --help flag.
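For example, after installing dependencies as described under Execution below (the non-orchestrator module file names here are assumptions based on the component names above and may differ in your checkout):

pipenv run python python/metadata_migration.py --help
pipenv run python python/migration_monitor.py --help
pipenv run python python/fetch_orchestrator.py --help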

Current Limitations

  • Fetch Migration runs as a single instance and does not support vertical scaling or data slicing
  • The tool does not support customizing the list of indices included for migration
  • Metadata migration only supports basic auth
  • The migration does not filter out red indices
  • In the event that the migration fails or the process dies, the created indices on the target cluster are not rolled back

Execution

Python

Navigate to the cloned GitHub repo. Then, install the required Python dependencies by running:

pipenv install

The Fetch Migration workflow can then be kicked off via the orchestrator module:

pipenv run python python/fetch_orchestrator.py --help

Docker

First build the Docker image from the Dockerfile:

docker build -t fetch-migration .

Then run the fetch-migration image. Replace <pipeline_yaml_path> in the command below with the path to your Data Prepper pipeline yaml file:

docker run -p 4900:4900 -v <pipeline_yaml_path>:/code/input.yaml fetch-migration
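
For example, if your pipeline file is named pipeline.yaml and sits in the current directory (port 4900 is Data Prepper's default core API port):

docker run -p 4900:4900 -v $(pwd)/pipeline.yaml:/code/input.yaml fetch-migration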

AWS deployment

Refer to AWS Deployment to deploy this solution to AWS.

Development

The source code for the tool is located under the python/ directory, with unit tests in the tests/ subdirectory. Refer to the Python instructions under Execution above to ensure that the necessary dependencies are installed prior to development.

Additionally, you'll need to install the development dependencies by running:

pipenv install --dev

Unit Tests

Unit tests can be run from the python/ directory using:

pipenv run python -m coverage run -m unittest

Coverage

Code coverage metrics can be generated after a unit-test run. A report can either be printed on the command line:

pipenv run python -m coverage report --omit "*/tests/*"

or generated as HTML:

pipenv run python -m coverage html --omit "*/tests/*"
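
By default, coverage writes the HTML report to an htmlcov/ directory; open htmlcov/index.html in a browser to view it.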

Note that the --omit parameter must be specified to avoid tracking code coverage on unit test code itself.