Fetch Migration provides an easy-to-use tool that simplifies the process of moving indices and their data from a "source" cluster (either Elasticsearch or OpenSearch) to a "target" OpenSearch cluster. It automates the process of comparing indices between the two clusters and creates index metadata (settings and mappings) only for indices that do not already exist on the target cluster. Internally, the tool uses Data Prepper to migrate the data for these newly created indices.
The Fetch Migration tool is implemented in Python. A Docker image can be built using the included Dockerfile.
The tool consists of 3 components:
- A "metadata migration" module that handles metadata comparison between the source and target clusters.
This can output a human-readable report as well as a Data Prepper pipeline
yaml
file. - A "migration monitor" module that monitors the progress of the migration and shuts down the Data Prepper pipeline once the target document count has been reached
- An "orchestrator" module that sequences these steps as a workflow and manages the kick-off of the Data Prepper process between them.
The orchestrator module is the Docker entrypoint for the tool, though each component can be executed separately via Python. Help text for each module can be printed by supplying the `-h` / `--help` flag.
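For reference, the generated pipeline file is standard Data Prepper configuration. The sketch below is purely illustrative and is not the tool's actual output: the pipeline name, host names, credentials, and index pattern are placeholders, and the exact plugin options emitted by the metadata migration module may differ.

```yaml
# Illustrative sketch only - names, hosts, credentials, and index pattern are placeholders
fetch-migration-pipeline:
  source:
    opensearch:
      hosts: ["https://source-cluster:9200"]
      username: "admin"                  # basic auth credentials
      password: "admin"
      indices:
        include:
          - index_name_regex: "my-index-.*"
  sink:
    - opensearch:
        hosts: ["https://target-cluster:9200"]
        username: "admin"
        password: "admin"
        # write each document to the same index name it had on the source
        index: "${getMetadata(\"opensearch-index\")}"
```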
- Fetch Migration runs as a single instance and does not support vertical scaling or data slicing
- The tool does not support customizing the list of indices included for migration
- Metadata migration only supports basic auth
- The migration does not filter out `red` indices (see the example after this list)
- In the event that the migration fails or the process dies, the created indices on the target cluster are not rolled back
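Because `red` indices are not filtered out, you may want to check the source cluster for unhealthy indices before starting a migration. One way to do this (outside of the tool itself) is the `_cat/indices` API; the endpoint and credentials below are placeholders:

```shell
# List any red (unhealthy) indices on the source cluster
curl -u admin:admin "https://source-cluster:9200/_cat/indices?health=red&v"
```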
- Clone this GitHub repo
- Install Python
- Ensure that pip is installed
- (Optional) Set up and activate a virtual environment (one possible approach is sketched below)
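The commands in this README use `pipenv`, which manages a virtual environment for the project. One possible way to satisfy the prerequisites above, assuming Python 3 and `pip` are already installed, is:

```shell
pip install pipenv   # install pipenv for dependency management
pipenv shell         # optional: spawn a shell inside the project's virtual environment
```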
Navigate to the cloned GitHub repo. Then, install the required Python dependencies by running:
```shell
pipenv install
```
The Fetch Migration workflow can then be kicked off via the orchestrator module:
```shell
pipenv run python python/fetch_orchestrator.py --help
```
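As noted above, the individual modules can also be run directly. For example, the help text for the metadata migration module could be printed as shown below; the module file name here is an assumption, so check the `python/` directory for the actual names:

```shell
# Module file name is assumed for illustration only
pipenv run python python/metadata_migration.py --help
```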
First, build the Docker image from the `Dockerfile`:

```shell
docker build -t fetch-migration .
```
Then run the `fetch-migration` image. Replace `<pipeline_yaml_path>` in the command below with the path to your Data Prepper pipeline `yaml` file:

```shell
docker run -p 4900:4900 -v <pipeline_yaml_path>:/code/input.yaml fetch-migration
```
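For example, if the pipeline file is named `pipeline.yaml` and lives in the current working directory (a hypothetical path), the command would look like:

```shell
docker run -p 4900:4900 -v "$(pwd)/pipeline.yaml":/code/input.yaml fetch-migration
```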
Refer to AWS Deployment to deploy this solution to AWS.
The source code for the tool is located under the `python/` directory, with unit tests in the `tests/` subdirectory.
Please refer to the Setup section to ensure that the necessary dependencies are installed prior to development.
Additionally, you'll need to install the development dependencies by running:

```shell
pipenv install --dev
```
Unit tests can be run from the `python/` directory using:

```shell
pipenv run python -m coverage run -m unittest
```
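During development it can be handy to run only a subset of tests; `unittest` supports filtering by test name pattern with the `-k` flag (the pattern below is just an example):

```shell
# Run only tests whose names match "monitor" (example pattern)
pipenv run python -m unittest -k monitor
```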
Code coverage metrics can be generated after a unit-test run. A report can either be printed on the command line:
```shell
pipenv run python -m coverage report --omit "*/tests/*"
```
or generated as HTML:
```shell
pipenv run python -m coverage html --omit "*/tests/*"
```
Note that the `--omit` parameter must be specified to avoid tracking code coverage on the unit test code itself.
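By default, coverage.py writes the HTML report to an `htmlcov/` directory, so the generated report can be viewed by opening `htmlcov/index.html` in a browser, for example:

```shell
# macOS; use xdg-open on Linux
open htmlcov/index.html
```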