For this data engineering exercise, I wrote a small Python application that reads data from an AWS SQS queue, transforms the data, and then writes it to a Postgres database. To set up the infrastructure required for this application, see the README provided in the data-engineering-take-home directory located in this repository.
After following the directions in the README inside the data-engineering-take-home directory, only a few additional steps are required to run the application.
To align with security best practices, you would normally add the config.ini file to .gitignore to keep sensitive data out of version control. However, because the instructions provide the username and password, I chose to include the file in the repository for simplicity. The file should have the following structure:
[sqs]
region_name=<value>
queue_url=<value>
aws_access_key_id=<value>
aws_secret_access_key=<value>
[postgresql]
host=<value>
database=<value>
user=<value>
password=<value>
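As a rough illustration of how a file with this structure can be read, here is a minimal sketch using the standard library's configparser module. It is not necessarily how data_engineering_project.py loads its settings. The `load_settings` helper and every value in `SAMPLE` are hypothetical placeholders; only the section and key names come from the structure above.

```python
import configparser

# Hypothetical placeholder values; the real ones live in config.ini.
SAMPLE = """
[sqs]
region_name=example-region
queue_url=https://example.invalid/queue
aws_access_key_id=example-key-id
aws_secret_access_key=example-secret
[postgresql]
host=localhost
database=example_db
user=example_user
password=example_password
"""


def load_settings(text):
    """Parse INI text and return the sqs and postgresql sections as dicts."""
    # interpolation=None keeps a literal '%' in a password from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["sqs"]), dict(parser["postgresql"])


sqs_settings, pg_settings = load_settings(SAMPLE)
print(sorted(pg_settings))
```

To read the real file instead of a string, `parser.read("config.ini")` can replace the `read_string` call.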
To connect and write to the Postgres database, I used a Python module called psycopg2. To install it, run:
pip3 install psycopg2-binary
To run the application, ensure that all the prerequisites in this README and in the README located in the data-engineering-take-home directory have been met. After verifying that all required software is installed and the infrastructure is running, run the program with the following command:
python3 data_engineering_project.py
The program continues to run until the user exits it by pressing Ctrl-C. Note that the program prints information to the screen about the actions it is performing.
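The run-until-Ctrl-C behavior can be sketched as a loop that catches KeyboardInterrupt. This is an assumed shape, not the actual main function of data_engineering_project.py. `run_forever` and `fake_poll` are hypothetical names, and `fake_poll` simulates the key press so that the example terminates on its own.

```python
import time


def run_forever(poll_once, delay_seconds=1.0):
    """Call poll_once repeatedly until Ctrl-C raises KeyboardInterrupt.

    poll_once is expected to return the number of messages it handled.
    """
    processed = 0
    try:
        while True:
            processed += poll_once()
            time.sleep(delay_seconds)
    except KeyboardInterrupt:
        # Ctrl-C lands here, so the program can exit without a traceback.
        print(f"Stopping after processing {processed} message(s).")
    return processed


def fake_poll():
    """Stand-in for a real queue poll; simulates Ctrl-C on the third call."""
    fake_poll.calls += 1
    if fake_poll.calls == 3:
        raise KeyboardInterrupt
    return 1


fake_poll.calls = 0

total = run_forever(fake_poll, delay_seconds=0)
```

In a real run, `poll_once` would be the function that receives, transforms, and writes a batch of messages.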
- Enhance logging functionality
- The logging in this application can be improved in a few categories:
- Writing to syslog facility for a centralized logging location
- More verbose logging to give enhanced details to simplify troubleshooting efforts
- Better error handling
- The application could use an overhaul to error handling in general
- Improve the current error handling to include additional information
- Build in more error handlers
- etc.
- Implement a daemonized approach
- Implementing a daemon or cron approach will allow a more seamless experience
- The main function could be redesigned for better maintainability and readability
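As one possible starting point for the logging item in the list above, the sketch below uses the standard library's logging module with an optional syslog handler and a verbosity switch. The `build_logger` helper, its parameters, and the logger name are hypothetical. The syslog address (for example "/dev/log" on many Linux systems) depends on the host, so it is left off by default.

```python
import logging
import logging.handlers


def build_logger(name="data_engineering_project", verbose=False, syslog_address=None):
    """Return a logger that writes to the console and optionally to syslog."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    console = logging.StreamHandler()
    console.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(console)

    if syslog_address is not None:
        # Assumption: a syslog daemon is listening at syslog_address.
        syslog = logging.handlers.SysLogHandler(address=syslog_address)
        syslog.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(syslog)

    return logger


logger = build_logger(verbose=True)
logger.debug("received %d message(s) from the queue", 3)
```

Passing `verbose=True` enables DEBUG-level messages, which covers the more detailed troubleshooting output mentioned above.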