Data Engineer Project

Table of Contents
  1. About The Project
  2. Getting Started
  3. Running the Application
  4. Next Steps
  5. Contact

About The Project

For this data engineering exercise, I wrote a small Python application that reads data from an AWS SQS queue, transforms the data, and then writes it to a PostgreSQL database. To set up the infrastructure required for this application, see the README in the data-engineering-take-home directory of this repository.

Built with

  • Python
  • Docker
  • AWS
  • PostgreSQL

Getting Started

After following the directions in the README inside the data-engineering-take-home directory, there are only a few additional steps required to run the application.

Set up config file

To align with security best practices, you would normally add config.ini to .gitignore to keep sensitive data out of the repository. However, because the instructions provide the username and password, I chose to include the file in the repository for simplicity. The file should have the following structure:

[sqs]
region_name=<value>
queue_url=<value>
aws_access_key_id=<value>
aws_secret_access_key=<value>

[postgresql]
host=<value>
database=<value>
user=<value>
password=<value>
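As a minimal sketch of how a file with this structure could be read, the example below uses Python's standard configparser module and fails early if a section or key is missing. The `load_config` helper and the `REQUIRED_KEYS` mapping are hypothetical names introduced for illustration; they are not taken from data_engineering_project.py.

```python
import configparser

# Sections and keys expected in config.ini, mirroring the structure above.
REQUIRED_KEYS = {
    "sqs": ("region_name", "queue_url", "aws_access_key_id", "aws_secret_access_key"),
    "postgresql": ("host", "database", "user", "password"),
}


def load_config(path="config.ini"):
    """Read the config file and return {section: {key: value}}."""
    # interpolation=None keeps literal '%' characters (common in passwords) intact.
    parser = configparser.ConfigParser(interpolation=None)
    # parser.read() returns the list of files it managed to read.
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read config file: {path}")

    settings = {}
    for section, keys in REQUIRED_KEYS.items():
        if not parser.has_section(section):
            raise KeyError(f"Missing [{section}] section in {path}")
        missing = [key for key in keys if not parser.has_option(section, key)]
        if missing:
            raise KeyError(f"Missing keys in [{section}]: {', '.join(missing)}")
        settings[section] = {key: parser.get(section, key) for key in keys}
    return settings
```

Validating the file up front means a typo in config.ini surfaces as a clear error at startup rather than as a connection failure later.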

Install psycopg2

To connect and write to the PostgreSQL database, I used a Python module called psycopg2. To install it, run:

pip3 install psycopg2-binary
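As a rough illustration of how psycopg2 might be used once it is installed, the sketch below writes one row with a parameterized query. The `insert_record` helper, the `example_table` table, and its columns are hypothetical and not taken from this repository. The `conn` argument is assumed to be a connection created with `psycopg2.connect` from the [postgresql] values in config.ini.

```python
def insert_record(conn, record):
    """Insert one record with a parameterized query, then commit.

    `conn` is assumed to be a psycopg2 connection, for example one created with
    psycopg2.connect(host=..., dbname=..., user=..., password=...).
    """
    # Hypothetical table and columns, for illustration only.
    sql = "INSERT INTO example_table (id, payload) VALUES (%s, %s)"
    # psycopg2 cursors work as context managers; the cursor is closed on exit.
    with conn.cursor() as cur:
        # Passing values separately lets the driver handle quoting and escaping.
        cur.execute(sql, (record["id"], record["payload"]))
    conn.commit()
```

Passing values as a separate tuple, rather than formatting them into the SQL string, avoids SQL injection and quoting problems.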

Running the Application

To run the application, first complete all the prerequisites in this README and in the README located in the data-engineering-take-home directory. After verifying that all required software is installed and the infrastructure is running, run the program with the following command:

python3 data_engineering_project.py

The program runs until the user exits it by pressing Ctrl-C. While running, it prints information to the screen about the actions it is performing.
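The run-until-Ctrl-C behavior described above can be sketched as a polling loop that catches KeyboardInterrupt. This is an assumed structure, not the actual contents of data_engineering_project.py. The `poll` and `handle` callables are hypothetical placeholders for the SQS read step and the transform-and-write step.

```python
import time


def run_until_interrupted(poll, handle, idle_seconds=1.0):
    """Call poll() repeatedly and pass each item to handle() until Ctrl-C.

    Returns the number of items handled before the interrupt.
    """
    handled = 0
    try:
        while True:
            items = poll()
            for item in items:
                handle(item)
                handled += 1
            if not items:
                # Nothing to do; wait briefly instead of spinning.
                time.sleep(idle_seconds)
    except KeyboardInterrupt:
        # Ctrl-C raises KeyboardInterrupt in the main thread.
        print(f"Interrupted; handled {handled} item(s). Exiting.")
    return handled
```

Catching the interrupt in one place lets the program report what it did and exit without a traceback.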

Next Steps

  • Enhance logging functionality
    • The logging in this application can be improved in a few categories:
      • Writing to syslog facility for a centralized logging location
      • More verbose logging to give enhanced details to simplify troubleshooting efforts
  • Better error handling
    • The application could use an overhaul to error handling in general
      • Improve the current error handling to include additional information
      • Build in more error handlers
      • etc.
  • Implement a daemonized approach
    • Running the application as a daemon or on a cron schedule would remove the need to start and stop it manually
  • The main function could be redesigned for better maintainability and readability
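One possible starting point for the logging items above is the standard logging module with an optional syslog handler. This is only a sketch under stated assumptions. The `build_logger` helper, the logger name, and the message formats are hypothetical choices, and the syslog address depends on the host.

```python
import logging
import logging.handlers


def build_logger(name="data_engineering_project", syslog_address=None, verbose=False):
    """Return a logger that writes to the console and, optionally, to syslog."""
    logger = logging.getLogger(name)
    # verbose=True enables DEBUG messages for troubleshooting.
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    # Clear existing handlers so repeated calls do not duplicate output.
    logger.handlers.clear()

    console = logging.StreamHandler()
    console.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(console)

    if syslog_address is not None:
        # address may be a (host, port) tuple or a local socket path.
        syslog = logging.handlers.SysLogHandler(address=syslog_address)
        syslog.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(syslog)
    return logger
```

On many Linux hosts the local syslog socket is `/dev/log`, so a call such as `build_logger(syslog_address="/dev/log", verbose=True)` may work there; that path is an assumption about the host, not something this repository specifies.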

Contact

LinkedIn