Data Engineer Project

Table of Contents

About The Project
- Built With
Getting Started
- Set up config file
- Install psycopg2
Running the Application
Next Steps
Contact

About The Project

For this data engineering exercise, I wrote a small Python application that reads data from an AWS SQS queue, transforms the data, and then writes it to a Postgres database. To set up the infrastructure required for this application, see the README provided in the data-engineering-take-home directory located in this repository.

Built with

Getting Started

After following the directions in the README inside the data-engineering-take-home directory, only a few additional steps are required to run the application.

Set up config file

To align with security best practices, you would normally add the config.ini file to .gitignore so that sensitive data stays out of the repository. However, because the instructions already provide the username and password, I chose to include the file in the repository for simplicity. The file should have the following structure:

[sqs]
region_name=<value>
queue_url=<value>
aws_access_key_id=<value>
aws_secret_access_key=<value>

[postgresql]
host=<value>
database=<value>
user=<value>
password=<value>
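A file with this layout can be read with Python's standard configparser module. The following is a minimal sketch, not the code in data_engineering_project.py; the helper name load_settings and the REQUIRED_KEYS table are assumptions made for illustration. Interpolation is switched off so that a password or URL containing a % character is read literally.

```python
import configparser

# Section and key names mirror the structure shown above.
REQUIRED_KEYS = {
    "sqs": ["region_name", "queue_url", "aws_access_key_id", "aws_secret_access_key"],
    "postgresql": ["host", "database", "user", "password"],
}


def load_settings(text):
    """Parse config.ini-style text into {section: {key: value}}.

    Hypothetical helper: raises KeyError when a required section or key
    is missing, so a bad config fails early with a clear message.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    settings = {}
    for section, keys in REQUIRED_KEYS.items():
        if not parser.has_section(section):
            raise KeyError(f"missing section [{section}]")
        missing = [key for key in keys if not parser.has_option(section, key)]
        if missing:
            raise KeyError(f"missing keys in [{section}]: {', '.join(missing)}")
        settings[section] = {key: parser.get(section, key) for key in keys}
    return settings
```

To use it, read the file contents and pass them in, for example load_settings(open("config.ini").read()).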

Install psycopg2

To connect and write to the Postgres database, I used a Python module called psycopg2. To install it, run:

pip3 install psycopg2-binary
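For readers new to psycopg2, a write typically looks like the sketch below. This is an illustration under stated assumptions rather than the application's actual code: the function name write_rows, the table example_table, and its payload column are all hypothetical. The connect parameter exists only so the function can be exercised without a running database.

```python
def write_rows(rows, conn_params, connect=None):
    """Insert each string in rows into a hypothetical example_table.

    conn_params holds keyword arguments for psycopg2.connect, such as
    host, dbname, user, and password. Returns the number of rows sent.
    """
    if connect is None:
        import psycopg2  # installed by the psycopg2-binary package

        connect = psycopg2.connect

    conn = connect(**conn_params)
    try:
        with conn.cursor() as cur:
            # %s placeholders let the driver quote values safely.
            cur.executemany(
                "INSERT INTO example_table (payload) VALUES (%s)",
                [(row,) for row in rows],
            )
        conn.commit()
    finally:
        # Always release the connection, even if the insert fails.
        conn.close()
    return len(rows)
```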

Running the Application

To run the application, ensure that all the prerequisites in this README and the README located in the data-engineering-take-home directory have been followed. After validating all required software is installed and the infrastructure is running, simply run the program with the following command:

python3 data_engineering_project.py

The program runs until the user exits it by pressing Ctrl-C. While running, it prints information to the screen about the actions it is performing.
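The run-until-interrupted behavior described above can be sketched as a loop that catches KeyboardInterrupt. This is an assumption about the general shape, not a copy of the application's main function; run_until_interrupted and poll_once are hypothetical names, with poll_once standing in for one read, transform, and write cycle.

```python
import time


def run_until_interrupted(poll_once, delay_seconds=1.0, sleep=time.sleep):
    """Call poll_once() repeatedly until the user presses Ctrl-C.

    Returns the number of cycles that completed before the interrupt.
    """
    cycles = 0
    try:
        while True:
            poll_once()
            cycles += 1
            print(f"completed cycle {cycles}")
            sleep(delay_seconds)
    except KeyboardInterrupt:
        # Ctrl-C raises KeyboardInterrupt in the main thread.
        print("interrupt received, shutting down")
    return cycles
```

Catching the interrupt in one place lets the program exit with a short message instead of a traceback.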

Next Steps

Enhance logging functionality
- The logging in this application can be improved in a few categories:
  - Writing to syslog facility for a centralized logging location
  - More verbose logging to give enhanced details to simplify troubleshooting efforts
Better error handling
- The application could use an overhaul to error handling in general
  - Improve the current error handling to include additional information
  - Build in more error handlers
  - etc.
Implement a daemonized approach
- Implementing a daemon or cron approach will allow a more seamless experience
The main function could be redesigned for better maintainability and readability
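As one possible starting point for the logging item in the list above, the sketch below builds a logger with Python's standard logging module that can write either to the screen or to the local syslog facility. The function name build_logger, the logger name, and the /dev/log socket path (the usual location on Linux) are assumptions, not part of the current application.

```python
import logging
import logging.handlers
import sys


def build_logger(name="data_engineering_project", verbose=False, use_syslog=False):
    """Return a logger that writes to stdout or to the local syslog socket."""
    logger = logging.getLogger(name)
    # verbose=True enables DEBUG messages for troubleshooting.
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    # Remove handlers from earlier calls to avoid duplicate lines.
    logger.handlers.clear()

    if use_syslog:
        # Assumption: a syslog daemon is listening on /dev/log.
        handler = logging.handlers.SysLogHandler(address="/dev/log")
    else:
        handler = logging.StreamHandler(sys.stdout)

    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling build_logger(verbose=True).debug("message received") would then replace an ad hoc print call.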
