Pipeline Breakdown:
- ETL Job
- Twitter API Python Library
- Steam API
- Postgres Data Warehouse
- Data Dashboard
Rust cheater profiles are collected every hour from @rusthackreport using an Airflow PythonOperator and a Twitter API Python library. Cheater Steam profiles are then collected from the Steam Web API using a custom Airflow operator. The collected data is stored in a raw S3 bucket, then transformed and stored in a staging S3 bucket. Lastly, the dim and fact data in the staging bucket is loaded into the data warehouse with the custom Airflow operators LoadDimOperator and LoadFactOperator.
- Data collected from the Twitter API is moved to a raw S3 bucket.
- Twitter data is read from the raw S3 bucket, then profile URLs are extracted and stored in a temp S3 bucket.
- Steam data collected from the Steam Web API is moved to a raw S3 bucket.
- Raw S3 Steam data undergoes transformations and data checks, then is stored in a staging S3 bucket.
- Data is transferred from the staging S3 buckets into temp tables, then into the data warehouse.
- The Data Studio dashboard can be used to gain insights about cheaters.
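The URL-extraction step in the breakdown above could look roughly like the sketch below. It is not the project's actual code: the function name `extract_profile_urls`, the sample tweets, and the assumption that tweet text already contains expanded `steamcommunity.com` links (rather than shortened ones) are all hypothetical.

```python
import re

# Assumed URL shapes: numeric-ID profiles (/profiles/<17 digits>) and
# vanity profiles (/id/<name>). The real feed may use other forms.
PROFILE_URL = re.compile(
    r"https?://steamcommunity\.com/(?:profiles/\d{17}|id/[A-Za-z0-9_-]+)"
)


def extract_profile_urls(tweets):
    """Return unique Steam profile URLs in first-seen order."""
    seen = set()
    urls = []
    for text in tweets:
        for url in PROFILE_URL.findall(text):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Hypothetical tweet texts, for illustration only.
sample_tweets = [
    "Banned: https://steamcommunity.com/profiles/76561198000000001 #rust",
    "Banned: https://steamcommunity.com/id/example_name/ and "
    "https://steamcommunity.com/profiles/76561198000000001",
    "No link in this one",
]

print(extract_profile_urls(sample_tweets))
```

Deduplicating at this stage keeps the later Steam Web API step from requesting the same profile twice within one hourly run.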
The US has the most accounts banned for cheating, with Russia trailing behind.
Most cheaters have a level 1 Steam account.
The top 3 cheater names
The most common profile picture is the default Steam profile picture.
The majority of cheaters get banned between 0 and 10 hours.
The top 3 games that cheaters own:
- Counter-Strike: Global Offensive
- Apex Legends
Top 3 Steam Groups
Cheaters use Archi's SC Farm to boost their accounts. It's a cheater's attempt to make their account look more legitimate to normal players.
Profile Visibility - Many people believe that a private profile means the account belongs to a cheater. In this data, however, more cheaters have public profiles than private profiles.
Friends of Friends - 2,565
Private - 824
Friends Only - 133
1.) Why not use Spark? The data processed every hour is only 1-5 MB, which is too small to benefit from Spark.
2.) Why stage the fact and dim tables before loading? It makes the pipeline easier to debug in the event that it fails.
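As a rough illustration of why a staged load is easier to debug, the sketch below puts rows into a temp table first, runs a data check, and only touches the warehouse table if the check passes. It uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere. The table and column names (`dim_profile`, `tmp_dim_profile`, `steam_id`, `persona_name`, `steam_level`) and the function `load_dim_profiles` are hypothetical, not the project's schema or its LoadDimOperator.

```python
import sqlite3

COLUMNS = "steam_id TEXT, persona_name TEXT, steam_level INTEGER"


def load_dim_profiles(conn, staged_rows):
    """Load staged rows through a temp table, then into the dim table."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS dim_profile ("
        "steam_id TEXT PRIMARY KEY, persona_name TEXT, steam_level INTEGER)"
    )
    cur.execute(f"CREATE TEMP TABLE tmp_dim_profile ({COLUMNS})")
    try:
        cur.executemany(
            "INSERT INTO tmp_dim_profile VALUES (?, ?, ?)", staged_rows
        )
        # Data check: fail before the warehouse table is modified.
        bad = cur.execute(
            "SELECT COUNT(*) FROM tmp_dim_profile WHERE steam_id IS NULL"
        ).fetchone()[0]
        if bad:
            raise ValueError(f"{bad} staged rows have no steam_id")
        # SQLite syntax; Postgres would use INSERT ... ON CONFLICT instead.
        cur.execute(
            "INSERT OR REPLACE INTO dim_profile "
            "SELECT steam_id, persona_name, steam_level FROM tmp_dim_profile"
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.execute("DROP TABLE IF EXISTS tmp_dim_profile")


# Hypothetical staged rows, for illustration only.
conn = sqlite3.connect(":memory:")
load_dim_profiles(conn, [("76561198000000001", "example", 1)])
print(conn.execute("SELECT COUNT(*) FROM dim_profile").fetchone()[0])
```

If the check fails, the error points at the staged batch, and the dim table is left exactly as it was before the run.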
Emily (mod#1073) from the Data Engineering Discord answered questions I had about my initial data warehouse architecture. Emily was very helpful in my adventure of building a data warehouse!