The Tokyo Olympic Data Engineering Project is an end-to-end data engineering solution that collects, processes, and analyzes data related to the Tokyo Olympic Games. It is built on Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.
Data: https://github.com/rashmi0007/Olympic_Data_engineering_project/tree/main/Transformed_Olympic_DataSet
Data ingestion code: https://github.com/rashmi0007/Olympic_Data_engineering_project/blob/main/data_ingestion_pipelines_datafactory.JSON
The project uses Azure Data Factory to orchestrate and automate data integration. Its pipelines extract, transform, and load (ETL) data from different sources and store the result in Data Lake. Azure Databricks then handles the processing and transformation tasks. Because Databricks distributes the work across a cluster, data cleaning, manipulation, and aggregation scale with the size of the data. It also gives data engineers and data scientists a shared environment to work in.
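The cleaning and aggregation step can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's Databricks notebook: in Databricks the same logic would normally be written against Spark DataFrames. The column names (`Team`, `Discipline`, `Gold`, `Silver`, `Bronze`) and the sample rows are assumptions made up for the example, not the project's schema or real results.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input. Column names and values are invented for
# illustration; they are not taken from the project's dataset.
RAW_CSV = """Team,Discipline,Gold,Silver,Bronze
 Team A ,Judo,9,2,1
Team A,Swimming,2,1,
Team B,Swimming,11,10,9
,Swimming,1,0,0
"""

MEDAL_COLUMNS = ("Gold", "Silver", "Bronze")


def clean_rows(text):
    """Yield cleaned rows: trim whitespace, drop rows without a team,
    and treat blank medal counts as 0."""
    for row in csv.DictReader(io.StringIO(text)):
        team = row["Team"].strip()
        if not team:
            continue  # a row with no team cannot be aggregated
        cleaned = {"Team": team, "Discipline": row["Discipline"].strip()}
        for column in MEDAL_COLUMNS:
            cleaned[column] = int(row[column].strip() or 0)
        yield cleaned


def medals_by_team(rows):
    """Sum gold, silver, and bronze medals per team."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Team"]] += sum(row[column] for column in MEDAL_COLUMNS)
    return dict(totals)


totals = medals_by_team(clean_rows(RAW_CSV))
print(totals)
```

Keeping cleaning and aggregation in separate functions mirrors how the two stages are usually kept apart in a notebook, so each can be checked on its own.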
Azure Synapse Analytics is used for data warehousing and advanced analytics. It stores the transformed data and supports analysis over large volumes of structured and unstructured data.
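To give a sense of the kind of analysis this enables, here is a small sketch of a ranking query. Synapse is queried with its own SQL engine; Python's built-in `sqlite3` is used here only as a local stand-in so the example runs anywhere. The `medals` table, its columns, and the sample values are hypothetical.

```python
import sqlite3

# In-memory database as a local stand-in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medals (team TEXT, gold INTEGER, silver INTEGER, bronze INTEGER)"
)

# Made-up sample rows, not real Olympic results.
conn.executemany(
    "INSERT INTO medals VALUES (?, ?, ?, ?)",
    [("Team A", 3, 1, 0), ("Team B", 1, 4, 2), ("Team C", 2, 2, 1)],
)

# Rank teams by total medal count and keep the top two.
query = """
SELECT team, gold + silver + bronze AS total
FROM medals
ORDER BY total DESC, team
LIMIT 2
"""
top_teams = conn.execute(query).fetchall()
conn.close()

print(top_teams)
```

A result set of this shape (one row per team with a single numeric measure) is also what a dashboard tool typically expects as input for a bar chart.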
After the data is transformed, it can be visualized and analyzed in Tableau or Power BI.