An end-to-end data pipeline for building a Data Lake and supporting reports using Apache Spark.
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
K-Means, CURE, and Canopy clustering algorithms demonstrated using PySpark.
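Of the three, only K-Means ships with Spark's spark.ml library (CURE and Canopy are not built in); a minimal sketch of the built-in estimator on made-up 2-D points might look like this:

```python
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("kmeans-demo").getOrCreate()

# Toy 2-D points; a real run would load a dataset instead.
df = spark.createDataFrame(
    [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)], ["x", "y"]
)

# spark.ml estimators expect a single vector column of features.
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# Fit K-Means with k=2 and assign each point to a cluster.
model = KMeans(k=2, seed=42).fit(features)
model.transform(features).show()
```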
A Spark cluster based on Docker Compose.
A script to find similarities between movies in the MovieLens dataset using Python and Spark clustering.
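As a rough illustration of the idea (not the repository's actual script), item-item cosine similarity over MovieLens-style ratings can be computed with a self-join; the ratings.csv path and column names assume the standard MovieLens layout:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("movie-similarity").getOrCreate()

# Assumed MovieLens layout: userId,movieId,rating,timestamp.
ratings = spark.read.csv("ratings.csv", header=True, inferSchema=True)

# Pair up movies rated by the same user, keeping each pair once.
pairs = (ratings.alias("a")
         .join(ratings.alias("b"), "userId")
         .where(F.col("a.movieId") < F.col("b.movieId")))

# Cosine similarity between the two movies' co-rating vectors.
sims = pairs.groupBy("a.movieId", "b.movieId").agg(
    (F.sum(F.col("a.rating") * F.col("b.rating")) /
     (F.sqrt(F.sum(F.col("a.rating") ** 2)) *
      F.sqrt(F.sum(F.col("b.rating") ** 2)))).alias("cosine")
)
sims.orderBy(F.desc("cosine")).show(10)
```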
👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to provision the infrastructure and Airflow to automate workflows 🥊
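For the Airflow piece, a minimal sketch of a DAG that submits a Spark job on a schedule might look like this; the dag_id, script path, and connection id are placeholders, not taken from the repository:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Hypothetical DAG that submits a PySpark job daily.
with DAG(
    dag_id="spark_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    submit = SparkSubmitOperator(
        task_id="run_etl",
        application="/opt/jobs/etl_job.py",  # placeholder job script
        conn_id="spark_default",             # Spark connection configured in Airflow
    )
```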
This is my contribution to the Diastema project.
I'll walk you through launching a cluster manually in Spark standalone deploy mode, connecting an app to the cluster, launching the app, and viewing the monitoring and logging.
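A minimal sketch of the "connect an app" step, assuming a standalone master already running on a placeholder host (7077 is the default standalone master port):

```python
from pyspark.sql import SparkSession

# Connect an application to a standalone master; the host is a placeholder.
spark = (SparkSession.builder
         .master("spark://master-host:7077")
         .appName("standalone-demo")
         .getOrCreate())

# Sanity check that executors on the cluster can run a job.
print(spark.range(1_000_000).count())

# The master's web UI (default port 8080) lists the running application,
# and the driver UI (default port 4040) exposes per-job monitoring.
spark.stop()
```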
A Spark cluster containing multiple Spark masters, based on Docker Compose.
In this study, we propose using a distributed storage and computation system to track money transfers instantly. In particular, we keep the transaction history in a distributed file system as a graph data structure and try to detect illegal activities using Graph Neural Networks (GNNs) in a distributed manner.
Steps to deploy a local Spark cluster with Docker. Bonus: a ready-to-use notebook for model prediction in PySpark using a spark.ml Pipeline() on a well-known dataset.
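A sketch of what such a spark.ml Pipeline() can look like, using a tiny made-up dataset in place of the repository's:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("pipeline-demo").getOrCreate()

# Tiny stand-in dataset; the repository uses a well-known dataset instead.
train = spark.createDataFrame(
    [(0.0, 1.1, 0.0), (1.0, 0.2, 1.0), (0.2, 1.0, 0.0), (0.9, 0.1, 1.0)],
    ["f1", "f2", "label"],
)

# Chain feature assembly and a classifier into one Pipeline.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(maxIter=10),
])

model = pipeline.fit(train)
model.transform(train).select("label", "prediction").show()
```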
This is a self-documented journey through distributed data storage, parallel processing, and the Linux OS using Apache Hadoop, Apache Spark, and Raspbian OS. In this project, a 3-node cluster is set up on Raspberry Pi 4 boards, HDFS is installed, and Spark processing jobs are run via YARN.
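Once HDFS and YARN are running, a PySpark job can target the cluster simply by setting the master to yarn; a minimal sketch, assuming HADOOP_CONF_DIR points at the Pi cluster's configuration:

```python
from pyspark.sql import SparkSession

# Run against YARN; Spark finds the ResourceManager and HDFS through
# the Hadoop configuration referenced by HADOOP_CONF_DIR.
spark = (SparkSession.builder
         .master("yarn")
         .appName("pi-cluster-job")
         .getOrCreate())

# A small job to confirm the executors on the Pi nodes are reachable.
print(spark.range(10_000).selectExpr("sum(id)").collect())
spark.stop()
```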
A distributed application that identifies the top 50 taxi pickup locations in New York by analyzing over 1 billion records using Apache Spark, the Hadoop file system (HDFS), and Scala.
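The repository itself is in Scala, but the core aggregation translates directly to PySpark; a sketch with placeholder path and column names, not the repository's:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("taxi-top50").getOrCreate()

# Hypothetical trip records on HDFS with a pickup location id column.
trips = spark.read.parquet("hdfs:///data/taxi/trips")

# Count trips per pickup location and keep the 50 busiest.
top50 = (trips.groupBy("pickup_location_id")
         .count()
         .orderBy(F.desc("count"))
         .limit(50))
top50.show(50)
```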
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
A spark-submit extension of bde2020/spark-submit for Scala with SBT.
Start clusters in VirtualBox VMs.
Spark standalone architecture, local architecture, and reading Hadoop file formats, i.e., Avro, Parquet, and ORC.
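A minimal sketch of reading the three formats in PySpark; the paths are placeholders, and the spark-avro package coordinates must match your Spark and Scala versions (Parquet and ORC readers are built in, Avro is an external module):

```python
from pyspark.sql import SparkSession

# Pull in the Avro data source; version shown assumes Spark 3.3.0 / Scala 2.12.
spark = (SparkSession.builder
         .appName("file-formats")
         .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0")
         .getOrCreate())

# Placeholder paths for each format.
avro_df = spark.read.format("avro").load("data/events.avro")
parquet_df = spark.read.parquet("data/events.parquet")
orc_df = spark.read.orc("data/events.orc")
```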