Last Updated: 03/24/2019
Pilot-Streaming is a tool for managing streaming environments, e.g., Kafka, Spark Streaming, Flink, and Dask, on HPC systems. It can also deploy auxiliary components for HPC/cloud-based streaming to edge resources (via SSH access).
Requirements:
* PBS/Torque cluster
* Working directory should be on a shared filesystem
* Set up password-less SSH access
* Java must be installed and in PATH
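The Java and SSH requirements can be checked before launching a job. Below is a minimal Python sketch under our own assumptions. The helper names are hypothetical and not part of Pilot-Streaming. The SSH test relies on OpenSSH's `BatchMode=yes` option, so a missing key makes `ssh` fail at once instead of prompting for a password.

```python
import shutil
import subprocess
from typing import Optional


def java_on_path(path: Optional[str] = None) -> bool:
    """Return True if a `java` executable is found on PATH (or on the given search path)."""
    return shutil.which("java", path=path) is not None


def passwordless_ssh_ok(host: str, timeout: int = 15) -> bool:
    """Return True if `ssh host true` succeeds without asking for a password.

    BatchMode=yes makes ssh fail instead of prompting, so exit code 0
    means key-based (password-less) login worked.
    """
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ssh client not installed, or the connection attempt hung
        return False
    return result.returncode == 0
```

For example, call `passwordless_ssh_ok("localhost")` on a login node, and repeat the check against a compute node hostname from inside a job.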
Anaconda is the preferred distribution
Requirements (in case a manual installation is needed):
The best way to use Pilot-Streaming is with Anaconda, which makes it easy to install important dependencies (such as PySpark and Dask). Make sure the PySpark version is compatible with the one Pilot-Streaming supports (currently 2.4.4).
conda install paramiko
conda install -c conda-forge boto3 python-openstackclient pykafka pyspark dask distributed python-confluent-kafka pexpect redis-py
pip install hostlist
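After installing the dependencies, a quick way to confirm the PySpark version is to query it from Python. This is a minimal sketch with hypothetical helper names. It treats "compatible" conservatively as an exact match with the version named above, which is our assumption and may be stricter than necessary.

```python
from typing import Optional

# Version named in this README; update it if Pilot-Streaming moves to another PySpark release.
EXPECTED_PYSPARK = "2.4.4"


def installed_pyspark_version() -> Optional[str]:
    """Return the installed PySpark version string, or None if PySpark cannot be imported."""
    try:
        import pyspark
    except Exception:
        # ImportError if absent; a broken or mismatched install can raise other errors at import time
        return None
    return pyspark.__version__


def pyspark_matches(installed: Optional[str], expected: str = EXPECTED_PYSPARK) -> bool:
    """Conservative check: require an exact match with the expected release."""
    return installed == expected
```

Usage: `pyspark_matches(installed_pyspark_version())` returns False if PySpark is missing or differs from the expected release.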
To install Pilot-Streaming, run the following from the source directory:
pip install --upgrade .
or, if pip causes problems (e.g., on Stampede2):
python setup.py install
To run a Hadoop cluster inside a PBS/Torque job:
psm --resource pbs+ssh://india.futuregrid.org --number_cores 8
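The `--resource` value appears to follow the SAGA-style URL convention. The scheme names the batch system (`pbs`, `slurm`), optionally combined with an access method such as `ssh`. The host is where jobs are submitted, and `localhost` means the current login node. The sketch below is our own reading of that convention, not Pilot-Streaming's parser, and `parse_resource` and `Resource` are hypothetical names.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class Resource(NamedTuple):
    scheduler: str          # batch system, e.g. "pbs" or "slurm"
    access: Optional[str]   # remote access method, e.g. "ssh"; None for local submission
    host: str               # submission host


def parse_resource(url: str) -> Resource:
    """Split a resource URL such as 'pbs+ssh://host' into scheduler, access method and host."""
    parsed = urlparse(url)
    scheduler, _, access = parsed.scheme.partition("+")
    return Resource(scheduler, access or None, parsed.hostname or "")
```

For instance, `parse_resource("pbs+ssh://india.futuregrid.org")` yields scheduler `pbs`, access `ssh`, and host `india.futuregrid.org`.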
Blog posts about SAGA-Hadoop:
* <http://randomlydistributed.blogspot.com/2011/01/running-hadoop-10-on-distributed.html>
* See hadoop1 for setting up a Hadoop 1.x.x cluster
* See hadoop2 for setting up a Hadoop 2.7.x cluster
* See spark for setting up a Spark 2.2.x cluster
* See kafka for setting up a Kafka 1.0.x cluster
* See flink for setting up a Flink 1.1.4 cluster
* See dask for setting up a Dask Distributed 1.20.2 cluster
Stampede:
psm --resource=slurm://localhost --queue=normal --walltime=239 --number_cores=256 --project=xxx
Gordon:
psm --resource=pbs://localhost --walltime=59 --number_cores=16 --project=TG-CCR140028 --framework=spark
Wrangler:
export JAVA_HOME=/usr/java/jdk1.8.0_45/
psm --resource=slurm://localhost --queue=normal --walltime=59 --number_cores=24 --project=xxx
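The `--walltime` values in these examples look like minutes: 239 is just under four hours and 59 just under one hour. This is our assumption and is not stated here. Queue limits are usually given in SLURM/PBS as HH:MM:SS. The small hypothetical helper below, which is not part of psm, converts between the two forms for comparison.

```python
def walltime_hhmmss(minutes: int) -> str:
    """Format a walltime given in minutes as HH:MM:SS (assumes the psm value is in minutes)."""
    if minutes < 0:
        raise ValueError("walltime must be non-negative")
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"
```

Under that assumption, `walltime_hhmmss(239)` gives `"03:59:00"`, which fits within a typical four-hour queue limit.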
To test the Kafka configuration:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --all --entity-type brokers
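`kafka-configs.sh` needs a reachable broker. Before running it, you can first check that something is listening on the broker port (9092 in the command above). This minimal Python sketch only tests TCP connectivity. It does not confirm that the listener is actually Kafka. `broker_reachable` is a hypothetical helper name.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 9092, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unresolvable/unreachable host, or timeout (all OSError subclasses)
        return False
```

If `broker_reachable()` returns False, the broker is probably not running yet, or it listens on a different host or port.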