#spark-tk-jupyter
- Docker image for Spark-tk in a Jupyter notebook.
- Contains the pandas-cookbook example notebooks from this repository: https://github.com/jvns/pandas-cookbook
##What's new
This is the initial release of the spark-tk-jupyter repo.
##Known issues
None.
- Pull all the submodules:
git submodule update --init --recursive
- Build the image:
sudo docker build .
- Or, if you are behind a proxy, build with:
sudo docker build --build-arg HTTP_PROXY=$http_proxy --build-arg HTTPS_PROXY=$http_proxy --build-arg NO_PROXY=$no_proxy --build-arg http_proxy=$http_proxy --build-arg https_proxy=$http_proxy --build-arg no_proxy=$no_proxy .
- Run the image, publishing the container's port 8888 on host port 8900:
sudo docker run -p 8900:8888 YOUR_JUPYTER_IMAGE_TAG
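The build step above can also be scripted. The following is a minimal sketch, not part of this repository, that assembles the same `docker build` argument list and adds the proxy `--build-arg` flags only when `http_proxy` is set. The function name `docker_build_command` and its `tag` parameter are assumptions made for illustration. Like the proxy command above, it reuses the value of `$http_proxy` for the HTTPS variables.

```python
import os
import shlex


def docker_build_command(env=None, tag=None):
    """Return the `docker build` argv shown above (hypothetical helper)."""
    env = os.environ if env is None else env
    cmd = ["sudo", "docker", "build"]
    if tag:
        # -t names the image so the tag can be passed to `docker run` later.
        cmd += ["-t", tag]
    http_proxy = env.get("http_proxy")
    if http_proxy:
        no_proxy = env.get("no_proxy", "")
        # Same build args as the proxy command above; the HTTPS variables
        # reuse the value of $http_proxy, as that command does.
        build_args = {
            "HTTP_PROXY": http_proxy,
            "HTTPS_PROXY": http_proxy,
            "NO_PROXY": no_proxy,
            "http_proxy": http_proxy,
            "https_proxy": http_proxy,
            "no_proxy": no_proxy,
        }
        for name, value in build_args.items():
            cmd += ["--build-arg", f"{name}={value}"]
    # Build context is the current directory, as in the commands above.
    cmd.append(".")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(docker_build_command()))
```

Passing the result to `subprocess.run` would execute the build. Printing it first keeps the sketch free of side effects.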
##Features
- PySpark, Spark Shell
- Jupyter REST APIs for uploading and running PySpark/SparkTK scripts
- ATK libraries
- Spark-TK libraries
- Anaconda 2.7
- TAP Help menus
- Example notebooks from the project jupyter-default-notebooks
- Currently the only way to upload files to Jupyter is the Upload Form. After each upload attempt, the file(s) are placed in a directory named like "uploads/dddd", where each d is a digit.
- curl http://JUPYTER_NOTEBOOK_URL/upload -F "filearg=@/home/ashahba/frame-basics.py"
- curl http://JUPYTER_NOTEBOOK_URL/upload -F "filearg=@/home/ashahba/frame-basics.py" -F "filearg=@/home/ashahba/frame-advanced.py"
- curl http://JUPYTER_NOTEBOOK_URL/delete -d "app-path=uploads/0001"
- curl http://JUPYTER_NOTEBOOK_URL/rename -d "app-path=uploads/0001" -d "dst-path=uploads/myapp"
- curl http://JUPYTER_NOTEBOOK_URL/spark-submit -d "driver-path=uploads/0001/frame-basics.py"
- curl http://JUPYTER_NOTEBOOK_URL/logs -d "app-path=uploads/0001" -d "offset=1" -d "n=100"
- curl http://JUPYTER_NOTEBOOK_URL/status -d "app-path=uploads/0001"
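The curl calls above can be wrapped in a small helper. The sketch below is an illustration under stated assumptions, not a documented client for this image. It only builds `urllib.request.Request` objects for the form-encoded endpoints (`/delete`, `/rename`, `/spark-submit`, `/logs`, `/status`) with the same field names as the curl examples, and leaves the multipart `/upload` call to curl. `BASE_URL`, `form_request`, and `is_upload_dir` are hypothetical names. The four-digit pattern follows the "uploads/dddd" format described above.

```python
import re
import urllib.parse
import urllib.request

# Placeholder taken from the curl examples; replace with the real notebook URL.
BASE_URL = "http://JUPYTER_NOTEBOOK_URL"

# "uploads/dddd", where each d is a digit (for example uploads/0001).
UPLOAD_DIR = re.compile(r"uploads/\d{4}")


def is_upload_dir(path):
    """True if path looks like an auto-generated upload directory."""
    return UPLOAD_DIR.fullmatch(path) is not None


def form_request(endpoint, **fields):
    """Build (but do not send) a POST like `curl BASE_URL/<endpoint> -d k=v`."""
    # Field names such as "app-path" contain dashes, which are not valid in
    # Python keyword arguments, so underscores are mapped to dashes here.
    data = {key.replace("_", "-"): str(value) for key, value in fields.items()}
    # safe="/" leaves slashes unescaped, matching what `curl -d` sends.
    body = urllib.parse.urlencode(data, safe="/").encode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}", data=body, method="POST"
    )


# Equivalent of the /spark-submit and /logs examples above.
submit = form_request("spark-submit", driver_path="uploads/0001/frame-basics.py")
logs = form_request("logs", app_path="uploads/0001", offset=1, n=100)

# Sending is left to the caller, for example:
#     with urllib.request.urlopen(submit) as response:
#         print(response.read().decode())
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing the helper at a live notebook.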