SMV training track. Follow the training tasks below to gain familiarity with SMV and Spark. This is not meant as an exhaustive introduction to SMV and Spark, but it is a quick way to learn the essentials.
Upon completion of this training, the user will have a basic understanding of SMV and Spark and a fully set-up environment for further development.
By forking this project and committing changes to your own fork, we can help with individual questions on training progress.
- Learn basic Git/GitHub operations
- Create a GitHub account.
- Fork this project (do NOT just clone it) into your own GitHub account.
- Clone the forked project onto your local machine.
- Create an issue for this task.
- Make some changes to this README file in your local project directory, commit the change, and push (no pull request needed).
- Close the issue for this task. Each of the following tasks should likewise have an issue created and closed.
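The commit-and-push loop above can be sketched as follows. The repository URL and user identity are placeholders, and the demo commits to a throwaway local repository so it can run without network access:

```shell
# Local demonstration of the add/commit workflow (no network needed).
# For the real task you would first clone your fork, e.g.:
#   git clone https://github.com/<your-account>/SmvTraining.git
demo=/tmp/smv-git-demo
rm -rf "$demo" && mkdir -p "$demo" && cd "$demo"
git init -q .
git config user.email "trainee@example.com"   # placeholder identity
git config user.name  "SMV Trainee"

echo "SMV training notes" > README.md         # the "make some changes" step
git add README.md
git commit -q -m "Update README for training task"
git log --oneline                             # the new commit should be listed

# For the real task, publish the commit with: git push origin master
```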
- Follow Smv User Guide - Installation to set up SMV
- Follow Smv User Guide - Getting Started to set up this project.
- The project name should be SmvTraining and the FQN should be org.tresamigos.smvtraining
- Move everything from the created project directory into this cloned project directory, then commit and push.
- Update the SMV config parameter smv.dataDir in the file conf/smv-user-conf.props to reflect the new location after the project directory has been moved. See Smv Application Configuration for further details.
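For reference, the relevant line in conf/smv-user-conf.props might look like the following after the move; the path shown is a hypothetical example, not a required location:

```properties
# conf/smv-user-conf.props -- example only; point smv.dataDir at the
# data directory under the project's new location.
smv.dataDir = file:///home/trainee/SmvTraining/data
```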
- Make some small changes to the example modules, save them, compile (mvn package), and run with either smv-run or smv-shell. You may need to learn a little about Maven at this stage.
- Pick up some basic Scala: Scala for the Impatient
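A few of the Scala constructs that come up constantly in Spark and SMV code, sketched as a generic script; the names and data are invented for illustration:

```scala
// Core Scala idioms used throughout Spark/SMV code.
case class Person(name: String, age: Int)

val people = Seq(Person("Ann", 34), Person("Bob", 19), Person("Cyd", 25))

// map/filter take function literals, much like DataFrame transformations.
val adultsOver21 = people.filter(_.age > 21).map(_.name)
println(adultsOver21.mkString(","))         // prints "Ann,Cyd"

// Option models possibly-missing values without null.
val maybeBob = people.find(_.name == "Bob")
println(maybeBob.map(_.age).getOrElse(-1))  // prints 19
```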
- Go through the Spark SQL and DataFrame programming guide and the Spark Scala API docs, mainly the methods in the DataFrame class and the functions in the functions package.
- Make some changes to the example modules by trying out some of the Spark functions.
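As a sketch of the kind of experiment to try, assuming a stand-alone local Spark 2.x SparkSession (inside smv-shell an entry point is already provided for you, and on older Spark versions it is an SQLContext); the column names and data below are made up:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Stand-alone local session for experimentation only.
val spark = SparkSession.builder.master("local[*]").appName("df-demo").getOrCreate()
import spark.implicits._

// Toy data with invented column names.
val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

// Combine DataFrame methods with functions from the `functions` package.
val summary = df.groupBy("key")
  .agg(sum("value").as("total"), count("*").as("n"))
  .orderBy("key")

summary.show()
```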
- Go through the rest of the Smv User Guide.
- Write a UDF (user-defined function) and apply it to a column of a Spark DataFrame.
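A minimal sketch of that task, again assuming a stand-alone local Spark 2.x session; the sample data and the initials function are invented for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.master("local[*]").appName("udf-demo").getOrCreate()
import spark.implicits._

// A plain Scala function lifted into a column-level UDF.
// (Prefer a built-in from `functions` when one exists: UDFs are
// opaque to Spark's optimizer.)
val initials: String => String =
  name => name.split(" ").flatMap(_.headOption).mkString(".")
val initialsUdf = udf(initials)

val df = Seq("Ada Lovelace", "Grace Hopper").toDF("name")
val withInitials = df.withColumn("initials", initialsUdf($"name"))
withInitials.show()   // adds a column: "Ada Lovelace" -> "A.L"
```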