Spark Modularized View (SMV)

Spark Modularized View (SMV) enables users to build enterprise-scale applications on the Apache Spark platform.

Why SMV

Scales with DATA size
Scales with CODE size
Scales with TEAM size

In addition to the data scalability inherited from Spark, SMV also provides code and team scalability through the following features:

Multi-level modular design allows developers to work on large-scale projects and enables easy code and data reuse
Multi-grain traceability supports full-scope knowledge transparency for developers and data users
Interfaces to multiple languages (Scala and R for now) make it easy to integrate with existing code and leverage existing developer experience
Pure text code, which can use modern CM (Configuration Management) tools to track and merge changes among team members
Automatic data and code version synchronization to enable coordination at both the code and data levels
Data publishing mechanism to support inter-team coordination
Built-in data quality management to ensure data quality on a continuous basis
High-level helper functions and tools for quick data app development

Please refer to the User Guide and API docs for details.

Note: The sections below were extracted from the User Guide, which should be consulted for more detailed instructions.

Install SMV with Docker

Docker

Install Docker by following the official installation guide for your machine.

SMV

Pull this repository, navigate to the docker directory, and build the SMV docker image with

docker build -t smv .

Now run SMV with

docker run --rm -it -v /path/to/projects:/projects -v /path/to/data:/data smv

SMV Getting Started

Create example App

SMV provides a shell script to easily create an example application. The example app can be used for exploring SMV, and the script can also be used to initialize a new project.

$ _SMV_HOME_/tools/smv-init MyApp com.mycompany.myapp

Build and Run Example App

$ mvn clean install
$ _SMV_HOME_/tools/smv-run --run-app

The output csv file and schema can be found in the data/output directory (as configured in the conf/smv-user-conf.props file).

$ cat data/output/com.mycompany.myapp.stage1.EmploymentByState_XXXXXXXX.csv/part-* | head -5
"32",981295
"33",508120
"34",3324188
"35",579916
"36",7279345

$ cat data/output/com.mycompany.myapp.stage1.EmploymentByState_XXXXXXXX.schema/part-*
FIRST('ST): String
EMP: Long
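As a rough illustration of how the two outputs fit together, the following Python sketch pairs schema lines of the form "name: Type" with the CSV values shown above. It is not part of SMV: the helper names parse_schema and read_rows and the type-to-converter mapping are assumptions made for this example, and only the two types that appear in the sample schema are handled.

```python
import csv
import io

# Assumed mapping from the type names in the sample schema to Python
# converters; only the two types visible in the example are covered.
CONVERTERS = {"String": str, "Long": int}


def parse_schema(schema_text):
    """Return a list of (column_name, type_name) pairs from schema text."""
    columns = []
    for line in schema_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last ':' so names such as FIRST('ST) stay intact.
        name, type_name = line.rsplit(":", 1)
        columns.append((name.strip(), type_name.strip()))
    return columns


def read_rows(csv_text, columns):
    """Parse CSV text into dicts, converting each value by its schema type."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue
        row = {}
        for (name, type_name), value in zip(columns, record):
            row[name] = CONVERTERS[type_name](value)
        rows.append(row)
    return rows


# Sample text copied from the output shown above.
schema_text = "FIRST('ST): String\nEMP: Long\n"
csv_text = '"32",981295\n"33",508120\n'

columns = parse_schema(schema_text)
rows = read_rows(csv_text, columns)
print(rows[0])
```

In practice the text would be read from the part-* files under data/output rather than from inline strings.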

See the Getting Started section of the User Guide for further details.

Generate dependency graph

If smv-run is given the -g flag, it creates the module dependency graph as a dot file instead of running and persisting the module. The dot file can be converted to png using the dot command.

$ _SMV_HOME_/tools/smv-run -g -m com.mycompany.myapp.stage1.EmploymentByState
$ dot -Tpng com.mycompany.myapp.stage1.EmploymentByState.dot -o graph.png
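The two steps can also be scripted. The Python sketch below only builds the two command lines for a given module. It assumes that the dot file is named after the fully qualified module name, as in the example, and the function name graph_commands and the default graph.png output path are hypothetical choices made for this illustration.

```python
import shlex


def graph_commands(smv_home, module_fqn, png_path="graph.png"):
    """Build the smv-run and dot command lines for one module."""
    smv_run = f"{smv_home}/tools/smv-run"
    # Assumption: smv-run -g writes <module FQN>.dot in the current directory.
    dot_file = f"{module_fqn}.dot"
    return [
        [smv_run, "-g", "-m", module_fqn],
        ["dot", "-Tpng", dot_file, "-o", png_path],
    ]


commands = graph_commands(
    "_SMV_HOME_", "com.mycompany.myapp.stage1.EmploymentByState"
)
for command in commands:
    # Print each command as a shell-quoted line for inspection.
    print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) to execute it once _SMV_HOME_ is replaced with the actual install path.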

See Run SMV Application for further details.
