Devender Yadav edited this page Jul 27, 2015 · 2 revisions

Apache Spark

Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized DAG execution engine that supports general execution graphs (acyclic data flow) and in-memory computing. It also supports a rich set of higher-level tools, including [Spark SQL](http://spark.apache.org/docs/latest/sql-programming-guide.html) for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.

## Support

As a JPA provider, Kundera supports Spark. It lets you write data to databases, HDFS, and CSV or JSON files, and read and query that data using the JPA specification.

Kundera provides three Spark modules:

  • spark-core: handles HDFS and file-system (CSV and JSON) sources. You can read, write, and query data stored there.
  • spark-cassandra: designed for Cassandra. It supports the same read, write, and query operations.
  • spark-mongodb: designed for MongoDB. It supports the same read, write, and query operations.
