All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`auth.conf.factory` | DefaultAuthConfFactory | Name of a Scala module or class implementing AuthConfFactory providing custom authentication configuration |
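
Every key in these tables is listed without its `spark.cassandra.` prefix. As a minimal sketch (the `with_prefix` helper is hypothetical, not part of the connector), the full keys can be built programmatically and then passed to `spark-submit --conf key=value` or, in PySpark, to `SparkSession.builder.config(key, value)`:

```python
PREFIX = "spark.cassandra."


def with_prefix(options):
    """Return a copy of `options` in which every key carries the spark.cassandra. prefix.

    Keys that already start with the prefix are kept as-is. Values are converted
    to strings, because Spark configuration values are strings.
    """
    return {
        (key if key.startswith(PREFIX) else PREFIX + key): str(value)
        for key, value in options.items()
    }


conf = with_prefix({
    "connection.host": "127.0.0.1,192.168.0.1",
    "connection.port": 9042,
    "spark.cassandra.auth.conf.factory": "DefaultAuthConfFactory",
})

# Print the pairs in the form spark-submit accepts.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```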
All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`connection.compression` | | Compression to use (LZ4, SNAPPY or NONE) |
`connection.connections_per_executor_max` | None | Maximum number of connections per host set on each executor JVM. Will be updated to DefaultParallelism / Executors for Spark commands. Defaults to 1 if not specified and not in a Spark environment |
`connection.factory` | DefaultConnectionFactory | Name of a Scala module or class implementing CassandraConnectionFactory that provides connections to the Cassandra cluster |
`connection.host` | localhost | Contact point to connect to the Cassandra cluster. A comma-separated list may also be used, for example `127.0.0.1,192.168.0.1` |
`connection.keep_alive_ms` | 5000 | Period of time to keep unused connections open |
`connection.local_dc` | None | The local DC to connect to (other nodes will be ignored) |
`connection.port` | 9042 | Cassandra native connection port |
`connection.reconnection_delay_ms.max` | 60000 | Maximum period of time to wait before reconnecting to a dead node |
`connection.reconnection_delay_ms.min` | 1000 | Minimum period of time to wait before reconnecting to a dead node |
`connection.timeout_ms` | 5000 | Maximum period of time to attempt connecting to a node |
`query.retry.count` | 60 | Number of times to retry a timed-out query. Setting this to -1 means unlimited retries |
`read.timeout_ms` | 120000 | Maximum period of time to wait for a read to return |
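
A few rows above imply small calculations. The sketch below is hedged on several assumptions: the connections-per-executor fallback uses integer division with a floor of 1, `query.retry.count` counts retries after the first attempt, and reconnection follows a doubling backoff between the min and max delays, in the spirit of the DataStax Java driver's `ExponentialReconnectionPolicy` (real policies may differ in detail, for example by adding jitter). All function names are hypothetical.

```python
def default_connections_per_executor(default_parallelism=None, num_executors=None):
    """Fallback for connection.connections_per_executor_max (assumed semantics).

    Outside a Spark environment (no parallelism or executor information) the value
    is 1; otherwise DefaultParallelism / Executors, using integer division with a
    floor of 1.
    """
    if not default_parallelism or not num_executors:
        return 1
    return max(1, default_parallelism // num_executors)


def should_retry(retries_so_far, retry_count=60):
    """query.retry.count: -1 means unlimited, otherwise stop after retry_count retries."""
    return retry_count == -1 or retries_so_far < retry_count


def reconnection_delays_ms(min_ms=1000, max_ms=60000, attempts=8):
    """Assumed exponential backoff: start at min_ms, double each time, cap at max_ms."""
    delays = []
    delay = min_ms
    for _ in range(attempts):
        delays.append(min(delay, max_ms))
        delay *= 2
    return delays


print(default_connections_per_executor(64, 8))  # 8
print(default_connections_per_executor())       # 1 (not in a Spark environment)
print(reconnection_delays_ms())  # [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000]
```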
All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`sql.pushdown.additionalClasses` | | A comma-separated list of classes to be used (in order) to apply additional pushdown rules for Cassandra DataFrames. Classes must implement CassandraPredicateRules |
`table.size.in.bytes` | None | Used internally by DataFrames; will be updated in a future release to retrieve the size from Cassandra. Can be set manually for now |
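
`sql.pushdown.additionalClasses` applies its rule classes in order, each one seeing the result of the previous one. The real extension point is a Scala class implementing `CassandraPredicateRules`; the Python below is only a simplified model of that ordered pipeline, with predicates as plain strings, a hypothetical `AnalyzedPredicates` stand-in that splits them between Cassandra and Spark, and two made-up example rules.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class AnalyzedPredicates:
    """Hypothetical stand-in: which predicates Cassandra evaluates and which Spark does."""
    handled_by_cassandra: FrozenSet[str]
    handled_by_spark: FrozenSet[str]


Rule = Callable[[AnalyzedPredicates], AnalyzedPredicates]


def apply_rules(initial: AnalyzedPredicates, rules: List[Rule]) -> AnalyzedPredicates:
    """Apply rules in list order; each rule receives the previous rule's output."""
    return reduce(lambda preds, rule: rule(preds), rules, initial)


def keep_v_in_spark(preds: AnalyzedPredicates) -> AnalyzedPredicates:
    """Example rule: leave every predicate on column v to Spark."""
    moved = frozenset(p for p in preds.handled_by_cassandra if p.startswith("v "))
    return AnalyzedPredicates(preds.handled_by_cassandra - moved,
                              preds.handled_by_spark | moved)


def push_k_equality(preds: AnalyzedPredicates) -> AnalyzedPredicates:
    """Example rule: send equality predicates on column k to Cassandra."""
    moved = frozenset(p for p in preds.handled_by_spark if p.startswith("k = "))
    return AnalyzedPredicates(preds.handled_by_cassandra | moved,
                              preds.handled_by_spark - moved)


start = AnalyzedPredicates(frozenset({"v > 3"}), frozenset({"k = 1"}))
result = apply_rules(start, [keep_v_in_spark, push_k_equality])
print(sorted(result.handled_by_cassandra), sorted(result.handled_by_spark))
```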
All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`sql.cluster` | default | Sets the default Cluster to inherit configuration from |
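
A minimal sketch of how a default cluster name could select cluster-scoped settings, falling back to unscoped ones. The `<cluster>/<key>` naming and the `resolve` helper are assumptions made for illustration, not a description of the connector's internal lookup.

```python
CLUSTER_KEY = "spark.cassandra.sql.cluster"


def resolve(key, settings, cluster=None):
    """Look up `key`, preferring a value scoped to a cluster (hypothetical scheme).

    The cluster comes from the caller, else from spark.cassandra.sql.cluster,
    else "default". Scoped keys are assumed to be named "<cluster>/<key>".
    """
    cluster = cluster or settings.get(CLUSTER_KEY, "default")
    scoped_key = f"{cluster}/{key}"
    if scoped_key in settings:
        return settings[scoped_key]
    return settings.get(key)


HOST = "spark.cassandra.connection.host"
settings = {
    HOST: "127.0.0.1",
    f"ClusterOne/{HOST}": "10.0.0.1",
}
print(resolve(HOST, settings))                                 # 127.0.0.1
print(resolve(HOST, settings, "ClusterOne"))                   # 10.0.0.1
print(resolve(HOST, {**settings, CLUSTER_KEY: "ClusterOne"}))  # 10.0.0.1
```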
All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`connection.ssl.clientAuth.enabled` | false | Enable 2-way secure connection to Cassandra cluster |
`connection.ssl.enabled` | false | Enable secure connection to Cassandra cluster |
`connection.ssl.enabledAlgorithms` | Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA) | SSL cipher suites |
`connection.ssl.keyStore.password` | None | Key store password |
`connection.ssl.keyStore.path` | None | Path for the key store being used |
`connection.ssl.keyStore.type` | JKS | Key store type |
`connection.ssl.protocol` | TLS | SSL protocol |
`connection.ssl.trustStore.password` | None | Trust store password |
`connection.ssl.trustStore.path` | None | Path for the trust store being used |
`connection.ssl.trustStore.type` | JKS | Trust store type |
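
Two-way TLS (`connection.ssl.clientAuth.enabled`) means the client must present its own certificate, so a key store is needed in addition to the trust store. The hypothetical `build_ssl_conf` helper below assembles the SSL keys and enforces that requirement up front; the check is this sketch's own assumption, and the connector itself may only fail when it connects. The paths and passwords are placeholders.

```python
PREFIX = "spark.cassandra.connection.ssl."


def build_ssl_conf(trust_store_path=None, trust_store_password=None,
                   client_auth=False, key_store_path=None, key_store_password=None,
                   store_type="JKS", protocol="TLS"):
    """Assemble spark.cassandra.connection.ssl.* settings (hypothetical helper)."""
    if client_auth and not (key_store_path and key_store_password):
        # 2-way TLS: the client presents its own certificate, which lives in a key store.
        raise ValueError("clientAuth.enabled requires keyStore.path and keyStore.password")

    conf = {PREFIX + "enabled": "true", PREFIX + "protocol": protocol}
    if trust_store_path:
        conf[PREFIX + "trustStore.path"] = trust_store_path
        conf[PREFIX + "trustStore.type"] = store_type
        if trust_store_password:
            conf[PREFIX + "trustStore.password"] = trust_store_password
    if client_auth:
        conf[PREFIX + "clientAuth.enabled"] = "true"
        conf[PREFIX + "keyStore.path"] = key_store_path
        conf[PREFIX + "keyStore.password"] = key_store_password
        conf[PREFIX + "keyStore.type"] = store_type
    return conf


one_way = build_ssl_conf("/path/to/truststore.jks", "changeit")
two_way = build_ssl_conf("/path/to/truststore.jks", "changeit", client_auth=True,
                         key_store_path="/path/to/keystore.jks",
                         key_store_password="changeit")
print(len(one_way), len(two_way))  # 5 9
```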
All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`dev.customFromDriver` | None | Provides an additional class implementing CustomDriverConverter for clients that need to read non-standard primitive Cassandra types. If your Cassandra implementation uses a Java driver that can read DataType.custom(), you may need this. If you are using OSS Cassandra, this should never be used. |
All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`concurrent.reads` | 512 | Sets read parallelism for joinWithCassandraTable |
`input.consistency.level` | LOCAL_ONE | Consistency level to use when reading |
`input.fetch.size_in_rows` | 1000 | Number of CQL rows fetched per driver request |
`input.join.throughput_query_per_sec` | 2147483647 | **Deprecated** Please use `input.reads_per_sec`. Maximum read throughput allowed per single core in queries/s while joining an RDD with a Cassandra table |
`input.metrics` | true | Sets whether to record connector-specific metrics on read |
`input.reads_per_sec` | 2147483647 | Sets max requests per core per second for joinWithCassandraTable and some Enterprise integrations |
`input.split.size_in_mb` | 64 | Approximate amount of data to be fetched into a Spark partition. The minimum number of resulting Spark partitions is 1 + 2 * SparkContext.defaultParallelism |
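
The partition count implied by `input.split.size_in_mb` can be estimated as the table size divided by the split size, but never fewer than 1 + 2 * SparkContext.defaultParallelism. The sketch below assumes rounding up and takes the table size as an input; the connector derives its own size estimate from Cassandra, so actual counts can differ. `estimated_partitions` is a hypothetical name.

```python
import math


def estimated_partitions(table_size_mb, split_size_mb=64, default_parallelism=8):
    """Rough Spark partition count for a full table scan (assumed formula).

    Size divided by input.split.size_in_mb, rounded up, but never fewer than
    1 + 2 * defaultParallelism.
    """
    minimum = 1 + 2 * default_parallelism
    by_size = math.ceil(table_size_mb / split_size_mb)
    return max(minimum, by_size)


print(estimated_partitions(10_240))                    # 160: 10,240 MB / 64 MB
print(estimated_partitions(100))                       # 17: the minimum wins
print(estimated_partitions(10_240, split_size_mb=32))  # 320
```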
All parameters should be prefixed with `spark.cassandra.`

Property Name | Default | Description |
---|---|---|
`output.batch.grouping.buffer.size` | 1000 | How many batches per single Spark task can be stored in memory before sending to Cassandra |
`output.batch.grouping.key` | Partition | Determines how insert statements are grouped into batches. Available values are `none` (a batch may contain any statements), `replica_set` (a batch may contain only statements to be written to the same replica set) and `partition` (a batch may contain only statements for rows sharing the same partition key value) |
`output.batch.size.bytes` | 1024 | Maximum total size of the batch in bytes. Overridden by `spark.cassandra.output.batch.size.rows` |
`output.batch.size.rows` | None | Number of rows per single batch. The default is 'auto', which means the connector adjusts the number of rows based on the amount of data in each row |
`output.concurrent.writes` | 5 | Maximum number of batches executed in parallel by a single Spark task |
`output.consistency.level` | LOCAL_QUORUM | Consistency level for writing |
`output.ifNotExists` | false | If true, the INSERT operation is not performed when a row with the same primary key already exists. Using this feature incurs a performance hit |
`output.ignoreNulls` | false | In Cassandra >= 2.2, null values can be left unset in bound statements. Setting this to true causes all null values to be left unset rather than bound. For finer control, see the CassandraOption class |
`output.metrics` | true | Sets whether to record connector-specific metrics on write |
`output.throughput_mb_per_sec` | 2.147483647E9 | *(Floating-point values allowed)* Maximum write throughput allowed per single core in MB/s. For stability on long (8+ hour) runs, limit this to 70% of the maximum throughput observed on a smaller job |
`output.timestamp` | 0 | Timestamp (microseconds since epoch) of the write. If not specified, the time that the write occurred is used. A value of 0 means the time of write |
`output.ttl` | 0 | Time to live (TTL) assigned to writes to Cassandra. A value of 0 means no TTL |
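
With `output.batch.grouping.key` set to `partition`, statements are batched only with others for the same partition key, and `output.batch.size.bytes` caps each batch unless `output.batch.size.rows` overrides it. The sketch below models just that grouping and size logic with a hypothetical `group_into_batches` function and caller-supplied key and size functions; it ignores `output.batch.grouping.buffer.size`, the 'auto' row sizing, and the `replica_set` mode.

```python
from collections import defaultdict


def group_into_batches(rows, key_of, size_of, max_bytes=1024, max_rows=None):
    """Group rows into batches that share a grouping key (e.g. the partition key).

    A batch is closed when it already holds max_rows rows (if max_rows is set,
    it overrides the byte limit) or when adding the next row would push it past
    max_bytes. A single oversized row still gets a batch of its own.
    """
    open_batches = defaultdict(list)
    open_sizes = defaultdict(int)
    finished = []
    for row in rows:
        key, size = key_of(row), size_of(row)
        batch = open_batches[key]
        if max_rows is not None:
            full = len(batch) >= max_rows
        else:
            full = bool(batch) and open_sizes[key] + size > max_bytes
        if full:
            finished.append(batch)
            batch = open_batches[key] = []
            open_sizes[key] = 0
        batch.append(row)
        open_sizes[key] += size
    # Flush whatever is still open at the end of the task.
    finished.extend(b for b in open_batches.values() if b)
    return finished


rows = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
batches = group_into_batches(rows, key_of=lambda r: r[0], size_of=lambda r: 400)
print(batches)  # [[('a', 1), ('a', 2)], [('a', 4)], [('b', 3)]]
```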
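
`output.ifNotExists`, `output.ttl` and `output.timestamp` correspond to clauses on a CQL INSERT statement. The hypothetical `build_insert` helper below shows that mapping with literal values; the connector prepares its own statements and may bind these values instead. TTL is in seconds and the timestamp in microseconds since the epoch. Cassandra rejects a custom timestamp on a conditional (`IF NOT EXISTS`) insert, so the sketch refuses that combination.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_since_epoch(dt):
    """output.timestamp is expressed in microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def build_insert(keyspace, table, columns, if_not_exists=False, ttl=0, timestamp=0):
    """Sketch of the CQL INSERT implied by output.ifNotExists, output.ttl and output.timestamp.

    ttl is in seconds and 0 means no TTL; timestamp is in microseconds and 0 means
    the time of write, so neither clause is emitted for 0.
    """
    if if_not_exists and timestamp:
        raise ValueError("Cassandra rejects custom timestamps on conditional inserts")
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    cql = f"INSERT INTO {keyspace}.{table} ({cols}) VALUES ({marks})"
    if if_not_exists:
        cql += " IF NOT EXISTS"
    using = []
    if ttl:
        using.append(f"TTL {ttl}")
    if timestamp:
        using.append(f"TIMESTAMP {timestamp}")
    if using:
        cql += " USING " + " AND ".join(using)
    return cql


print(build_insert("ks", "kv", ["k", "v"], if_not_exists=True, ttl=3600))
# INSERT INTO ks.kv (k, v) VALUES (?, ?) IF NOT EXISTS USING TTL 3600
ts = micros_since_epoch(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(build_insert("ks", "kv", ["k", "v"], ttl=86400, timestamp=ts))
# INSERT INTO ks.kv (k, v) VALUES (?, ?) USING TTL 86400 AND TIMESTAMP 1704067200000000
```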