How to parse Avro messages while reading a stream of messages from Kafka in Spark 2.2.0? #260
Comments
Is there an Avro version of this? The JSON version is already available, but an Avro version is not yet available.
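To illustrate the gap being described: Spark 2.2 ships `from_json` for parsing JSON columns, but no Avro equivalent. A minimal sketch of the existing JSON path, with the schema and DataFrame names as assumed examples (not from this thread):

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumed example schema for the message payload.
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)
))

// kafkaDf is assumed to be a DataFrame read from Kafka, with a binary "value" column.
// JSON version that already exists in Spark 2.2:
val parsed = kafkaDf.select(from_json(col("value").cast("string"), schema).as("data"))

// The Avro equivalent being asked for does not exist in Spark 2.2; something like
// from_avro(col("value"), avroSchemaJson) only appears later (see comments below).
```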
I would also be interested in this, in the context of reading Avro from DynamoDB instead. Is there a way to mix input sources such as Kafka/DynamoDB/etc.?
Need this badly as well.
Looking forward to this feature.
There's a Databricks page (https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html#avro-dataframe) that claims there is a `from_avro` function under `import com.databricks.spark.avro._`.
I just checked again. It doesn't exist.
I think Databricks has not open-sourced it. It must only work on their platform.
This project seems more up to date, with Kafka support: https://github.com/AbsaOSS/ABRiS
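For reference, ABRiS decodes Confluent-encoded Avro directly from a Kafka DataFrame. This is a sketch based on the ABRiS README; the API has changed across versions, and the topic name and Schema Registry URL here are assumed placeholders, so verify against the release you actually use:

```scala
// Assumed API from the ABRiS README (newer 4.x/5.x style); check your version.
import org.apache.spark.sql.functions.col
import za.co.absa.abris.avro.functions.from_avro
import za.co.absa.abris.config.AbrisConfig

val abrisConfig = AbrisConfig
  .fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andTopicNameStrategy("my-topic")              // assumed topic
  .usingSchemaRegistry("http://localhost:8081")  // assumed registry URL

// spark is an existing SparkSession.
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "my-topic")
  .load()

// from_avro strips the Confluent wire-format header and decodes using the
// schema fetched from the registry.
val decoded = kafkaDf.select(from_avro(col("value"), abrisConfig).as("data"))
```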
This post shows example usage with Kafka and the Spark 2.4 Avro support: https://databricks.com/blog/2018/11/30/apache-avro-as-a-built-in-data-source-in-apache-spark-2-4.html
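The Spark 2.4 built-in path looks roughly like the sketch below (requires the `org.apache.spark:spark-avro` package on the classpath). The record schema and topic name are assumed examples:

```scala
// Spark 2.4+ built-in Avro support; needs --packages org.apache.spark:spark-avro_2.11:2.4.0
import org.apache.spark.sql.avro.from_avro
import org.apache.spark.sql.functions.col

// Avro schema as a JSON string; an assumed example record.
val avroSchema = """
  {"type":"record","name":"User","fields":[
    {"name":"id","type":"int"},
    {"name":"name","type":"string"}]}
"""

val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "users")
  .load()

// Note: this expects "plain" Avro bytes. Confluent-encoded messages carry a
// 5-byte header (magic byte + schema id) that must be stripped first.
val parsed = kafkaDf.select(from_avro(col("value"), avroSchema).as("user"))
```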
Does it support the Confluent Kafka Avro format?
@sterkh66 The ABRiS library above does. The Spark library is just whatever was available in here, AFAIK.
@Cricket007 Thanks for the quick reply. This article gets me confused: https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html. So far I don't see support for the Schema Registry described in their example.
Again, it doesn't. Just because there's Avro in Kafka doesn't mean you need to use a Schema Registry. The messages in the blog will need to carry the schema as part of the message.
@Cricket007 There is an explicit example of Schema Registry usage in the Databricks blog.
@dbolshak I didn't write the article and have no affiliation with Databricks. I can only speculate that it exists because Confluent Platform is one of the main enterprise deployments of Kafka, and people kept filing issues about being unable to use "Confluent-encoded" Avro and/or how to integrate this library with the Schema Registry.
If you mean it's confusing to see Databricks have one example that's not in the Spark documentation, then I agree, and I've voiced my opinions in the Spark JIRA, but that's not an issue to discuss here as well |
@Cricket007 There's still more. The […]
From what I understand, the Databricks platform maintains its own Avro functions that include the Schema Registry support, and the methods that accept the registry URL are not open-sourced. The remainder of this repo is now merged into Spark 2.4.
This seems to be the only reasonable explanation, and a non-open-sourced version was already suggested in one of the comments above. Anyway, Jordan, thanks for participating in this tricky "investigation".
@Cricket007 @sterkh66 @dbolshak The Schema Registry support is Databricks Runtime only. @kant111 The function is already in Spark 2.4. |
The code below reads messages from Kafka, and the messages are in Avro. How do I parse the messages and put them into a DataFrame in Spark 2.2.0?
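On Spark 2.2 specifically, one workable approach is to decode the Avro bytes yourself in a UDF using the plain Avro Java library, then project fields out of the resulting `GenericRecord`. A hedged sketch, where the schema, topic, and field names are assumed examples:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.spark.sql.functions.udf

// Assumed example writer schema; replace with the real one for your topic.
val schemaJson = """
  {"type":"record","name":"User","fields":[
    {"name":"id","type":"int"},
    {"name":"name","type":"string"}]}
"""

// Parse the schema inside the UDF body: Schema is not serializable, so it
// cannot be captured from the driver (in practice, cache it per executor).
val deserialize = udf { (bytes: Array[Byte]) =>
  val schema = new Schema.Parser().parse(schemaJson)
  val reader = new GenericDatumReader[GenericRecord](schema)
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  val record = reader.read(null, decoder)
  (record.get("id").asInstanceOf[Int], record.get("name").toString)
}

import spark.implicits._

val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "users")
  .load()

val parsed = kafkaDf
  .select(deserialize($"value").as("user"))
  .select($"user._1".as("id"), $"user._2".as("name"))
```

Note this assumes "plain" Avro bytes; Confluent-encoded messages would need the 5-byte wire-format header skipped before decoding.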