This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

How to parse Avro messages while reading a stream of messages from Kafka in Spark 2.2.0? #260

Open
kant111 opened this issue Dec 16, 2017 · 21 comments

Comments

@kant111

kant111 commented Dec 16, 2017

The code below reads messages from Kafka, and the messages are Avro-encoded. How do I parse each message and put it into a DataFrame in Spark 2.2.0?

Dataset<Row> df = sparkSession.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "topic1")
            .load();
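
Editor's note: in Spark 2.2 there is no built-in Avro function, so one common workaround is a plain Avro-decoding UDF over the binary `value` column. A minimal sketch, assuming a hypothetical record schema with a single string field `id` (replace it with the real writer schema for `topic1`):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.spark.sql.functions.udf

// Hypothetical schema; substitute the actual schema of the topic's messages.
val schemaJson =
  """{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}"""

// Decode the raw Kafka value (Array[Byte]) with the plain Avro API.
val parseAvro = udf { (bytes: Array[Byte]) =>
  val schema  = new Schema.Parser().parse(schemaJson)
  val reader  = new GenericDatumReader[GenericRecord](schema)
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  reader.read(null, decoder).get("id").toString
}

val parsed = df.select(parseAvro(df.col("value")).as("id"))
```

This only extracts one field; a production version would map the full record into a struct, and would parse the schema once per executor rather than per call.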
@kant111
Author

kant111 commented Dec 18, 2017

Is there a from_avro function, just like the from_json function that is already available?

// Json version that is already available.

StructType jsonSchema = new StructType()......;
df.select(from_json(new Column("value").cast("string"), jsonSchema).as("payload"));

// Avro version that is not yet available.

StructType avroSchema = new StructType()......;
df.select(from_avro(new Column("value").cast("string"), avroSchema).as("payload"));
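
Editor's note: the from_avro that eventually shipped takes the Avro schema as a JSON string rather than a StructType. A sketch of the Spark 2.4 shape of the API:

```scala
import org.apache.spark.sql.avro._            // spark-avro module, Spark 2.4+
import org.apache.spark.sql.functions.col

// from_avro(data: Column, jsonFormatSchema: String): Column
val avroSchemaJson =
  """{"type":"record","name":"Payload","fields":[{"name":"id","type":"string"}]}"""

df.select(from_avro(col("value"), avroSchemaJson).as("payload"))
```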

@kant111
Author

kant111 commented Dec 18, 2017

@gengliangwang

@peay

peay commented Jan 25, 2018

I would also be interested in this, in the context of reading Avro from DynamoDB instead. Is there a way to mix input sources such as Kafka/DynamoDB/etc. with spark-avro? This would be very useful.

@bobbui

bobbui commented Apr 25, 2018

need this badly as well

@devsaik

devsaik commented Jul 15, 2018

Looking forward to this feature

@mushgrant

mushgrant commented Jul 18, 2018

There's a Databricks page (https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html#avro-dataframe) that claims there is a from_avro function.
But it returns "error: not found: value from_avro" even after importing:

import com.databricks.spark.avro._
import org.apache.avro.SchemaBuilder

@kant111
Author

kant111 commented Jul 19, 2018

I just checked again. It doesn’t exist

@samklr

samklr commented Aug 28, 2018

I think Databricks has not open-sourced it. It must only work on their platform.

@OneCricketeer

This project seems more up to date, with Kafka support: https://github.com/AbsaOSS/ABRiS

@OneCricketeer

This post shows example usage with Kafka and the Spark 2.4 Avro support: https://databricks.com/blog/2018/11/30/apache-avro-as-a-built-in-data-source-in-apache-spark-2-4.html
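
Editor's note: with Spark 2.4's built-in Avro support described in that post, the original Kafka example could be rewritten roughly as follows. This is a sketch; the topic name, bootstrap servers, and schema are placeholders from the question:

```scala
import org.apache.spark.sql.avro._            // built-in spark-avro, Spark 2.4+
import org.apache.spark.sql.functions.col

// JSON-format Avro schema of the messages on the topic (placeholder).
val jsonFormatSchema =
  """{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}"""

val df = sparkSession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic1")
  .load()
  .select(from_avro(col("value"), jsonFormatSchema).as("payload"))
```

Note this expects raw Avro binary in `value`, with the writer schema supplied as a JSON string; it does not talk to a Schema Registry.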

@sterkh66

Does it support the Confluent Kafka Avro format?

@OneCricketeer

@sterkh66 The ABRiS library above does. The Spark library is just whatever was available here, AFAIK.

@sterkh66

@Cricket007 Thanks for the quick reply. This article confuses me: https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html. So far I don't see Schema Registry support described in its example.

@OneCricketeer

Again: it doesn't. Just because there's Avro in Kafka doesn't mean you need to use a Schema Registry. The messages in the blog's example need to carry the schema as part of the message.
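
Editor's note: the difference is the framing. Confluent-encoded messages prepend a magic byte (0) and a 4-byte schema ID to the Avro payload, while from_avro expects raw Avro binary. A common workaround is to strip that 5-byte header first; a sketch only, since the writer schema still has to be supplied out of band:

```scala
import org.apache.spark.sql.functions.expr

// Drop the 5-byte Confluent header (magic byte + 4-byte schema ID).
// SQL substring is 1-based, so the Avro payload starts at position 6.
val stripped = df.withColumn("value", expr("substring(value, 6, length(value) - 5)"))
```

This ignores the schema ID entirely, so it only works when every message on the topic was written with the one schema you pass to from_avro; libraries like ABRiS resolve the ID against the registry instead.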

@dbolshak

dbolshak commented Dec 19, 2018

@Cricket007 There is an explicit example of Schema Registry usage in the Databricks blog.
Could you explain why the Schema Registry is mentioned there? I believe it confuses a lot of people.

@OneCricketeer

@dbolshak I didn't write the article and have no affiliation with Databricks.

I can only speculate that it's because Confluent Platform is one of the main enterprise deployments of Kafka, and people kept filing issues about being unable to use "Confluent-encoded" Avro and/or asking how to integrate this library with the Schema Registry.

@OneCricketeer

If you mean it's confusing to see Databricks publish an example that's not in the Spark documentation, then I agree, and I've voiced my opinions in the Spark JIRA, but that's not an issue to discuss here.

@sterkh66

@Cricket007 There's still more. The from_avro and to_avro functions mentioned in the Databricks blog were never included in the 4.0 release and remain unmerged, as this PR shows.

@OneCricketeer

From what I understand, the Databricks platform maintains its own Avro functions that include Schema Registry support, and the overloads that accept a registry URL are not open-sourced. The remainder of this repo has now been merged into Spark 2.4.

@sterkh66

This seems to be the only reasonable explanation; a non-open-sourced version was already suggested in one of the comments above. Anyway, Jordan, thanks for participating in this tricky "investigation".

@gengliangwang
Contributor

gengliangwang commented Dec 19, 2018

@Cricket007 @sterkh66 @dbolshak The Schema Registry support is Databricks Runtime only.

@kant111 The function is already in Spark 2.4.


10 participants