This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

Add support to convert to Avro IndexedRecord #275

Open
mquraishi opened this issue Mar 17, 2018 · 0 comments

mquraishi commented Mar 17, 2018

I can extract an Avro schema using SchemaConverters.convertStructToAvro. It would be nice to be able to take a DataFrame and get an Avro IndexedRecord or GenericRecord back using that generated schema. Built-in support for creating containerized Avro objects would be very useful; today this requires custom code, and adding it to this package would make it much easier. Thanks.
A method that implements something like this:

    import com.databricks.spark.avro.SchemaConverters
    import org.apache.avro.SchemaBuilder
    import tech.allegro.schema.json2avro.converter.JsonAvroConverter

    val df = spark.read.parquet(from_a_location)
    val ds = df.toJSON
    // These next two lines work around the lack of Date and Timestamp support in Avro
    val jsonDF = spark.read.json(ds) // schema inferred by scanning all records
    val jsonSchema = jsonDF.schema   // StructType with StringType for everything

    val rdd = ds.rdd
    rdd.foreach(rec => {
      val json = rec.getBytes
      // Convert the JSON-derived StructType schema to an Avro schema
      val avroSchema = SchemaConverters.convertStructToAvro(jsonSchema, SchemaBuilder.record("client").namespace("com.cigna.bigdata"), "com.cigna.bigdata")

      val converter = new JsonAvroConverter()
      // Get the Avro GenericRecord - this blows up if any one record has a missing field; that will need fixing too.
      val record = converter.convertToGenericDataRecord(json, avroSchema)
    })
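For reference, a possible shape for the requested API could look like the sketch below. It is only an illustration, not a proposed implementation: the helper name `toGenericRecords` and the positional field mapping are assumptions, it matches Avro fields to DataFrame columns by position, and it does not handle nested or logical types.

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame

    // Hypothetical helper: convert each Row of a DataFrame into an Avro
    // GenericRecord under a schema generated from the same StructType.
    def toGenericRecords(df: DataFrame, avroSchemaJson: String): RDD[GenericRecord] = {
      val fieldNames = df.schema.fieldNames
      df.rdd.mapPartitions { rows =>
        // Re-parse the schema per partition to sidestep Schema serialization issues
        val schema = new Schema.Parser().parse(avroSchemaJson)
        rows.map { row =>
          val record = new GenericData.Record(schema)
          fieldNames.zipWithIndex.foreach { case (name, i) =>
            // Assumes primitive column types that map directly to Avro values
            record.put(name, row.get(i))
          }
          record: GenericRecord
        }
      }
    }

Such a method would avoid the JSON round trip entirely, since the conversion works directly on Rows rather than serialized records.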