This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

'Unknown datum type' exception generated by ArrayType elements #285

Open
florin1288 opened this issue Jul 12, 2018 · 1 comment
florin1288 commented Jul 12, 2018

Hello,

Caused by: org.apache.avro.AvroRuntimeException: Unknown datum type [Ljava.lang.Object;: [Ljava.lang.Object;@616f81b5
    at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:636)
    at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:601)
    at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)

The above exception is triggered in the following scenario:
case class Model(params: Option[List[String]])

The above case class generates the following schema:
{"type":"record","name":"Model","namespace":"Test","fields":[{"name":"params","type":[{"type":"array","items":["string","null"]},"null"]}]}

Now, when I create my converterToAvro:
val structType: StructType = Encoders.product[Model].schema
val converter = createConverterToAvro(structType, recordName, recordNamespace)

...and try to generate my genericRecord:
val record: GenericRecord = converter(item).asInstanceOf[GenericRecord]

...I get the above exception!

This happens because the ArrayType case in AvroOutputWriter.createConverterToAvro allocates its result as a Scala Array:
val targetArray = new Array[Any](sourceArraySize)

...and GenericData.getSchemaName does this check:
if (isArray(datum)) return Type.ARRAY.getName();
protected boolean isArray(Object datum) { return datum instanceof Collection;}

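A Scala Array[Any] is a plain Java Object[] at runtime (the [Ljava.lang.Object; in the exception message), which is not an instance of java.util.Collection. The isArray check therefore returns false, no other type check matches, and getSchemaName throws the exception above.
To fix this, the converter can build a java.util.ArrayList instead of a Scala Array.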
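
The check and the proposed fix can be reproduced without Avro or Spark. The following is only a minimal sketch: the class ArrayCheckSketch and the helper toAvroArray are hypothetical names made up for illustration, and isArray simply mirrors the check quoted above rather than calling the real GenericData.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-alone sketch; it does not depend on Avro or Spark.
class ArrayCheckSketch {

    // Mirrors the isArray check quoted above from GenericData.
    static boolean isArray(Object datum) {
        return datum instanceof Collection;
    }

    // Hypothetical helper: copies the converted elements into a
    // java.util.ArrayList, which is a Collection and so passes the check.
    static List<Object> toAvroArray(Object[] converted) {
        return new ArrayList<>(Arrays.asList(converted));
    }

    public static void main(String[] args) {
        // A Scala Array[Any] is an Object[] on the JVM.
        Object[] targetArray = new Object[] {"a", "b", null};

        System.out.println(isArray(targetArray));              // false
        System.out.println(isArray(toAvroArray(targetArray))); // true
    }
}
```

In the real converter the same effect would presumably come from filling an ArrayList directly instead of allocating the Array first; the copy here only keeps the sketch short.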

@florin1288 florin1288 changed the title Unknown datum type generated by ArrayType elements 'Unknown datum type' exception generated by ArrayType elements Jul 12, 2018
a10y commented Sep 6, 2018

We also ran into this. It appears to have been fixed when the code was merged into upstream Spark, which uses ArrayList instead of Array: apache/spark@9603087#diff-01fea32e6ec6bcf6f34d06282e08705aR103
