Schema-to-case-class code generation for working with Avro in Scala.
- `avrohugger-core`: Generate source code dynamically at runtime for evaluation at a later step.
- `avrohugger-tools`: Generate source code at the command line with the avrohugger-tools jar.

Alternative Distributions:

- `sbt-avrohugger`: Generate source code at compile time with an sbt plugin, found here.
- `avro2caseclass`: Generate source code from a web app, found here.
##### Generates Scala case classes in various formats:

- `Standard`: Vanilla case classes (for use with Scalavro, Salat-Avro, gfc-avro, etc.)
- `SpecificRecord`: Case classes that implement `SpecificRecordBase` and therefore have mutable `var` fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.).
- `Scavro`: Case classes with immutable fields, intended to wrap Java-generated Avro classes (for use with the Scavro runtime).
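
For a rough sense of the difference between the first two formats, here is an illustrative sketch (not actual generator output) for a hypothetical single-field `User` record; the real `SpecificRecord` output also extends `SpecificRecordBase` and implements its `get`/`put`/`getSchema` methods.

```scala
// Illustrative sketch only, for a hypothetical record
// {"type": "record", "name": "User", "fields": [{"name": "name", "type": "string"}]}

// Standard: a vanilla, immutable case class
case class StandardUser(name: String)

// SpecificRecord: mutable `var` fields so the Avro Specific API can populate instances
// (the real generated class also extends SpecificRecordBase; omitted here)
case class SpecificUser(var name: String)
```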
##### Supports generating case classes with arbitrary fields of the following datatypes:

- INT -> Int
- LONG -> Long
- FLOAT -> Float
- DOUBLE -> Double
- STRING -> String
- BOOLEAN -> Boolean
- NULL -> Null
- MAP -> Map
- ENUM -> scala.Enumeration (`generate-specific`: Java Enum)
- BYTES -> Array[Byte]
- FIXED -> //TODO
- ARRAY -> List (`generate-scavro`: Array). See Customizable Type Mapping below.
- UNION -> Option
- RECORD -> case class
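
As an illustration of how these mappings surface on a generated class, a hypothetical record using a few of the field types above (Standard format; all names are made up) would come out roughly as:

```scala
// Illustrative sketch only (hypothetical schema, Standard format):
// how the Avro field types above map onto a generated case class.
case class Example(
  id: Int,                         // "int"
  count: Long,                     // "long"
  ratio: Double,                   // "double"
  name: String,                    // "string"
  active: Boolean,                 // "boolean"
  tags: List[String],              // {"type": "array", "items": "string"}
  attributes: Map[String, String], // {"type": "map", "values": "string"}
  nickname: Option[String]         // ["null", "string"]
)
```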
#### avrohugger-core

##### Get the dependency with:

    "com.julianpeeters" %% "avrohugger-core" % "0.9.0"
##### Description:

Instantiate a `Generator` with `Standard`, `Scavro`, or `SpecificRecord` source formats. Then use `tToFile(input: T, outputDir: String): Unit` or `tToStrings(input: T): List[String]`, where `T` can be `File`, `Schema`, or `String`.
##### Example

    import avrohugger._
    import format._
    import java.io.File

    val schemaFile = new File("path/to/schema")
    val generator = new Generator(SpecificRecord)
    generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"
where an input `File` can be `.avro`, `.avsc`, `.avpr`, or `.avdl`, and where an input `String` can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement `SpecificRecordBase`.
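
For instance, a minimal sketch of generating from a schema passed as a `String`, assuming the `stringToStrings` variant of the `tToStrings` naming convention described above:

```scala
import avrohugger._
import format._

// A schema given as a String rather than a File (names here are hypothetical)
val schemaString =
  """{
    |  "type": "record",
    |  "name": "User",
    |  "namespace": "example",
    |  "doc": "A user of the system",
    |  "fields": [{"name": "name", "type": "string"}]
    |}""".stripMargin

val generator = new Generator(Standard)

// One generated source string per record/protocol in the input
val caseClassStrings: List[String] = generator.stringToStrings(schemaString)
caseClassStrings.foreach(println)
```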
##### Doc Support:

- `.avdl`: Comments that begin with `/**` are used as the documentation string for the type or field definition that follows the comment.
- `.avsc`, `.avpr`, and `.avro`: Docs in Avro schemas are used to define a case class' ScalaDoc.
- `.scala`: ScalaDocs of case class definitions are used to define record and field docs.

Note: Currently Treehugger appears to generate Javadoc-style docs (thus compatible with ScalaDoc style).
##### Customizable Type Mapping:

Avro `array` is represented by Scala `List` by default. `array` can be reassigned to either `Array` or `Seq` by instantiating a `Generator` with a custom type map:

    val generator = new Generator(SpecificRecord, avroScalaCustomType = Map("array" -> classOf[Array[_]]))
##### Customizable Namespace Mapping:

Namespaces can be reassigned by instantiating a `Generator` with a custom namespace map (please see warnings below):

    val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace" -> "newnamespace"))
#### avrohugger-tools

Download the avrohugger-tools jar for Scala 2.10 or Scala 2.11 (20MB!) and use it like the avro-tools jar. Usage: `[-string] (schema|protocol|datafile) input... outputdir`

- `generate` generates Scala case class definitions:

      java -jar /path/to/avrohugger-tools_2.11-0.9.0-assembly.jar generate schema user.avsc .

- `generate-specific` generates definitions that extend `SpecificRecordBase`:

      java -jar /path/to/avrohugger-tools_2.11-0.9.0-assembly.jar generate-specific schema user.avsc .

- `generate-scavro` generates definitions that extend Scavro's `AvroSerializable`:

      java -jar /path/to/avrohugger-tools_2.11-0.9.0-assembly.jar generate-scavro schema user.avsc .
#### sbt-avrohugger

Also available as an sbt plugin, found here, that adds a `generate` or `generate-specific` task to `compile` (an alternative to macros).
#### avro2caseclass

Code generation is also available via a web app, found here. Hosted at Heroku on a hobbyist account, so it may take ~20 seconds to fire up the first time.
- If your framework is one that relies on reflection to get the Schema, it will fail since Scala fields are private. Therefore preempt it by passing in a Schema to DatumReaders and DatumWriters (as in the Avro example above, and in the sketch after this list).
- For the `SpecificRecord` format, generated case class fields must be mutable (`var`) in order to be compatible with the SpecificRecord API.
- When the input is a case class definition string, import statements are not supported; please use fully qualified type names if using records/classes from multiple namespaces.
- By default, a schema's namespace is used as the package name. In the case of the Scavro output format, the default is the namespace with `model` appended.
- While the Scavro format uses custom namespaces in a way that leaves it unaffected, most formats fail on schemas with records within unions (see the [avro forum](http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-td4032782.html)).
- Avoid recursive schemas, since they can cause compatibility issues when flowing data into a system that doesn't support them (e.g., Hive).
- Use namespaces to ensure compatibility when importing into Java/Scala.
- Use default field values in case of future schema evolution.
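
A minimal sketch of the first point above: passing the schema explicitly to the DatumWriter/DatumReader instead of relying on reflection. It assumes a hypothetical `User` class generated with the `SpecificRecord` format, with the schema exposed on its companion object as `SCHEMA$` (adjust to however your generated code exposes the schema).

```scala
import java.io.File
import org.apache.avro.file.{DataFileReader, DataFileWriter}
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}

// `User` is a hypothetical SpecificRecord-format class generated by avrohugger.
val schema = User.SCHEMA$

// Write: hand the schema to the DatumWriter rather than letting it be found by reflection
val writer = new SpecificDatumWriter[User](schema)
val dataFile = new File("users.avro")
val fileWriter = new DataFileWriter[User](writer)
fileWriter.create(schema, dataFile)
fileWriter.append(User("Alice"))
fileWriter.close()

// Read: likewise pass the schema to the DatumReader
val reader = new SpecificDatumReader[User](schema)
val fileReader = new DataFileReader[User](dataFile, reader)
while (fileReader.hasNext) println(fileReader.next())
fileReader.close()
```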
- Support more avro types: fixed, bytes.
The `scripted` task runs all tests. As per Doug Cutting's recommendations in the avro compiler tests, the string-based tests in `test` are augmented by `scripted` tests that generate and compile source that is run in de/serialization tests.
Depends on Avro and Treehugger. `avrohugger-tools` is based on avro-tools.
Contributors:
##### Fork away, just make sure the tests pass before sending a pull request.