A runtime reflection-based Avro library in Scala.
Scalavro takes a code-first, reflection-based approach to schema generation and (de)serialization. This yields a very low-overhead interface, but it imposes some costs: in general, Scalavro assumes you know what types you are reading and writing. No built-in support is provided (as yet) for so-called schema resolution (taking the writer's schema into account when reading data).
- To provide an in-memory representation of Avro schemas and protocols.
- To synthesize Avro schemas and protocols dynamically for a useful subset of Scala types.
- To dynamically generate Scala bindings for reading and writing Avro-mapped Scala types to and from Avro binary.
- Generally, to minimize fuss required to create an Avro-capable Scala application.
The Scalavro artifacts are available from Maven Central. The current release is 0.3.1, built against Scala 2.10.2.
Using SBT:
libraryDependencies += "com.gensler" %% "scalavro-io" % "0.3.1"
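For context, a minimal build definition using this dependency might look like the sketch below. The project name is a hypothetical placeholder. The Scala version is set to 2.10.2 because that is the version this release is built against, and the %% operator makes SBT append the Scala binary version to the artifact name.

```scala
// build.sbt (a sketch; "my-avro-app" is a hypothetical project name)
name := "my-avro-app"

// Matches the Scala version the 0.3.1 release is built against
scalaVersion := "2.10.2"

libraryDependencies += "com.gensler" %% "scalavro-io" % "0.3.1"
```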
- Generated Scaladoc for version 0.3.1
Scala Type | Avro Type |
---|---|
Unit | null |
Boolean | boolean |
Byte | int |
Char | int |
Short | int |
Int | int |
Long | long |
Float | float |
Double | double |
String | string |
scala.collection.Seq[Byte] | bytes |
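The table above can be read as a function from values to Avro primitive type names. The following is a minimal sketch of that reading using only the standard library. It is not Scalavro's implementation, and PrimitiveMappingSketch and avroPrimitiveFor are hypothetical names. Seq[Byte] is left out because element types are erased at runtime and cannot be checked by a simple pattern match.

```scala
object PrimitiveMappingSketch {
  // Returns the Avro primitive type name for a runtime value,
  // following the table above; None for anything not listed there.
  def avroPrimitiveFor(value: Any): Option[String] = value match {
    case ()                                    => Some("null")
    case _: Boolean                            => Some("boolean")
    // Byte, Char and Short are all widened to Avro's 32-bit int
    case _: Byte | _: Char | _: Short | _: Int => Some("int")
    case _: Long                               => Some("long")
    case _: Float                              => Some("float")
    case _: Double                             => Some("double")
    case _: String                             => Some("string")
    case _                                     => None
  }
}
```

Note that three distinct Scala types share the Avro int type, so the Avro schema alone does not record which of them was written.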
Scala Type | Avro Type |
---|---|
scala.collection.Seq[T] | array |
scala.collection.Set[T] | array |
scala.collection.Map[String, T] | map |
scala.Enumeration#Value | enum |
enum (Java) | enum |
scala.util.Either[A, B] | union |
scala.Option[T] | union |
com.gensler.scalavro.util.Union[U] | union |
com.gensler.scalavro.util.FixedData | fixed |
Supertypes of non-recursive case classes without type parameters | union |
Non-recursive case classes without type parameters | record |
- Built against Scala 2.10.2 with SBT 0.12.4
- Depends upon spray-json
- The io sub-project depends upon the Apache Java implementation of Avro (Version 1.7.4)
- Dynamic Avro schema generation from vanilla Scala types
- Avro protocol definitions and schema generation
- Convenient, dynamic binary IO
- Avro RPC protocol representation and schema generation
- Schema conversion to "Parsing Canonical Form" (useful for Avro RPC protocol applications)
- JSON IO is not yet implemented
- Schema resolution (taking the writer's schema into account when reading) is not yet implemented
- Recursive type dependencies are detected but not handled optimally: some potentially valid types are rejected at runtime. For example, the current version cannot synthesize an Avro schema for a simple recursively defined linked list node. Supporting this is a planned enhancement.
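For illustration, the kind of type the last limitation refers to can be as small as the node below. ListNode and RecursiveTypeSketch are hypothetical names, and the sketch uses only the standard library. Per the limitation above, the current version is not expected to synthesize a schema for such a type.

```scala
object RecursiveTypeSketch {
  // A minimal recursively defined linked list node:
  // the case class refers to itself through its `next` field.
  case class ListNode(value: Int, next: Option[ListNode])

  // A two-element list built from the node type
  val list: ListNode = ListNode(1, Some(ListNode(2, None)))

  // Counts the nodes reachable from the given node
  def length(node: ListNode): Int =
    1 + node.next.map(length).getOrElse(0)
}
```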
import com.gensler.scalavro.types.AvroType
AvroType[Seq[String]].schema
Which yields:
{
"type" : "array",
"items" : "string"
}
import com.gensler.scalavro.types.AvroType
AvroType[Set[String]].schema
Which yields:
{
"type" : "array",
"items" : "string"
}
import com.gensler.scalavro.types.AvroType
AvroType[Map[String, Double]].schema
Which yields:
{
"type" : "map",
"values" : "double"
}
package com.gensler.scalavro.tests
import com.gensler.scalavro.types.AvroType
object CardinalDirection extends Enumeration {
type CardinalDirection = Value
val N, NE, E, SE, S, SW, W, NW = Value
}
import CardinalDirection._
AvroType[CardinalDirection].schema
Which yields:
{
"name" : "CardinalDirection",
"type" : "enum",
"symbols" : ["N","NE","E","SE","S","SW","W","NW"],
"namespace" : "com.gensler.scalavro.tests.CardinalDirection"
}
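The symbols in the schema above match the names of the enumeration's values in declaration order. The sketch below lists those names using only the standard library. Whether Scalavro derives the symbols this way is an assumption, and EnumSymbolsSketch is a hypothetical name.

```scala
object EnumSymbolsSketch {
  object CardinalDirection extends Enumeration {
    type CardinalDirection = Value
    val N, NE, E, SE, S, SW, W, NW = Value
  }

  // Enumeration.values is ordered by id, which follows declaration order here
  val symbols: List[String] = CardinalDirection.values.toList.map(_.toString)
}
```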
Definition (Java):
package com.gensler.scalavro.tests;
enum JCardinalDirection { N, NE, E, SE, S, SW, W, NW };
Use (Scala):
import com.gensler.scalavro.types.AvroType
import com.gensler.scalavro.tests.JCardinalDirection
AvroType[JCardinalDirection].schema
Which yields:
{
"name" : "JCardinalDirection",
"type" : "enum",
"symbols" : ["N","NE","E","SE","S","SW","W","NW"],
"namespace" : "com.gensler.scalavro.tests"
}
package com.gensler.scalavro.tests
import com.gensler.scalavro.types.AvroType
AvroType[Either[Int, Boolean]].schema
Which yields:
["int", "boolean"]
and
AvroType[Either[Seq[Double], Map[String, Seq[Int]]]].schema
Which yields:
[{
"type" : "array",
"items" : "double"
},
{
"type" : "map",
"values" : {
"type" : "array",
"items" : "int"
}
}]
package com.gensler.scalavro.tests
import com.gensler.scalavro.types.AvroType
AvroType[Option[String]].schema
Which yields:
["null", "string"]
import com.gensler.scalavro.types.AvroType
import com.gensler.scalavro.util.Union._
AvroType[union [Int] #or [String] #or [Boolean]].schema
Which yields:
["int", "string", "boolean"]
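The Either, Option and Union examples above all produce an Avro union, which is written in JSON as an array of member schemas. The sketch below models that composition with a tiny schema representation of its own. It is an illustration only, not Scalavro's in-memory representation, and every name in it (UnionSchemaSketch, Primitive, ArrayOf, MapOf, UnionOf, optionOf, eitherOf) is hypothetical.

```scala
object UnionSchemaSketch {
  sealed trait Schema { def json: String }

  case class Primitive(name: String) extends Schema {
    def json: String = "\"" + name + "\""
  }
  case class ArrayOf(items: Schema) extends Schema {
    def json: String = "{\"type\":\"array\",\"items\":" + items.json + "}"
  }
  case class MapOf(values: Schema) extends Schema {
    def json: String = "{\"type\":\"map\",\"values\":" + values.json + "}"
  }
  // A union is rendered as a JSON array of its member schemas
  case class UnionOf(members: Schema*) extends Schema {
    def json: String = members.map(_.json).mkString("[", ",", "]")
  }

  // Option[T] corresponds to the union of null and T
  def optionOf(value: Schema): Schema = UnionOf(Primitive("null"), value)

  // Either[A, B] corresponds to the union of A and B, in that order
  def eitherOf(left: Schema, right: Schema): Schema = UnionOf(left, right)
}
```

With these definitions, optionOf(Primitive("string")).json renders as ["null","string"], the same structure as the Option[String] schema shown earlier.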
package com.gensler.scalavro.tests
import com.gensler.scalavro.types.AvroType
import com.gensler.scalavro.util.FixedData
import scala.collection.immutable
@FixedData.Length(16)
case class MD5(override val bytes: immutable.Seq[Byte])
extends FixedData(bytes)
AvroType[MD5].schema
Which yields:
{
"name": "MD5",
"type": "fixed",
"size": 16,
"namespace": "com.gensler.scalavro.tests"
}
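The declared length of 16 matches the size of an MD5 digest. The sketch below shows one way to obtain such bytes as an immutable sequence using only the JDK. Md5BytesSketch and md5Bytes are hypothetical names, and passing the result to the MD5 case class above is left as an assumption about how that class would be used.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object Md5BytesSketch {
  // Computes the MD5 digest of the UTF-8 bytes of `text`.
  // An MD5 digest is always 16 bytes long.
  def md5Bytes(text: String): scala.collection.immutable.Seq[Byte] =
    MessageDigest
      .getInstance("MD5")
      .digest(text.getBytes(StandardCharsets.UTF_8))
      .toVector
}
```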
package com.gensler.scalavro.tests
import com.gensler.scalavro.types.AvroType
case class Person(name: String, age: Int)
val personAvroType = AvroType[Person]
personAvroType.schema
Which yields:
{
"name": "Person",
"type": "record",
"fields": [
{"name": "name", "type": "string"},
{"name": "age", "type": "int"}
],
"namespace": "com.gensler.scalavro.tests"
}
And perhaps more interestingly:
case class SantaList(nice: Seq[Person], naughty: Seq[Person])
val santaListAvroType = AvroType[SantaList]
santaListAvroType.schema
Which yields:
{
"name": "SantaList",
"type": "record",
"fields": [
{
"name": "nice",
"type": {"type": "array", "items": "Person"}
},
{
"name": "naughty",
"type": {"type": "array", "items": "Person"}
}
],
"namespace": "com.gensler.scalavro.tests"
}
Given:
package com.gensler.scalavro.tests
class Alpha
abstract class Beta extends Alpha
case class Gamma() extends Alpha
case class Delta() extends Beta
case class Epsilon[T]() extends Beta
Usage:
import com.gensler.scalavro.types.AvroType
AvroType[Alpha].schema
Which yields:
[
{
"name" : "Delta",
"type" : "record",
"fields" : [],
"namespace" : "com.gensler.scalavro.tests"
},
{
"name" : "Gamma",
"type" : "record",
"fields" : [],
"namespace" : "com.gensler.scalavro.tests"
}
]
Note that in the above example:
- Alpha is excluded from the union because it is not a case class
- Beta is excluded from the union because it is abstract and not a case class
- Epsilon is excluded from the union because it takes type parameters
import com.gensler.scalavro.types.AvroType
import com.gensler.scalavro.io.AvroTypeIO
import com.gensler.scalavro.io.AvroTypeIO.Implicits._
import scala.util.{Try, Success, Failure}
case class Person(name: String, age: Int)
case class SantaList(nice: Seq[Person], naughty: Seq[Person])
val santaList = SantaList(
  nice = Seq(
    Person("John", 17),
    Person("Eve", 3)
  ),
  naughty = Seq(
    Person("Jane", 25),
    Person("Alice", 65)
  )
)
val santaListType = AvroType[SantaList]
val santaListIO = santaListType.io // implicitly: AvroTypeIO[SantaList]
// Write to an in-memory buffer; any java.io.OutputStream can be used here
val outStream = new java.io.ByteArrayOutputStream
santaListIO.write(santaList, outStream)
// Read the same bytes back; any java.io.InputStream can be used here
val inStream = new java.io.ByteArrayInputStream(outStream.toByteArray)
santaListIO.read(inStream) match {
  case Success(readResult) => // readResult is an instance of SantaList
  case Failure(cause) => // handle failure...
}
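To make the written bytes concrete, the sketch below hand-encodes a Person following the binary encoding rules of the Avro specification: ints and longs are zig-zag variable-length integers, strings are a length followed by UTF-8 bytes, and a record is its fields in declaration order. This is an independent illustration that does not use Scalavro. AvroBinarySketch and its helpers are hypothetical names.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

object AvroBinarySketch {
  case class Person(name: String, age: Int)

  // Zig-zag maps signed values to unsigned ones so small magnitudes stay short
  def zigZag(n: Long): Long = (n << 1) ^ (n >> 63)

  // Writes a value as a zig-zag variable-length integer:
  // 7 bits per byte, low bits first, high bit set on all but the last byte
  def writeLong(n: Long, out: ByteArrayOutputStream): Unit = {
    var v = zigZag(n)
    while ((v & ~0x7FL) != 0L) {
      out.write(((v & 0x7F) | 0x80).toInt)
      v >>>= 7
    }
    out.write(v.toInt)
  }

  // Writes a string as its UTF-8 byte count followed by the bytes themselves
  def writeString(s: String, out: ByteArrayOutputStream): Unit = {
    val utf8 = s.getBytes(StandardCharsets.UTF_8)
    writeLong(utf8.length.toLong, out)
    out.write(utf8, 0, utf8.length)
  }

  // A record is encoded as its fields, in order, with no extra framing
  def encodePerson(p: Person): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    writeString(p.name, out)
    writeLong(p.age.toLong, out)
    out.toByteArray
  }

  // Renders bytes as space-separated lowercase hex pairs
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b & 0xFF)).mkString(" ")
}
```

For example, hex(encodePerson(Person("John", 17))) gives 08 4a 6f 68 6e 22: the length 4 zig-zag encoded as 08, the four UTF-8 bytes of the name, and the age 17 zig-zag encoded as 22.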
- Current Apache Avro Specification
- Scala 2.10 Reflection Overview
- Great article on schema evolution in various serialization systems
- Wickedly clever technique for representing unboxed union types, proposed by Miles Sabin
Apache Avro is a trademark of The Apache Software Foundation.
Scalavro is distributed under the BSD 2-Clause License, the text of which follows:
Copyright (c) 2013, Gensler
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.