From 8bbf40abb67ec9ee807e2aa4711e33e9acf69263 Mon Sep 17 00:00:00 2001
From: Oscar Westra van Holthe - Kind
Date: Fri, 17 May 2024 19:10:40 +0200
Subject: [PATCH] Allow specified fields to be absent

Ignore some required fields when matching write and read schemata. This
allows parsing records that are augmented with metadata after parsing,
e.g., by adding a "parsed at" timestamp or similar.
---
 doc/index.md                                  |  34 +-
 .../java/opwvhk/avro/io/AsAvroParserBase.java |   5 +-
 .../opwvhk/avro/json/JsonAsAvroParser.java    |  23 +-
 .../avro/json/JsonAsAvroParserTest.java       | 299 +++++++++---------
 4 files changed, 188 insertions(+), 173 deletions(-)

diff --git a/doc/index.md b/doc/index.md
index df2865b..53edc3c 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -15,27 +15,37 @@ This document describes the various functionality in more detail.
 
 Parsing
 -------
-The main day-to-day use of this library is to parse records in various formats into Avro. As such,
-you won't find a converter for (for example) CSV files: these are container files with multiple
-records.
+The main day-to-day use of this library is to parse single records in various formats into Avro. As
+a result, you won't find a converter for (for example) CSV files: these are container files with
+multiple records.
 
 The following formats can be converted to Avro:
 
-| Format             | Parser constructor                                                                           |
-|--------------------|----------------------------------------------------------------------------------------------|
-| JSON (with schema) | `opwvhk.avro.json.JsonAsAvroParser#JsonAsAvroParser(URI, boolean, Schema, GenericData)`      |
-| JSON (unvalidated) | `opwvhk.avro.json.JsonAsAvroParser#JsonAsAvroParser(Schema, GenericData)`                    |
-| XML (with XSD)     | `opwvhk.avro.xml.XmlAsAvroParser#XmlAsAvroParser(URL, String, boolean, Schema, GenericData)` |
-| XML (unvalidated)  | `opwvhk.avro.xml.XmlAsAvroParser#XmlAsAvroParser(Schema, GenericData)`                       |
+| Format | Parser class                        |
+|--------|-------------------------------------|
+| JSON   | `opwvhk.avro.json.JsonAsAvroParser` |
+| XML    | `opwvhk.avro.xml.XmlAsAvroParser`   |
 
-Parsers all use both a write schema and a read schema, just like Avro does. The write schema is used
-to validate the input, and the read schema is used to describe the result.
+Parsers require a read schema and an Avro model, determining the Avro record type to parse data into
+and how to create these records, respectively. Additionally, they support a format-dependent
+"write schema" (e.g., JSON schema, XSD, …), which is used for schema validation and can be
+used for input validation.
+
+### Schema evolution
 
 When parsing/converting data, the conversion can do implicit conversions that "fit". This includes
 like widening conversions (like int→long), lossy conversions (like decimal→float or anything→string)
 and parsing dates. With a write schema, binary conversions (from hexadecimal/base64 encoded text)
 are also supported.
 
+In addition, the read schema is used for schema evolution:
+
+* removing fields: fields that are not present in the read schema will be ignored
+* adding fields: fields that are not present in the input will be filled with the default values
+  from the read schema
+* renaming fields: field aliases are also used to match incoming data, effectively renaming these
+  fields
+
 ### Source schema optional but encouraged
 
 The parsers support as much functionality as possible when the write (source) schema is omitted.
@@ -43,6 +53,8 @@ However, this is discouraged. The reason is that significant functionality is mi
 
 * No check on required fields:
   The parsers will happily generate incomplete records, which **will** break when using them.
+* No check on compatibility:
+  Incompatible data cannot be detected, which **will** break the parsing process.
 * No input validation:
   Without a schema, a parser cannot validate input. This can cause unpredictable failures later on.
 
diff --git a/src/main/java/opwvhk/avro/io/AsAvroParserBase.java b/src/main/java/opwvhk/avro/io/AsAvroParserBase.java
index d411f91..0b7a8fc 100644
--- a/src/main/java/opwvhk/avro/io/AsAvroParserBase.java
+++ b/src/main/java/opwvhk/avro/io/AsAvroParserBase.java
@@ -390,8 +390,9 @@ protected ValueResolver createResolver(WriteSchema writeSchema, Schema readSchem
      *
      *