AVRO-1830 [Perl] Support containers without codec
As [the specification on Object Container Files][spec] states
(emphasis added):

> All metadata properties that start with "avro." are reserved.
> The following file metadata properties are currently used:
>
>   * **avro.schema** contains the schema of objects stored in
>     the file, as JSON data (required).
>   * **avro.codec** the name of the compression codec used to
>     compress blocks, as a string. Implementations are required
>     to support the following codecs: "null" and "deflate". _If
>     codec is absent, it is assumed to be "null"_. The codecs are
>     described with more detail below.

With this change, the Perl implementation no longer dies when it
opens a container whose metadata has no explicit codec. The codec
is assumed to be "null" instead.
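
The fallback can be sketched in isolation. This is a minimal illustration, not the module's code: `effective_codec` and `%VALID_CODEC` are hypothetical names introduced here, and the list of accepted codecs is an assumption based on the two codecs the spec requires.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the codecs an implementation accepts.
# The real check lives in the Avro Perl module and may differ.
my %VALID_CODEC = map { $_ => 1 } qw(null deflate);

# Return the codec named in the file metadata, or 'null' when the
# 'avro.codec' key is absent (or empty), as the spec prescribes.
sub effective_codec {
    my ($meta) = @_;
    return $meta->{'avro.codec'} || 'null';
}

my $without = effective_codec({ 'avro.schema' => '"string"' });
my $with    = effective_codec({
    'avro.schema' => '"string"',
    'avro.codec'  => 'deflate',
});

print "$without\n";    # null
print "$with\n";       # deflate

# An empty-string fallback is not a valid codec name, so a reader
# that validates the codec would reject a file that is legal per the spec.
print exists $VALID_CODEC{''} ? "valid\n" : "invalid\n";    # invalid
```

Because `||` is used rather than `//`, an empty `avro.codec` value is also treated as "null".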

This change is inspired by one originally submitted in 2016 by
SK Liew.

[spec]: https://avro.apache.org/docs/1.11.1/specification/#object-container-files
jjatria committed Jun 21, 2024
1 parent fdab5db commit df144a9
Showing 3 changed files with 22 additions and 2 deletions.
3 changes: 3 additions & 0 deletions lang/perl/Changes
@@ -2,6 +2,9 @@ Revision history for Perl extension Avro

         - Switch from JSON::XS to JSON::MaybeXS to support
           multiple JSON backends
+        - Support object containers without an explicit
+          codec. It will be assumed to be 'null' as mandated
+          by the spec.

 1.00 Fri Jan 17 15:00:00 2014
         - Relicense under apache license 2.0
4 changes: 2 additions & 2 deletions lang/perl/lib/Avro/DataFileReader.pm
@@ -55,7 +55,7 @@ sub new {

 sub codec {
     my $datafile = shift;
-    return $datafile->metadata->{'avro.codec'};
+    return $datafile->metadata->{'avro.codec'} || 'null';
 }

sub writer_schema {
@@ -99,7 +99,7 @@ sub read_file_header {
     $datafile->{sync_marker} = $data->{sync}
         or croak "sync marker appears invalid";

-    my $codec = $data->{meta}{'avro.codec'} || "";
+    my $codec = $data->{meta}{'avro.codec'} || 'null';

     throw Avro::DataFile::Error::UnsupportedCodec($codec)
         unless Avro::DataFile->is_codec_valid($codec);
17 changes: 17 additions & 0 deletions lang/perl/t/04_datafile.t
@@ -221,4 +221,21 @@ is_deeply $all[0], $data, "Our data is intact!";
     is scalar @next, 0, "no more objects back";
 }

+## Test with a datafile that has no codec
+{
+    my $container = join '',
+        "Obj\x{01}",
+        "\x{02}\x{16}avro.schema\x{10}\x{22}string\x{22}\x{00}",
+        "\x{de}\x{ad}\x{be}\x{ef}" x 4,
+        "\x{02}\x{08}\x{06}foo",
+        "\x{de}\x{ad}\x{be}\x{ef}" x 4;
+
+    open my $fh, '<', \$container or die "Could not open memory handle: $!";
+
+    my $reader = Avro::DataFileReader->new( fh => $fh );
+
+    my ($data) = $reader->next(1);
+    is $data, 'foo', 'Can read data from container without a codec';
+}
+
 done_testing;
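
The hand-built container in the new test can be read byte by byte. The sketch below does not use the Avro module. It decodes the same bytes with two hypothetical helpers, `read_long` and `read_bytes`, written here for illustration only. It assumes the Avro encoding of longs as zigzag variable-length integers, and it ignores the negative block counts that the spec allows in maps.

```perl
use strict;
use warnings;

# Decode one zigzag variable-length long from the front of $$buf,
# removing the bytes it occupies. (Hypothetical helper.)
sub read_long {
    my ($buf) = @_;
    my ($n, $shift) = (0, 0);
    while (length $$buf) {
        my $byte = ord substr($$buf, 0, 1, '');
        $n |= ($byte & 0x7f) << $shift;
        last unless $byte & 0x80;
        $shift += 7;
    }
    return $n & 1 ? -(($n + 1) >> 1) : $n >> 1;
}

# Read a length-prefixed byte string. (Hypothetical helper.)
sub read_bytes {
    my ($buf) = @_;
    my $len = read_long($buf);
    return substr($$buf, 0, $len, '');
}

# Same bytes as the container built in the test.
my $buf = join '',
    "Obj\x{01}",                                              # magic
    "\x{02}\x{16}avro.schema\x{10}\x{22}string\x{22}\x{00}",  # metadata map
    "\x{de}\x{ad}\x{be}\x{ef}" x 4,                           # 16-byte sync marker
    "\x{02}\x{08}\x{06}foo",                                  # one data block
    "\x{de}\x{ad}\x{be}\x{ef}" x 4;                           # sync marker again

my $magic = substr($buf, 0, 4, '');

# The metadata map is a series of blocks, ended by a zero count.
my %meta;
while ((my $count = read_long(\$buf)) != 0) {
    for (1 .. $count) {
        my $key = read_bytes(\$buf);
        $meta{$key} = read_bytes(\$buf);
    }
}

my $sync    = substr($buf, 0, 16, '');
my $objects = read_long(\$buf);            # objects in the block
my $size    = read_long(\$buf);            # block size in bytes
my $block   = substr($buf, 0, $size, '');
my $value   = read_bytes(\$block);         # an Avro string: length, then bytes

print "schema: $meta{'avro.schema'}\n";    # schema: "string"
printf "codec present: %s\n",
    exists $meta{'avro.codec'} ? 'yes' : 'no';    # codec present: no
print "objects: $objects, size: $size, value: $value\n";
# objects: 1, size: 4, value: foo
```

The metadata map holds a single `avro.schema` entry and no `avro.codec` key, which is the case this commit makes the reader accept.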
