Scott Carey writes:
Making the DatumReader/Writers configurable would be a welcome addition.
Ideally, much more of what goes on there could be:
1. configuration driven
2. pre-computed to avoid repeated work during decoding/encoding
We do some of both already. The trick is to do #1 without impacting
performance, and #2 requires a bigger overhaul.
Which work in particular? In my pass through the AvroSerialization
implementation so far, it looks like each MR task would create either
one or two Serializers/Deserializers (key and value), each of which in
turn would create one DatumWriter/DatumReader and Encoder/Decoder pair.
Or do De/Serializers get created multiple times per task?
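
To make my reading of the lifecycle concrete, here is a minimal Java
sketch. It is not Avro or Hadoop code: every name in it
(SerializationLifecycleSketch, RecordWriter, SketchSerializer, the
"sketch.writer.kind" key) is a hypothetical stand-in I made up. It only
models the assumption above, that each serializer builds its writer once
and reuses it for every record, with the writer chosen from
configuration. Whether real tasks create De/Serializers more often than
this is exactly the open question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-ins only; none of these are Avro or Hadoop types.
final class SerializationLifecycleSketch {

    /** Stand-in for a DatumWriter: built once, then reused per record. */
    interface RecordWriter {
        String write(Object datum);
    }

    /** Stand-in for a Serializer that owns one writer for its whole life. */
    static final class SketchSerializer {
        private final RecordWriter writer;

        SketchSerializer(Supplier<RecordWriter> writerFactory) {
            // Created once per serializer, not once per record.
            this.writer = writerFactory.get();
        }

        String serialize(Object datum) {
            return writer.write(datum);
        }
    }

    /** Counts how many writers get built, to make the lifecycle visible. */
    static final AtomicInteger WRITERS_CREATED = new AtomicInteger();

    /** Picks a writer factory from a configuration map (the assumed "hook"). */
    static Supplier<RecordWriter> factoryFor(Map<String, String> conf) {
        String kind = conf.getOrDefault("sketch.writer.kind", "plain");
        return () -> {
            WRITERS_CREATED.incrementAndGet();
            if (kind.equals("upper")) {
                return datum -> String.valueOf(datum).toUpperCase();
            }
            return String::valueOf;
        };
    }

    /** Simulates one task: one key serializer, one value serializer, many records. */
    static int runTask(Map<String, String> conf, int records) {
        WRITERS_CREATED.set(0);
        Supplier<RecordWriter> factory = factoryFor(conf);
        SketchSerializer keys = new SketchSerializer(factory);
        SketchSerializer values = new SketchSerializer(factory);
        for (int i = 0; i < records; i++) {
            keys.serialize("k" + i);
            values.serialize("v" + i);
        }
        return WRITERS_CREATED.get();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("sketch.writer.kind", "upper");
        System.out.println("writers created: " + runTask(conf, 1000));
    }
}
```

Under that assumption the count stays at two per task regardless of the
number of records, so the per-writer setup cost would be paid only
twice. If De/Serializers are in fact created repeatedly, the count would
grow with each creation and pre-computing more of the setup would matter
more.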
If you would like, a contribution including a Clojure-related Maven
module or two that depends on the Java stuff would be a welcome
addition and allow us to identify compatibility issues as we change
the Java library over time.
That sounds like a great end-goal. Right now at the company I work for
(Damballa) we've just started getting our toes wet with Avro. Avro won
our serialization-format bake-off, but we haven't started actually using
it. I just finished an initial pass at Avro-Clojure integration and we
have released it under an open source license:
I would very much like to eventually get an iteration of it into Avro
proper, but I want to actually start using it and Avro first, so we can
hammer out any interface issues etc.
Anyway, I'll try to work up a patch to add some more configuration hooks
to the AvroSerialization. Should I also create a ticket in the Avro