Grokbase Groups Avro user May 2013
Hi all:

Is there a reason Avro’s Hadoop serialization classes don’t allow
configuration of the DatumReader and DatumWriter classes?

My use-case is that I’m implementing Clojure DatumReader and -Writer
classes which produce and consume Clojure’s data structures directly.
I’d like to then extend that to Hadoop MapReduce jobs which operate in
terms of Clojure data, with Avro handling all de/serialization directly
to/from that Clojure data.

Am I going around this in a backwards fashion, or would a patch to allow
configuration of the Hadoop serialization DatumReader/Writers be
welcome?

-Marshall

  • Scott Carey at May 13, 2013 at 10:08 pm
    Making the DatumReader/Writers configurable would be a welcome addition.

    Ideally, much more of what goes on there could be:
      1. configuration driven
      2. pre-computed to avoid repeated work during decoding/encoding

    We do some of both already. The trick is to do #1 without impacting
    performance and #2 requires a bigger overhaul.
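The two goals above can be illustrated with a minimal, stdlib-only sketch. It is not Avro or Hadoop code: `SketchDatumWriter`, `ConfigurableDatumFactory`, and the key `sketch.datum.writer.class` are hypothetical names invented for illustration, and a plain `Map` stands in for a Hadoop job configuration. The writer class is chosen by a configuration entry (goal 1), and the reflective constructor lookup is cached so it runs once per class name rather than once per writer (a very small instance of goal 2).

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from Avro or Hadoop.
class ConfigurableDatumFactory {

    /** Simplified stand-in for a datum writer; Avro's real interface differs. */
    interface SketchDatumWriter {
        String write(Object datum);
    }

    /** Plays the role of the built-in writer used when nothing is configured. */
    static class DefaultWriter implements SketchDatumWriter {
        public String write(Object datum) { return "default:" + datum; }
    }

    /** Plays the role of a user-supplied writer, e.g. one for Clojure data. */
    static class CustomWriter implements SketchDatumWriter {
        public String write(Object datum) { return "custom:" + datum; }
    }

    /** Assumed configuration key; not a real Avro or Hadoop property. */
    static final String WRITER_CLASS_KEY = "sketch.datum.writer.class";

    /** Resolved constructors, keyed by class name, so reflection lookup runs once per name. */
    private static final Map<String, Constructor<? extends SketchDatumWriter>> CONSTRUCTORS =
            new ConcurrentHashMap<>();

    /** Counts reflective lookups, only to make the caching observable. */
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /** Creates a new writer of the configured class, or DefaultWriter if none is set. */
    static SketchDatumWriter newWriter(Map<String, String> conf) {
        String className = conf.getOrDefault(WRITER_CLASS_KEY, DefaultWriter.class.getName());
        Constructor<? extends SketchDatumWriter> ctor =
                CONSTRUCTORS.computeIfAbsent(className, ConfigurableDatumFactory::resolve);
        try {
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    /** Loads the class and finds its no-argument constructor (the work being cached). */
    private static Constructor<? extends SketchDatumWriter> resolve(String className) {
        LOOKUPS.incrementAndGet();
        try {
            Constructor<? extends SketchDatumWriter> ctor = Class.forName(className)
                    .asSubclass(SketchDatumWriter.class)
                    .getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Bad writer class: " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(newWriter(conf).write("x"));   // no key set: default writer
        conf.put(WRITER_CLASS_KEY, CustomWriter.class.getName());
        System.out.println(newWriter(conf).write("x"));   // configured writer
        System.out.println(newWriter(conf).write("y"));   // constructor lookup is reused
    }
}
```

Each call still returns a fresh writer instance; only the class and constructor resolution is shared. Whether reflection of this kind is cheap enough on the real serialization path is exactly the performance question raised above, and this sketch does not answer it.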

    If you would like, a contribution including a Clojure related maven module
    or two that depends on the Java stuff would be a welcome addition and
    allow us to identify compatibility issues as we change the Java library
    over time.

    On 5/8/13 3:33 PM, "Marshall Bockrath-Vandegrift" wrote:

    Hi all:

    Is there a reason Avro’s Hadoop serialization classes don’t allow
    configuration of the DatumReader and DatumWriter classes?

    My use-case is that I’m implementing Clojure DatumReader and -Writer
    classes which produce and consume Clojure’s data structures directly.
    I’d like to then extend that to Hadoop MapReduce jobs which operate in
    terms of Clojure data, with Avro handling all de/serialization directly
    to/from that Clojure data.

    Am I going around this in a backwards fashion, or would a patch to allow
    configuration of the Hadoop serialization DatumReader/Writers be
    welcome?

    -Marshall
  • Marshall Bockrath-Vandegrift at May 13, 2013 at 11:23 pm

    Scott Carey writes:

    > Making the DatumReader/Writers configurable would be a welcome
    > addition.

    Excellent!

    > Ideally, much more of what goes on there could be:
    > 1. configuration driven
    > 2. pre-computed to avoid repeated work during decoding/encoding
    >
    > We do some of both already. The trick is to do #1 without impacting
    > performance and #2 requires a bigger overhaul.

    Which work in particular? In my pass through the AvroSerialization
    implementation so far, it looks like each MR task would create either
    one or two Serializers/Deserializers (key and value), each of which in
    turn would create one DatumWriter/DatumReader and Encoder/Decoder pair.
    Or do De/Serializers get created multiple times per task?

    > If you would like, a contribution including a Clojure related maven
    > module or two that depends on the Java stuff would be a welcome
    > addition and allow us to identify compatibility issues as we change
    > the Java library over time.

    That sounds like a great end-goal. Right now at the company I work for
    (Damballa) we've just started getting our toes wet with Avro. Avro won
    our serialization-format bake-off, but we haven't started actually using
    it. I just finished an initial pass at Avro-Clojure integration and we
    have released it under an open source license:

         https://github.com/damballa/abracad

    I would very much like to eventually get an iteration of it into Avro
    proper, but I want to actually start using it and Avro first, so we can
    hammer out any interface issues etc.

    Anyway, I'll try to work up a patch to add some more configuration hooks
    to the AvroSerialization. Should I also create a ticket in the Avro
    issue tracker?

    -Marshall

Posted May 8, 2013 at 10:35 pm; last active May 13, 2013 at 11:23 pm