Grokbase Groups Avro user April 2013
I have the following schema: {"name":"hey", "type":"record",
"fields":[{"name":"a","type":["null","string"],"default":null}]}

I am trying to deserialize the following against this schema using Java and
the GenericDatumReader: {}

I get the following error:
Caused by: org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at com.spotify.hadoop.JsonTester.main(JsonTester.java:40)

I'm not seeing any immediate issues online around this...is this expected?
I'm reading it in as such:

Schema avroSchema = new Schema.Parser().parse(schemaLine);
GenericDatumReader<Object> reader = new GenericDatumReader<Object>(avroSchema);
Object datum = reader.read(null, DecoderFactory.get().jsonDecoder(avroSchema, dataLine));

I'm going to see what's up and why it isn't picking up the default, but
imagined you guys might know what's up?

Thanks,
Jon

  • Jonathan Coveney at Apr 9, 2013 at 9:32 am
    Please note: {"name":"hey", "type":"record",
    "fields":[{"name":"a","type":["null","string"],"default":"null"}]} also
    doesn't work


  • Jonathan Coveney at Apr 9, 2013 at 9:44 am
    Stepping through the code, it looks like defaults are only used for
    writing, not for reading; i.e., at read time it assumes that the defaults
    were already filled in. It seems like if the reader evolved the schema to
    include new fields, it would be desirable for the defaults to get filled in
    if not present? But stepping through, on reading the defaults are
    completely ignored.


  • Martin Kleppmann at Apr 10, 2013 at 3:43 am
    With Avro, it is generally assumed that your reader is working with
    the exact same schema as the data was written with. If you want to
    change your schema, e.g. add a field to a record, you still need the
    exact same schema as was used for writing (the "writer's schema"), but
    you can also give the decoder a second schema (the "reader's schema"),
    and Avro will map data from the writer's schema into the reader's
    schema for you ("schema evolution").

    This requirement of having the exact same schema as the writer makes
    more sense with Avro's binary encoding, because it allows Avro to omit
    the field names, which makes the encoding very compact. The
    requirement makes less sense if you're using the JSON encoding, where
    field names are inevitably part of the JSON. I think this behaviour is
    expected, but I agree that it's a bit surprising, so perhaps it's
    worth discussing whether we should change it.

    To answer your question, your input data {} looks like it was written
    with a writer schema of {"name":"hey", "type":"record", "fields":[]}
    so try using that as your writer schema. Then if you specify
    {"name":"hey", "type":"record",
    "fields":[{"name":"a","type":["null","string"],"default":"null"}]} as
    your reader schema, you should find that the resolving decoder fills
    in the field "a" with the default null.

    Best,
    Martin
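
The writer/reader arrangement Martin describes can be sketched roughly as follows. This is a minimal sketch, not code from the thread: it assumes the Avro Java library is on the classpath, the class name DefaultFillSketch and the method decodeWithDefaults are made up for illustration, and the reader schema uses a JSON null default rather than the string "null".

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

// Hypothetical class name; illustrates decoding with separate writer and reader schemas.
public class DefaultFillSketch {
    // Writer's schema: a record with no fields, which matches the input {}.
    static final String WRITER_JSON =
        "{\"name\":\"hey\",\"type\":\"record\",\"fields\":[]}";

    // Reader's schema: adds field "a" with a JSON null default.
    static final String READER_JSON =
        "{\"name\":\"hey\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    static GenericRecord decodeWithDefaults(String json) {
        // Separate parsers, because both schemas define a record named "hey".
        Schema writer = new Schema.Parser().parse(WRITER_JSON);
        Schema reader = new Schema.Parser().parse(READER_JSON);

        // The two-schema constructor asks Avro to resolve writer data into the reader schema.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<>(writer, reader);
        try {
            // The JSON decoder is driven by the writer's schema, not the reader's.
            Decoder decoder = DecoderFactory.get().jsonDecoder(writer, json);
            return datumReader.read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GenericRecord record = decodeWithDefaults("{}");
        // Field "a" is absent from the input, so it should come from the reader's default.
        System.out.println(record);
    }
}
```

The only change from the code in the original question is that the decoder gets the writer's schema and the datum reader gets both, so the missing field is handled by schema resolution instead of by the JSON decoder.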
  • Scott Carey at Apr 11, 2013 at 4:21 am
    Minor addition, the default value should be

    null

    not

    "null"

    -- the latter is a string, the former is null.

    http://avro.apache.org/docs/current/spec.html#schema_record
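
One way to see the difference Scott points out is to ask the schema parser to validate defaults. The sketch below is illustrative only: it assumes an Avro Java version that provides Schema.Parser.setValidateDefaults, and the class and method names (DefaultLiteralSketch, hasValidDefault) are invented for this example.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;

// Hypothetical class name; compares a JSON null default with the string "null".
public class DefaultLiteralSketch {
    // Builds the schema from the thread with the given JSON literal as the default of "a".
    static String schemaWithDefault(String defaultLiteral) {
        return "{\"name\":\"hey\",\"type\":\"record\",\"fields\":["
            + "{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":"
            + defaultLiteral + "}]}";
    }

    // Returns true if the schema parses with default validation switched on.
    static boolean hasValidDefault(String defaultLiteral) {
        try {
            new Schema.Parser()
                .setValidateDefaults(true)
                .parse(schemaWithDefault(defaultLiteral));
            return true;
        } catch (AvroRuntimeException e) {
            // Avro rejects a default that does not fit the field's type.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("null   valid: " + hasValidDefault("null"));
        System.out.println("\"null\" valid: " + hasValidDefault("\"null\""));
    }
}
```

With the union declared as ["null","string"], the JSON null literal fits the first branch, while the quoted "null" is a string and is expected to be rejected when validation is enabled.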

  • Jonathan Coveney at Apr 11, 2013 at 10:22 pm
    Thank you both. Makes sense


