Grokbase Groups Avro user April 2013
I'm working on migrating an internally developed serialization format to
Avro. In the process, there have been many cases where I made a mistake
migrating the schema (I've automated the migration), and then Avro cries that
a record I'm trying to serialize doesn't match the schema. Generally, the
error it gives doesn't help find the actual issue, and for a big enough record
finding the issue can be tedious.

I've thought about making a tool which, given the schema and the record,
would tell you what the issue is, but I'm wondering if this already exists?
I suppose the error message could also include this information...

Thanks
Jon


  • Jeremy Kahn at Apr 4, 2013 at 2:08 pm
    I think this would be tremendously useful.

    I am working - in my copious spare time - on improving schema validation in
    the Python library, and I think I can see how to improve things there by
    extending the data/schema parallel recursion to keep track of position in
    each.

    Jeremy
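The parallel recursion Jeremy describes can be sketched as below. This is a minimal illustration, not the avro package's validation code: `find_mismatches` and `PRIMITIVE_CHECKS` are hypothetical names, the schema is assumed to be the plain dict/list/str structure you get from `json.loads` on a schema file, and named-type references and `fixed` are not handled.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# Hypothetical table of checks for Avro primitive type names.
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: _is_int(v) and -2**31 <= v < 2**31,
    "long": lambda v: _is_int(v) and -2**63 <= v < 2**63,
    "float": lambda v: isinstance(v, float) or _is_int(v),
    "double": lambda v: isinstance(v, float) or _is_int(v),
    "bytes": lambda v: isinstance(v, bytes),
    "string": lambda v: isinstance(v, str),
}


def find_mismatches(schema, datum, path="$"):
    """Walk schema and datum together; return a list of (path, message)."""
    # A union is written as a JSON list of branch schemas.
    if isinstance(schema, list):
        for branch in schema:
            if not find_mismatches(branch, datum, path):
                return []
        return [(path, f"{datum!r} matches no branch of union")]

    # Primitives may be written as "int" or as {"type": "int"}.
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind in PRIMITIVE_CHECKS:
        if PRIMITIVE_CHECKS[kind](datum):
            return []
        return [(path, f"expected {kind}, got {type(datum).__name__} {datum!r}")]

    errors = []
    if kind == "record":
        if not isinstance(datum, dict):
            return [(path, f"expected record, got {type(datum).__name__}")]
        for field in schema["fields"]:
            child = f"{path}.{field['name']}"
            if field["name"] not in datum:
                errors.append((child, "missing field"))
            else:
                errors.extend(find_mismatches(field["type"], datum[field["name"]], child))
        return errors
    if kind == "array":
        if not isinstance(datum, list):
            return [(path, f"expected array, got {type(datum).__name__}")]
        for index, item in enumerate(datum):
            errors.extend(find_mismatches(schema["items"], item, f"{path}[{index}]"))
        return errors
    if kind == "map":
        if not isinstance(datum, dict):
            return [(path, f"expected map, got {type(datum).__name__}")]
        for key, value in datum.items():
            errors.extend(find_mismatches(schema["values"], value, f"{path}[{key!r}]"))
        return errors
    if kind == "enum":
        if datum in schema["symbols"]:
            return []
        return [(path, f"{datum!r} is not one of {schema['symbols']}")]
    return [(path, f"unsupported schema type {kind!r}")]


# Example: two problems buried in an otherwise valid record.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "emails", "type": {"type": "array", "items": "string"}},
        {"name": "age", "type": ["null", "int"]},
    ],
}
bad_user = {"name": "Jon", "emails": ["a@example.com", 42], "age": "thirty"}
for where, problem in find_mismatches(user_schema, bad_user):
    print(f"{where}: {problem}")
```

Because the path is threaded through every recursive call, each reported problem names the exact field or array element that failed rather than only saying that the record as a whole does not match.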
  • Jonathan Coveney at Apr 4, 2013 at 4:02 pm
    I'm also running into issues where the Python and Java implementations
    behave differently (it seems like Java is less permissive than Python). Are
    these cases bugs? It can be frustrating for something to work in one but not
    the other.

    Having the info from the parallel recursion would allow us to have much
    better error messages. That would be great...


  • Philip Zeyliger at Apr 4, 2013 at 4:14 pm
    Hi Jonathan,

    The Python implementation is definitely less mature than the Java one. As
    you run into things, please do file bugs (and, better yet, fixes!).

    At one point someone on this list was working on an alternative Python
    implementation that generated Python objects to represent the Avro records.
    I think that's a wise idea (and is what Thrift does). Not sure where
    that's gone.

    -- Philip

  • Jonathan Coveney at Apr 4, 2013 at 4:16 pm
    Ok, cool. I've been using the Python implementation pretty heavily and
    didn't realize that it was less mature. Will definitely work on maturing it
    where possible :)


  • Jeremy Kahn at Apr 4, 2013 at 4:27 pm
    AVRO-1284 [0] has the first patch toward improving the schema validation so
    that schema polymorphism handles validation. Philip and Jonathan, a review
    would be nice. Upvotes would be nice too, but constructive feedback is
    probably even better.

    I think there's a further generalization that allows schema objects
    (schemata?) to recursively call back to a generic data-and-schema
    parallel-walker ("validate" would be one of those, but so could be
    "default-filler"[1]).

    That might be a bit tricky to build without breaking backwards
    compatibility to older Python uses, but ... maybe not. Older methods
    should be implementable in these terms.

    I'll file a separate issue to provide this refactoring into a base walker
    class.

    --Jeremy

    [0] https://issues.apache.org/jira/browse/AVRO-1284
    [1] https://issues.apache.org/jira/browse/AVRO-1265
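One way to picture the base-walker idea is sketched below. This is only an illustration under stated assumptions, not the design in AVRO-1284 or AVRO-1265: `SchemaWalker`, `DefaultFiller`, and the `on_missing` / `on_leaf` hooks are hypothetical names, the schema is assumed to be the plain structure returned by `json.loads`, the datum is assumed to be well-shaped (dicts for records, lists for arrays), and unions and maps are treated as leaves.

```python
import copy


class SchemaWalker:
    """Hypothetical base class: recurses through a schema and a datum together."""

    def walk(self, schema, datum, path="$"):
        kind = schema["type"] if isinstance(schema, dict) else None
        if kind == "record":
            return self.on_record(schema, datum, path)
        if kind == "array":
            return [
                self.walk(schema["items"], item, f"{path}[{index}]")
                for index, item in enumerate(datum)
            ]
        return self.on_leaf(schema, datum, path)

    def on_record(self, schema, datum, path):
        result = {}
        for field in schema["fields"]:
            child = f"{path}.{field['name']}"
            if field["name"] in datum:
                value = datum[field["name"]]
            else:
                # Subclasses decide what a missing field means.
                value = self.on_missing(field, child)
            result[field["name"]] = self.walk(field["type"], value, child)
        return result

    def on_missing(self, field, path):
        raise ValueError(f"{path}: missing field")

    def on_leaf(self, schema, datum, path):
        # Primitives (and, in this sketch, unions and maps) pass through.
        return datum


class DefaultFiller(SchemaWalker):
    """Fills absent fields from the schema's declared defaults."""

    def on_missing(self, field, path):
        if "default" not in field:
            return super().on_missing(field, path)
        # Copy so callers cannot mutate the schema's default by accident.
        return copy.deepcopy(field["default"])


job_schema = {
    "type": "record",
    "name": "Job",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "retries", "type": "int", "default": 3},
        {
            "name": "owner",
            "type": {
                "type": "record",
                "name": "Owner",
                "fields": [{"name": "team", "type": "string", "default": "unknown"}],
            },
        },
    ],
}
filled = DefaultFiller().walk(job_schema, {"id": 7, "owner": {}})
print(filled)
```

A validator would be another subclass that overrides `on_leaf` to check types, which is why the recursion itself lives in the base class and only the per-node decisions are left to the hooks.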

  • Scott Carey at Apr 6, 2013 at 8:42 pm
    Try GenericRecordBuilder.

    For the Specific API, there are builders that will not let you construct an
    object that cannot be serialized.
    The Generic API should have the same thing, but I am not 100% sure the
    builder there covers it.

    I have always avoided using any API that allows me to create an object that
    is unsafe to serialize, since finding out at serialization time is a huge
    pain (and in my case, is often on a separate thread from the place it was
    created).
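Scott's advice concerns the Java API, but the fail-early idea carries over to Python. The sketch below is an analogue only, and does not describe what the Java builders check: `CheckedRecordBuilder` is a hypothetical class, not part of the avro package, the schema is assumed to be a record schema loaded with `json.loads`, and only primitive field types are supported.

```python
class CheckedRecordBuilder:
    """Hypothetical builder: rejects bad fields at set() time, not at write time."""

    # Python types accepted for each supported Avro primitive (an assumption
    # of this sketch; complex types and unions are not handled).
    _PYTHON_TYPES = {
        "string": str,
        "bytes": bytes,
        "boolean": bool,
        "int": int,
        "long": int,
        "float": float,
        "double": float,
    }

    def __init__(self, schema):
        self._fields = {field["name"]: field for field in schema["fields"]}
        self._values = {}

    def set(self, name, value):
        if name not in self._fields:
            raise KeyError(f"schema has no field {name!r}")
        expected = self._fields[name]["type"]
        py_type = self._PYTHON_TYPES.get(expected) if isinstance(expected, str) else None
        if py_type is None:
            raise NotImplementedError(f"field {name!r}: unsupported type {expected!r}")
        # bool is a subclass of int, so reject it for numeric fields.
        is_stray_bool = isinstance(value, bool) and py_type is not bool
        if not isinstance(value, py_type) or is_stray_bool:
            raise TypeError(
                f"field {name!r}: expected {expected}, got {type(value).__name__}"
            )
        self._values[name] = value
        return self  # allow chained calls

    def build(self):
        record = dict(self._values)
        for name, field in self._fields.items():
            if name not in record:
                if "default" not in field:
                    raise ValueError(f"field {name!r} is not set and has no default")
                record[name] = field["default"]
        return record


person_schema = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int", "default": 0},
    ],
}
person = CheckedRecordBuilder(person_schema).set("name", "Jon").build()
print(person)
```

With this shape, a wrong value raises on the thread that supplied it, at the `set()` call that introduced it, instead of surfacing later in whichever thread happens to serialize the record.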

Discussion Overview
group: user @
categories: avro
posted: Apr 4, '13 at 1:58p
active: Apr 6, '13 at 8:42p
posts: 7
users: 4
website: avro.apache.org
irc: #avro
