Grokbase Groups Avro user March 2012
Ok, now I have a followup question...

How does one recover from an exception while writing an Avro file? The
incomplete record is being written, which is crashing the reader.
On Fri, Mar 23, 2012 at 8:01 PM, Russell Jurney wrote:

Thanks, Scott. Looking at the raw data, it seems to have been a truncated
record due to UTF problems.

Russell Jurney http://datasyndrome.com

On Mar 23, 2012, at 7:59 PM, Scott Carey wrote:


It appears to be reading a union index and failing in there somehow. If
it did not have any of the Pig AvroStorage stuff in there I could tell you
more.

What does the 'tojson' tool in avro-tools.jar do with the file? (java -jar
avro-tools-1.6.3.jar tojson <file> | your_favorite_text_reader)
Which version of Avro produced the Java stack trace below?
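
For context, Avro's binary encoding writes a union as a zig-zag varint long giving the zero-based branch position, followed by the value itself. A minimal Python sketch, assuming hypothetical helper names (read_long, read_union_index) and made-up input bytes rather than the actual contents of the problem file, shows how two stray bytes can decode to the out-of-range index 64 that the traces below report:

```python
import io


def read_long(stream):
    """Decode one Avro long (variable-length, zig-zag encoded)."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift   # low 7 bits carry data
        if not byte & 0x80:               # high bit clear: last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)    # undo zig-zag


def read_union_index(stream, branch_count):
    """Read a union branch index and check it against the schema."""
    index = read_long(stream)
    if not 0 <= index < branch_count:
        raise ValueError(
            "can't access branch index %d for union with %d branches"
            % (index, branch_count))
    return index


# Hypothetical inputs for a two-branch union such as ["string", "null"].
GOOD = b"\x00"        # branch 0
STRAY = b"\x80\x01"   # what a misaligned reader might see instead

good_index = read_union_index(io.BytesIO(GOOD), 2)
try:
    read_union_index(io.BytesIO(STRAY), 2)
    stray_error = None
except ValueError as exc:
    stray_error = str(exc)
```

Once an earlier record is truncated, the reader is no longer aligned with record boundaries, so whatever bytes come next get interpreted as a branch index. That would be consistent with all three readers failing at the union, though the sketch does not prove that this is what happened here.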


On 3/23/12 7:01 PM, "Russell Jurney" wrote:

I have a problem record I've written in Avro that crashes anything which
tries to read it :(

Can anyone make sense of these errors?

The exception in Pig/AvroStorage is this:

java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
    ... 7 more

When reading the record in Python:

  File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
    for record in df_reader:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
    return self.read_union(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
    raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches

When reading the record in Ruby:

/Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema and Reader's schema ["string","null"] do not match. (Avro::IO::SchemaMatchException)

--
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com



  • Scott Carey at Mar 26, 2012 at 3:55 pm
    Avro Java's file writer[1] (in the last several versions) rewinds its buffer
    if there is an exception during writing, so if there are writes afterwards
    the file will not be corrupt. However, most tools are not so careful.

    [1] DataFileWriter.append()
    http://svn.apache.org/repos/asf/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileWriter.java
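
    The rewind-on-failure idea can be sketched in a few lines of Python. This is an illustration of the general technique only, not Avro's code: RewindingWriter, encode_ascii and the angle-bracket framing are all hypothetical names and formats invented for the example. The writer remembers the buffer position before each append and truncates back to it if encoding fails part-way, so a half-written record never reaches the output.

```python
import io


class RewindingWriter:
    """Hypothetical writer that discards partially encoded records."""

    def __init__(self, encode):
        self._encode = encode          # callable(record, binary_stream)
        self._buffer = io.BytesIO()
        self.count = 0                 # records successfully appended

    def append(self, record):
        mark = self._buffer.tell()     # position before this record
        try:
            self._encode(record, self._buffer)
        except Exception:
            # Rewind: drop whatever bytes the failed encode emitted.
            self._buffer.seek(mark)
            self._buffer.truncate()
            raise
        self.count += 1

    def getvalue(self):
        return self._buffer.getvalue()


def encode_ascii(record, out):
    """Toy encoder that writes a prefix before it can fail."""
    out.write(b"<")
    out.write(record.encode("ascii"))  # raises on non-ASCII text
    out.write(b">")


writer = RewindingWriter(encode_ascii)
writer.append("ok")
try:
    writer.append("caf\u00e9")         # fails after "<" was written
except UnicodeEncodeError:
    pass                               # caller decides: skip, log, abort
writer.append("fine")
```

    The caller still sees the exception and has to decide what to do with the bad record; the only guarantee is that the records written before and after it stay readable.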

    On 3/23/12 8:27 PM, "Russell Jurney" wrote:

    Ok, now I have a followup question...

    How does one recover from an exception while writing an Avro file? The
    incomplete record is being written, which is crashing the reader.

Posted: Mar 24, '12 at 3:27a
Active: Mar 26, '12 at 3:55p