Grokbase Groups Avro user April 2012
Hey everyone,

we're facing a problem while reading Avro files written with Flume using
the Avro Java API 1.5.4 into a Hadoop cluster. The Avro Data Store
complains about a missing sync marker. Our investigation shows that the
complaint is correct: the sync marker is missing, so we end up with a
block of double the size.

Our software packages:
rpm -qa | grep hadoop
hadoop-0.20-namenode-0.20.2+923.142-1
hadoop-0.20-0.20.2+923.142-1
hadoop-0.20-native-0.20.2+923.142-1
hadoop-hive-0.7.1+42.27-2
hadoop-pig-0.8.1+28.18-1

This is pretty much a basic Cloudera CDH3 Update 2 package installation,
with a patched Pig version taken from CDH3 Update 3.

Has anyone had a similar issue? Does this ring a bell?

Thanks

Markus


  • Scott Carey at Apr 3, 2012 at 4:23 pm
    I have not seen this issue before with 100 TB of Avro files, but am not
    using Flume to write them. We have moved on to Avro 1.6.x but were on the
    1.5.x line for quite some time. Perhaps while writing there was an
    exception of some sort that was not handled correctly in Avro or Flume.

    Looking at the DataFileWriter code, I can see how a file could get
    truncated without a sync marker if the writing process crashes, but not
    how it could successfully write two blocks in a row without a sync between.

    You should be able to modify the file reader to recover and re-write the
    data if it is only a missing sync marker, or skip over the block if it is
    corrupt.
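
A minimal sketch of the diagnostic half of that suggestion, using only the JDK and not the Avro library's reader classes. It walks the container layout from the Avro specification (magic bytes, metadata map, 16-byte sync marker, then blocks of object count, byte size, data, sync marker) and reports every offset where a block is not followed by the marker. The class and method names (`SyncMarkerScan`, `findMissingSyncs`) are hypothetical. It assumes the whole file fits in memory and that the only damage is an absent marker; a genuinely corrupt block would make it throw rather than recover.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Avro API.
class SyncMarkerScan {
    static final int SYNC_SIZE = 16;

    /** Decodes one zigzag variable-length long, the encoding Avro uses for int/long. */
    static long readLong(ByteBuffer in) {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.get() & 0xff;
            raw |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            if (shift > 63) {
                throw new IllegalArgumentException("varint too long");
            }
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Skips one length-prefixed bytes/string value. */
    static void skipBytes(ByteBuffer in) {
        int len = (int) readLong(in);
        in.position(in.position() + len);
    }

    /** Skips the file header and returns the file's 16-byte sync marker. */
    static byte[] readHeader(ByteBuffer in) {
        byte[] magic = new byte[4];
        in.get(magic);
        if (!Arrays.equals(magic, new byte[] {'O', 'b', 'j', 1})) {
            throw new IllegalArgumentException("not an Avro container file");
        }
        // The metadata is an Avro map: blocks of entries, terminated by a zero count.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in); // byte size of this map block, not needed here
            }
            for (long i = 0; i < count; i++) {
                skipBytes(in); // key
                skipBytes(in); // value
            }
        }
        byte[] sync = new byte[SYNC_SIZE];
        in.get(sync);
        return sync;
    }

    /** Returns the offsets at which a sync marker was expected but not found. */
    static List<Integer> findMissingSyncs(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        byte[] sync = readHeader(in);
        byte[] candidate = new byte[SYNC_SIZE];
        List<Integer> missing = new ArrayList<>();
        while (in.hasRemaining()) {
            readLong(in);                       // number of objects in the block
            int size = (int) readLong(in);      // byte size of the serialized data
            in.position(in.position() + size);  // skip the data itself
            int afterData = in.position();
            if (in.remaining() < SYNC_SIZE) {   // truncated tail, nothing left to parse
                missing.add(afterData);
                break;
            }
            in.get(candidate);
            if (!Arrays.equals(candidate, sync)) {
                // Assume the marker is simply absent and the next block starts here.
                missing.add(afterData);
                in.position(afterData);
            }
        }
        return missing;
    }
}
```

Calling `SyncMarkerScan.findMissingSyncs(Files.readAllBytes(path))` on a suspect file would list the affected offsets. Because the scan skips block data by its declared size, it does not depend on the codec. Re-writing the file with the markers restored is left out of the sketch.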

Discussion Overview
group: user @
categories: avro
posted: Apr 3, '12 at 8:29a
active: Apr 3, '12 at 4:23p
posts: 2
users: 2
website: avro.apache.org
irc: #avro
