A sync marker delimits each block in an Avro file. If you want to start
reading data from the middle of a 100GB file, DataFileReader will seek to
the middle and find the next sync marker. Each block can be individually
compressed. By default, the writer does not compress a block and flush it
to disk until the block has grown as large as the sync interval in bytes.
Alternatively, you can call sync() manually.
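
To make the seek-and-resync idea concrete, here is a toy sketch in plain
Java. It is not Avro's code: the class name SyncScan, the method
nextBlockStart and the 4-byte marker are made up for illustration (the
real marker is 16 bytes, generated per file and stored in the header).
It only shows how a reader dropped at an arbitrary offset can scan
forward to the next block boundary.

```java
import java.util.Arrays;

/** Toy model (not Avro's implementation): find the next block boundary. */
final class SyncScan {
    /**
     * Returns the offset just past the first occurrence of marker that
     * begins at or after start, or -1 if no such occurrence exists.
     */
    static int nextBlockStart(byte[] file, byte[] marker, int start) {
        for (int i = Math.max(start, 0); i + marker.length <= file.length; i++) {
            byte[] window = Arrays.copyOfRange(file, i, i + marker.length);
            if (Arrays.equals(window, marker)) {
                return i + marker.length;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical 4-byte marker; Avro's real marker is 16 bytes.
        byte[] marker = {9, 9, 9, 9};
        // Layout: block {1,2,3}, marker, block {4,5}, marker.
        byte[] file = {1, 2, 3, 9, 9, 9, 9, 4, 5, 9, 9, 9, 9};
        System.out.println(nextBlockStart(file, marker, 1)); // 7
        System.out.println(nextBlockStart(file, marker, 5)); // 13
    }
}
```

A reader that starts at offset 1 skips the rest of the first block and
resumes at offset 7, where the second block begins.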

If you have a 1000000 byte sync interval, you may not see any data reach
disk until that many bytes have been written (or sync() is called).
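
A rough way to picture this is the toy model below, again in plain Java
and not Avro's actual writer. BlockBufferModel, its methods and the
20-byte record size are assumptions for illustration, and the file header
is not modeled. Records pile up in memory, and a block plus its trailing
marker is only written out once the buffer reaches the sync interval or
sync() is called.

```java
import java.io.ByteArrayOutputStream;

/** Toy model (not Avro's implementation) of block buffering in a writer. */
final class BlockBufferModel {
    // Hypothetical 4-byte marker; Avro's real marker is 16 bytes.
    private static final byte[] MARKER = {9, 9, 9, 9};

    private final int syncInterval;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();

    BlockBufferModel(int syncInterval) {
        this.syncInterval = syncInterval;
    }

    /** Buffers one record; writes the block out once it reaches the interval. */
    void append(byte[] record) {
        buffer.write(record, 0, record.length);
        if (buffer.size() >= syncInterval) {
            sync();
        }
    }

    /** Writes the pending block followed by a marker, if anything is pending. */
    void sync() {
        if (buffer.size() == 0) {
            return;
        }
        byte[] block = buffer.toByteArray();
        disk.write(block, 0, block.length);
        disk.write(MARKER, 0, MARKER.length);
        buffer.reset();
    }

    int bytesOnDisk() {
        return disk.size();
    }

    public static void main(String[] args) {
        BlockBufferModel writer = new BlockBufferModel(1000000);
        // 30,000 records of an assumed 20 bytes each: 600,000 bytes in total.
        for (int i = 0; i < 30000; i++) {
            writer.append(new byte[20]);
        }
        System.out.println(writer.bytesOnDisk()); // 0
        writer.sync();
        System.out.println(writer.bytesOnDisk()); // 600004
    }
}
```

With these assumed sizes, 30,000 records never fill a 1000000 byte block,
so nothing is written out until sync() is called explicitly.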

Your problem is likely that the first block in the file has not been
flushed to disk yet, and therefore the file is corrupt and missing a
trailing sync marker.

On 1/3/13 12:36 PM, "Terry Healy" wrote:


I'm upgrading a logging program to append GenericRecords to a .avro file
instead of text (.tsv). I have a working schema that is used to convert
existing .tsv of the same format into .avro and that works fine.

When I run a test writing 30,000 bogus records, it runs but when I try
to use "avro-tools-1.7.3.jar tojson" on the output file, it reports:

"AvroRuntimeException: java.io.IOException: Invalid sync!"

The file is still open at this point since the logging program is
running. Is this expected behavior because it is still open? (getmeta
and getschema work fine).

I'm not sure if it has any bearing, since I never really understood the
function of the Avro sync interval; in this and the working programs
it is set to 1000000.

Any ideas appreciated.


Posted Jan 3, '13 at 8:37p
Active Jan 9, '13 at 4:22p

2 users in discussion

Terry Healy: 2 posts, Scott Carey: 1 post


