Grokbase Groups Hive user July 2011
Knowing that SequenceFiles can store data (especially numeric data) much
more compactly than text, I started converting our Hive database from
LZO-compressed text format to LZO-compressed SequenceFiles.

My first observation was that the files were not smaller, which surprised me,
since we have mostly numerical data, which has a more compact binary
representation.

So I then issued some "describe extended" queries to poke around in the
SequenceFile format used by Hive. It seems that 1) the keys are not
used, and 2) all the values are simply stored as a Text Writable. Is this
simply a copy of the textual representation that was used in the text
files? That would explain why the data did not get any smaller, but it
would also defeat all the benefits of SequenceFiles, no?
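
To make the size argument concrete, here is a minimal sketch in plain Python (not Hive or Hadoop code). The row values and the helper names encode_as_text and encode_as_binary are made up for illustration, and the fixed-width layout is just an assumption about what a "real" binary encoding might look like. It compares a row of numbers written as a delimited text line, which is roughly what a Text value would hold, against the same row packed as binary:

```python
import struct

# Hypothetical row of numeric columns; the values are made up for illustration.
ROW = (1234567890, 987654321, 3.14159265358979, 42)


def encode_as_text(row, delimiter="\x01"):
    """Encode the row as one delimited text line (Ctrl-A is Hive's default field delimiter)."""
    return delimiter.join(str(value) for value in row).encode("utf-8")


def encode_as_binary(row):
    """Pack the same row as fixed-width big-endian binary: int32, int32, float64, int32."""
    return struct.pack(">iidi", *row)


text_bytes = encode_as_text(ROW)
binary_bytes = encode_as_binary(ROW)

# Compare the uncompressed size of both encodings of the same row.
print("text bytes:  ", len(text_bytes))
print("binary bytes:", len(binary_bytes))
```

If the values inside the SequenceFile are the text form, the container changes but the payload does not, so I would expect roughly the same size before and after compression.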

Thanks, Koert

Posted Jul 25, '11 at 7:54p
Koert Kuipers: 1 post


