at Jan 24, 2011 at 9:42 pm
On Mon, Jan 24, 2011 at 4:14 PM, yongqiang he wrote:
How did you upload the data to the new table?
You can get the data compressed by doing an INSERT OVERWRITE into the
destination table after setting "hive.exec.compress.output" to true.
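A minimal sketch of that suggestion, assuming a destination RCFile table named `dest` and a source table named `src` (both names are hypothetical, as is the choice of GzipCodec — the thread only mentions the hive.exec.compress.output setting):

```sql
-- Enable compressed output for the job that writes the destination table.
SET hive.exec.compress.output=true;
-- Optionally choose a codec (GzipCodec here; an assumption, not from the thread).
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Rewriting the data through INSERT OVERWRITE produces compressed output files.
INSERT OVERWRITE TABLE dest
SELECT * FROM src;
```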
On Mon, Jan 24, 2011 at 12:30 PM, Edward Capriolo wrote:
I am trying to explore some use cases that I believe are perfect for
the ColumnarSerDe: tables with 100+ columns where only one or two are
selected in a particular query.
CREATE TABLE (....)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE;
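Filled out, a DDL along those lines might look like the sketch below. The table and column names are placeholders (the original elides them), and the single-column query illustrates the use case described above: with a columnar layout, reading one or two columns avoids scanning the other 100+.

```sql
-- Hypothetical wide table; all names here are illustrative, not from the thread.
CREATE TABLE wide_events (
  event_id BIGINT,
  col_001  STRING,
  col_002  STRING
  -- ... 100+ further columns ...
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE;

-- A query like this only has to read the referenced columns'
-- groups from the RCFile, which is where the format pays off.
SELECT col_002 FROM wide_events WHERE col_001 = 'x';
```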
My issue is that my data in our source table, stored as gzip-compressed
sequence files, is much smaller than the ColumnarSerDe table, and as a
result any performance gains are lost.
Thank you! That was an RTFM question.
I was unclear about 'STORED AS RCFILE' since normally you would need
to use 'STORED AS SEQUENCEFILE'. RCFILE is a special type of sequence file.
I did get it working. Compression looks good: my table was smaller than
with a GZIP BLOCK sequence file, and query time was slightly better in
limited testing. Cool stuff.