Compression is a file-level property in Parquet, not a table-level one.
Therefore, it is not reported by 'describe formatted table_name'.
On Mon, Oct 20, 2014 at 2:14 PM, Victor Bittorf wrote:

Hi Venkat,

Parquet will use compression by default. It is possible that both tables
are compressed using Snappy. Try setting PARQUET_COMPRESSION_CODEC to
NONE if you want to disable compression.
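
One way to test this is to set the codec to NONE explicitly, load a third copy of the table, and compare its size on disk with the other two. This is a minimal sketch, not a verified procedure. The table name USATPSA_SALES_CMP_PMC_Impala_nocomp is hypothetical; the other names and statements are taken from the original post. If your impala-shell version does not accept '--' comment lines, drop them.

```sql
-- Sketch only: the _nocomp table name is made up for illustration.
use USATPSA;
-- Ask Impala to write uncompressed Parquet for INSERTs in this session.
set PARQUET_COMPRESSION_CODEC=NONE;
CREATE TABLE USATPSA_SALES_CMP_PMC_Impala_nocomp LIKE USATPSA.USATPSA_SALES_CMP_PMC STORED AS PARQUET;
insert into USATPSA_SALES_CMP_PMC_Impala_nocomp SELECT * FROM USATPSA.USATPSA_SALES_CMP_PMC;
-- Switch back to Snappy for later statements in the same session.
set PARQUET_COMPRESSION_CODEC=snappy;
```

Then run 'hadoop fs -du -h' on the new table's directory. If those files are clearly larger than the files in the other two tables, both earlier tables were written with compression.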

I'm not sure why Hive reports 'NO' for compression; I'll take a look at
this.

Victor
On Mon, Oct 20, 2014 at 1:57 PM, Venkat Ankam wrote:

How do I know whether or not my Parquet Table is compressed?

I am using the code below to create the Impala table with the Parquet file
format and Snappy compression.

use USATPSA;
set PARQUET_COMPRESSION_CODEC=snappy;
CREATE TABLE USATPSA_SALES_CMP_PMC_Impala LIKE USATPSA.USATPSA_SALES_CMP_PMC STORED AS PARQUET;
insert into USATPSA_SALES_CMP_PMC_Impala SELECT * FROM USATPSA.USATPSA_SALES_CMP_PMC;
compute stats USATPSA_SALES_CMP_PMC_Impala;

hadoop fs -du -h /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala

/user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/.impala_insert_staging
106.9 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abc_54183762_data.0
109.4 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abd_348159391_data.0
106.6 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abe_1748975375_data.0
111.3 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abf_2115313605_data.0
109.6 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00ac0_1698273434_data.0
108.5 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00ac1_1933141253_data.0


I don't see any difference in the size of the data when I remove the
Snappy compression setting.

use USATPSA;
CREATE TABLE USATPSA_SALES_CMP_PMC_Impala_new LIKE USATPSA.USATPSA_SALES_CMP_PMC STORED AS PARQUET;
insert into USATPSA_SALES_CMP_PMC_Impala_new SELECT * FROM USATPSA.USATPSA_SALES_CMP_PMC;
compute stats USATPSA_SALES_CMP_PMC_Impala_new;

hadoop fs -du -h /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new

/user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/.impala_insert_staging
106.1 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618c_672655044_data.0
111.0 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618d_190092874_data.0
110.0 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618e_685036987_data.0
106.1 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618f_620102463_data.0
102.0 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f06190_1532837242_data.0
117.6 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f06191_2117033070_data.0


When I issue the 'describe formatted table_name' command in Hive, it
always shows the compressed column as 'NO'.

I am using Impala version 1.3.1-cdh5. Any thoughts on this?

Regards,
Venkat

To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.

Discussion Overview
group: impala-user @
categories: hadoop
posted: Oct 20, '14 at 9:05p
active: Oct 27, '14 at 2:16p
posts: 6
users: 4
website: cloudera.com
irc: #hadoop
