reported in "describe table formatted".
On Mon, Oct 20, 2014 at 2:14 PM, Victor Bittorf wrote:
Hi Venkat,
Parquet will use compression by default. It is possible that both tables
are compressed using Snappy. Try setting PARQUET_COMPRESSION_CODEC to
NONE if you want to disable compression.
I'm not sure why Hive reports 'NO' for compression; I'll take a look at
this.
Victor
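A minimal sketch of the check Victor suggests, assuming the same database and source table as in the original question: write a third copy of the data with compression explicitly disabled, then compare its size on disk with the other two tables. The table name USATPSA_SALES_CMP_PMC_Impala_nocomp is hypothetical and used only for this comparison.

```sql
use USATPSA;
set PARQUET_COMPRESSION_CODEC=none;
CREATE TABLE USATPSA_SALES_CMP_PMC_Impala_nocomp  -- hypothetical table name
LIKE USATPSA.USATPSA_SALES_CMP_PMC STORED AS PARQUET;
insert into USATPSA_SALES_CMP_PMC_Impala_nocomp  -- written with no compression codec
SELECT * FROM USATPSA.USATPSA_SALES_CMP_PMC;
set PARQUET_COMPRESSION_CODEC=snappy;
```

If the files under the new table's directory come out noticeably larger in hadoop fs -du -h than the files of the two existing tables, that would suggest both existing tables were already written with Snappy, since Snappy is the default codec. The last statement only restores the default for the rest of the session.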
On Mon, Oct 20, 2014 at 1:57 PM, Venkat Ankam wrote:
How do I know whether or not my Parquet table is compressed?
I am using the code below to create the Impala table with the Parquet file format
and Snappy compression.
use USATPSA;
set PARQUET_COMPRESSION_CODEC=snappy;
CREATE TABLE USATPSA_SALES_CMP_PMC_Impala LIKE
USATPSA.USATPSA_SALES_CMP_PMC STORED AS PARQUET;
insert into USATPSA_SALES_CMP_PMC_Impala SELECT * FROM
USATPSA.USATPSA_SALES_CMP_PMC;
compute stats USATPSA_SALES_CMP_PMC_Impala;
hadoop fs -du -h /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala
/user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/.impala_insert_staging
106.9 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abc_54183762_data.0
109.4 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abd_348159391_data.0
106.6 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abe_1748975375_data.0
111.3 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00abf_2115313605_data.0
109.6 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00ac0_1698273434_data.0
108.5 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala/784bf3fd41cff098-7b26f76785f00ac1_1933141253_data.0
I don't see any difference in the size of the data when I remove the
Snappy compression setting.
use USATPSA;
CREATE TABLE USATPSA_SALES_CMP_PMC_Impala_new LIKE
USATPSA.USATPSA_SALES_CMP_PMC STORED AS PARQUET;
insert into USATPSA_SALES_CMP_PMC_Impala_new SELECT * FROM
USATPSA.USATPSA_SALES_CMP_PMC;
compute stats USATPSA_SALES_CMP_PMC_Impala_new;
hadoop fs -du -h /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new
/user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/.impala_insert_staging
106.1 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618c_672655044_data.0
111.0 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618d_190092874_data.0
110.0 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618e_685036987_data.0
106.1 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f0618f_620102463_data.0
102.0 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f06190_1532837242_data.0
117.6 M  /user/hive/warehouse/usatpsa.db/usatpsa_sales_cmp_pmc_impala_new/da4068743f51bb09-ea98fa85d5f06191_2117033070_data.0
When I issue the 'describe formatted table_name' command in Hive, it
always shows the compressed field as 'NO'.
I am using Impala version 1.3.1-cdh5. Any thoughts on this?
Regards,
Venkat
To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.