Hi everyone,
I have a question about Parquet file size.
I have a text table (text_test1) that contains 8 files.
[hadoop@pdpds03 ~]$ hdfs dfs -ls -h /user/hive/warehouse/text_test1
Found 8 items
-rw-r--r-- 3 hadoop supergroup 7.6g 2013-05-28 10:32
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921741_2108494732_data.0
-rw-r--r-- 3 hadoop supergroup 7.1g 2013-05-28 10:31
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921742_378015946_data.0
-rw-r--r-- 3 hadoop supergroup 5.8g 2013-05-28 10:30
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921743_2108494732_data.0
-rw-r--r-- 3 hadoop supergroup 7.7g 2013-05-28 10:33
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921744_2108494732_data.0
-rw-r--r-- 3 hadoop supergroup 7.6g 2013-05-28 10:32
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921745_378015946_data.0
-rw-r--r-- 3 hadoop supergroup 7.8g 2013-05-28 10:32
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921746_2108494732_data.0
-rw-r--r-- 3 hadoop supergroup 7.5g 2013-05-28 10:32
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921747_378015946_data.0
-rw-r--r-- 3 hadoop supergroup 7.5g 2013-05-28 10:32
/user/hive/warehouse/text_test1/-5911224784759992869--6015766500341921748_378015946_data.0
I then created a Parquet table (par_test2) and inserted all rows from the
text_test1 table into it.
impala-shell> insert into table par_test2 select * from text_test1;
After the insert, I checked the files that were created in the table's
directory:
[hadoop@pdpds03 ~]$ hdfs dfs -ls /user/hive/warehouse/par_test2
Found 88 items
-rw-r--r-- 3 hadoop supergroup 239442510 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.0
-rw-r--r-- 3 hadoop supergroup 239451680 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.1
-rw-r--r-- 3 hadoop supergroup 188313187 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.10
-rw-r--r-- 3 hadoop supergroup 239650444 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.2
-rw-r--r-- 3 hadoop supergroup 239787685 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.3
-rw-r--r-- 3 hadoop supergroup 239814629 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.4
-rw-r--r-- 3 hadoop supergroup 239695822 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.5
-rw-r--r-- 3 hadoop supergroup 239343840 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.6
-rw-r--r-- 3 hadoop supergroup 239508301 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.7
-rw-r--r-- 3 hadoop supergroup 239645955 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.8
-rw-r--r-- 3 hadoop supergroup 239638298 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.9
-rw-r--r-- 3 hadoop supergroup 239561208 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.0
-rw-r--r-- 3 hadoop supergroup 239721623 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.1
-rw-r--r-- 3 hadoop supergroup 188188844 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.10
-rw-r--r-- 3 hadoop supergroup 239814219 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.2
-rw-r--r-- 3 hadoop supergroup 239541116 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.3
-rw-r--r-- 3 hadoop supergroup 239713124 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.4
-rw-r--r-- 3 hadoop supergroup 239790985 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.5
-rw-r--r-- 3 hadoop supergroup 239479260 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.6
-rw-r--r-- 3 hadoop supergroup 239526927 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.7
-rw-r--r-- 3 hadoop supergroup 239502499 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.8
-rw-r--r-- 3 hadoop supergroup 239374211 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917500_1901668664_data.9
-rw-r--r-- 3 hadoop supergroup 239630966 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.0
-rw-r--r-- 3 hadoop supergroup 239749955 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.1
-rw-r--r-- 3 hadoop supergroup 203857937 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.10
-rw-r--r-- 3 hadoop supergroup 239464319 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.2
-rw-r--r-- 3 hadoop supergroup 239670094 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.3
-rw-r--r-- 3 hadoop supergroup 239750489 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.4
-rw-r--r-- 3 hadoop supergroup 239562727 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.5
-rw-r--r-- 3 hadoop supergroup 239639054 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.6
-rw-r--r-- 3 hadoop supergroup 239567425 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.7
-rw-r--r-- 3 hadoop supergroup 239429069 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.8
-rw-r--r-- 3 hadoop supergroup 239613065 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917501_1901668664_data.9
-rw-r--r-- 3 hadoop supergroup 239564618 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.0
-rw-r--r-- 3 hadoop supergroup 239570413 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.1
-rw-r--r-- 3 hadoop supergroup 166134666 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.10
-rw-r--r-- 3 hadoop supergroup 239626920 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.2
-rw-r--r-- 3 hadoop supergroup 239575519 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.3
-rw-r--r-- 3 hadoop supergroup 239676167 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.4
-rw-r--r-- 3 hadoop supergroup 239697283 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.5
-rw-r--r-- 3 hadoop supergroup 239537018 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.6
-rw-r--r-- 3 hadoop supergroup 239593281 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.7
-rw-r--r-- 3 hadoop supergroup 239396179 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.8
-rw-r--r-- 3 hadoop supergroup 239562409 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917502_1901668664_data.9
-rw-r--r-- 3 hadoop supergroup 239568601 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.0
-rw-r--r-- 3 hadoop supergroup 239670551 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.1
-rw-r--r-- 3 hadoop supergroup 176017520 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.10
-rw-r--r-- 3 hadoop supergroup 239403592 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.2
-rw-r--r-- 3 hadoop supergroup 239643243 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.3
-rw-r--r-- 3 hadoop supergroup 239657207 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.4
-rw-r--r-- 3 hadoop supergroup 239689305 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.5
-rw-r--r-- 3 hadoop supergroup 239482229 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.6
-rw-r--r-- 3 hadoop supergroup 239449412 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.7
-rw-r--r-- 3 hadoop supergroup 239451684 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.8
-rw-r--r-- 3 hadoop supergroup 239526293 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917503_1260623520_data.9
-rw-r--r-- 3 hadoop supergroup 239547268 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.0
-rw-r--r-- 3 hadoop supergroup 239679417 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.1
-rw-r--r-- 3 hadoop supergroup 203060392 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.10
-rw-r--r-- 3 hadoop supergroup 239521224 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.2
-rw-r--r-- 3 hadoop supergroup 239544524 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.3
-rw-r--r-- 3 hadoop supergroup 239507007 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.4
-rw-r--r-- 3 hadoop supergroup 239514239 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.5
-rw-r--r-- 3 hadoop supergroup 239511207 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.6
-rw-r--r-- 3 hadoop supergroup 239589789 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.7
-rw-r--r-- 3 hadoop supergroup 239543836 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.8
-rw-r--r-- 3 hadoop supergroup 239596665 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917504_1684911546_data.9
-rw-r--r-- 3 hadoop supergroup 239659618 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.0
-rw-r--r-- 3 hadoop supergroup 239689357 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.1
-rw-r--r-- 3 hadoop supergroup 188347473 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.10
-rw-r--r-- 3 hadoop supergroup 239749017 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.2
-rw-r--r-- 3 hadoop supergroup 239681039 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.3
-rw-r--r-- 3 hadoop supergroup 239692309 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.4
-rw-r--r-- 3 hadoop supergroup 239622149 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.5
-rw-r--r-- 3 hadoop supergroup 239684365 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.6
-rw-r--r-- 3 hadoop supergroup 239800527 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.7
-rw-r--r-- 3 hadoop supergroup 239605161 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.8
-rw-r--r-- 3 hadoop supergroup 239660185 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917505_1260623520_data.9
-rw-r--r-- 3 hadoop supergroup 239735438 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.0
-rw-r--r-- 3 hadoop supergroup 239646729 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.1
-rw-r--r-- 3 hadoop supergroup 177441553 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.10
-rw-r--r-- 3 hadoop supergroup 239587853 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.2
-rw-r--r-- 3 hadoop supergroup 239643517 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.3
-rw-r--r-- 3 hadoop supergroup 239717605 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.4
-rw-r--r-- 3 hadoop supergroup 239632315 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.5
-rw-r--r-- 3 hadoop supergroup 239612703 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.6
-rw-r--r-- 3 hadoop supergroup 239536937 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.7
-rw-r--r-- 3 hadoop supergroup 239633765 2013-05-28 10:49
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.8
-rw-r--r-- 3 hadoop supergroup 239470525 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917506_1684911546_data.9
[hadoop@pdpds03 ~]$
I expected about 62 Parquet files to be created, each around 1 GB.
As the listing above shows, the Parquet files are much smaller than I
expected (~230 MB each).
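For anyone who wants to check the numbers, here is a minimal sketch of how
the file count and total size can be pulled out of a listing like the one
above. It assumes plain `hdfs dfs -ls` output (sizes in bytes, no -h), and
the function name summarize_listing is just something I made up for
illustration. The sample is only the first three entries of the listing.

```python
def summarize_listing(listing):
    """Return (file_count, total_bytes, mean_mib) for `hdfs dfs -ls` text."""
    sizes = []
    for line in listing.splitlines():
        fields = line.split()
        # File entries start with a permission string such as "-rw-r--r--";
        # the fifth field is the size in bytes. Wrapped path lines and the
        # "Found N items" header do not match and are skipped.
        if len(fields) >= 5 and fields[0].startswith("-") and fields[4].isdigit():
            sizes.append(int(fields[4]))
    if not sizes:
        return 0, 0, 0.0
    total = sum(sizes)
    return len(sizes), total, total / len(sizes) / 2**20


# Excerpt only: the first three entries of the par_test2 listing.
sample = """\
-rw-r--r-- 3 hadoop supergroup 239442510 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.0
-rw-r--r-- 3 hadoop supergroup 239451680 2013-05-28 10:48
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.1
-rw-r--r-- 3 hadoop supergroup 188313187 2013-05-28 10:50
/user/hive/warehouse/par_test2/-7447783798761306732--8767185485963917499_1487715093_data.10
"""

count, total_bytes, mean_mib = summarize_listing(sample)
print(count, total_bytes, round(mean_mib, 1))
```

Feeding it the full listing instead of the excerpt gives the totals for the
whole table.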
As far as I know, the optimal Parquet file size for maximizing sequential
I/O is 1 GB.
Is it expected behaviour for Impala to create Parquet files this small?
If not, can I configure the maximum Parquet file size?
Thanks,
Jung-Yup