I don't think so. I've tried a 256 MB block size, and it also works.




Pengcheng Liu <zenonlpc@gmail.com> wrote:
I have Impala version vcdh5-1.3.0.


But I just noticed that my block size is not 4 GB, it is 3.96 GB. Is this the reason my test failed? Does the block size have to be a multiple of 1 MB or 1 GB?
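On the multiple-of-1-MB question: as far as I know, HDFS does not require the block size to be a multiple of 1 MB or 1 GB. The HDFS client instead rejects a block size that is not a multiple of the checksum chunk size, `dfs.bytes-per-checksum`, which defaults to 512 bytes. The sketch below assumes that 512-byte default, and `is_valid_block_size` is a hypothetical helper; it says nothing about how Impala itself treats the resulting files.

```python
# Sketch: check a candidate HDFS block size against the rule that it must be a
# whole multiple of dfs.bytes-per-checksum. The 512-byte default is an
# assumption about the cluster; is_valid_block_size is a hypothetical helper.
BYTES_PER_CHECKSUM = 512
GIB = 1024 ** 3


def is_valid_block_size(block_size: int,
                        bytes_per_checksum: int = BYTES_PER_CHECKSUM) -> bool:
    """Return True if block_size is positive and a multiple of the checksum chunk."""
    return block_size > 0 and block_size % bytes_per_checksum == 0


print(is_valid_block_size(4 * GIB))        # 4 GiB
print(is_valid_block_size(3_960_000_000))  # 3.96 GB in decimal units
print(is_valid_block_size(3_960_000_001))  # not a multiple of 512
```

Note that 3.96 GB (decimal) happens to be an exact multiple of 512 bytes, which is consistent with Zesheng's observation that sizes other than 1 GB multiples are accepted.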


Thanks

Pengcheng



On Tue, May 13, 2014 at 10:10 AM, Zesheng Wu wrote:

I've tried the option on Impala 1.2.4, and it does work.



2014-05-13 22:07 GMT+08:00 Pengcheng Liu <zenonlpc@gmail.com>:


Hello Zesheng


I tried that, but it's still not working. This time, with a 4 GB block size, the query failed without returning any values. Before, with a 1 GB block size, the query completed and returned a result, along with some additional error log information.


Thanks

Pengcheng



On Sat, May 10, 2014 at 10:54 PM, Zesheng Wu wrote:

Hi Pengcheng, you can try this one in impala-shell:

set PARQUET_FILE_SIZE=${block_size_you_want_to_set};
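A small sketch of building that statement with the size written out in bytes. It assumes the option takes a plain byte count, as described for Impala 1.x, and `make_parquet_size_stmt` is a hypothetical helper. To my understanding, `PARQUET_FILE_SIZE` governs the Parquet files that Impala itself writes with `INSERT`; files produced by a MapReduce job would not be affected by it.

```python
# Sketch: produce the impala-shell statement from a size in MiB.
# Assumption: PARQUET_FILE_SIZE takes a plain byte count (Impala 1.x).
# make_parquet_size_stmt is a hypothetical helper, not part of any Impala API.
MIB = 1024 * 1024


def make_parquet_size_stmt(size_mib: int) -> str:
    """Return a 'set PARQUET_FILE_SIZE=<bytes>;' statement for size_mib MiB."""
    if size_mib <= 0:
        raise ValueError("size must be positive")
    return f"set PARQUET_FILE_SIZE={size_mib * MIB};"


print(make_parquet_size_stmt(256))  # set PARQUET_FILE_SIZE=268435456;
```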



2014-05-10 4:22 GMT+08:00 Pengcheng Liu <zenonlpc@gmail.com>:


Hello Lenni


I already tried the invalidate metadata command; it doesn't work.


I am writing the Parquet files from a MapReduce job, and after the job finishes I bring those files online through the Impala JDBC API.


Then I have to call invalidate metadata to see the table in Impala.


I was wondering if there are any configuration settings for Impala or HDFS that control the maximum block size of files on HDFS.
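For files written by a MapReduce job, the HDFS block size is a client-side setting chosen when each file is created, so one option is to round the largest expected file size up to a whole number of MiB and pass that as `dfs.blocksize`. This is a sketch under assumptions: `dfs.blocksize` is the Hadoop 2 property name, passing it with `-D` assumes the job driver parses generic options (for example through `ToolRunner`), and both helper names are hypothetical.

```python
# Sketch: choose a block size that holds the largest file in one block, then
# format it as a Hadoop generic option. Assumptions: Hadoop 2 property name
# dfs.blocksize, and a driver that accepts -D options. Helpers are hypothetical.
MIB = 1024 * 1024


def block_size_for(largest_file_bytes: int, unit: int = MIB) -> int:
    """Smallest multiple of `unit` that is >= largest_file_bytes."""
    if largest_file_bytes <= 0 or unit <= 0:
        raise ValueError("sizes must be positive")
    return -(-largest_file_bytes // unit) * unit  # ceiling division, then scale


def hadoop_block_size_opt(block_size: int) -> str:
    """Format the block size as a -D option for the job's command line."""
    return f"-Ddfs.blocksize={block_size}"


size = block_size_for(1_500_000_000)  # a hypothetical ~1.5 GB largest file
print(size, hadoop_block_size_opt(size))
```

Rounding to whole MiB keeps the result a multiple of the 512-byte checksum chunk as well.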


Thanks

Pengcheng



On Thu, May 8, 2014 at 3:43 PM, Lenni Kuff wrote:

Hi Pengcheng,

Since Impala caches the table metadata, including block location information, you will need to run an "invalidate metadata <table name>" after you change the block size. Can you try running that command and then re-running your query?
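A minimal sketch of building that statement from a program, for example before sending it over JDBC after a MapReduce job finishes. `invalidate_metadata_stmt` is a hypothetical helper, and the table and database names in the example are placeholders; it only accepts plain identifiers so that arbitrary text cannot end up in the SQL.

```python
# Sketch: build the "INVALIDATE METADATA [db.]table" statement Lenni describes.
# invalidate_metadata_stmt is a hypothetical helper; names used are placeholders.
import re
from typing import Optional

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def invalidate_metadata_stmt(table: str, db: Optional[str] = None) -> str:
    """Return an INVALIDATE METADATA statement for a single table."""
    parts = [db, table] if db else [table]
    for part in parts:
        if not _IDENT.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return "INVALIDATE METADATA " + ".".join(parts)


print(invalidate_metadata_stmt("my_table"))
```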


Let me know how this works out. If it resolves the problem we can look at how to improve the error message in Impala to make it easier to diagnose.


Thanks,

Lenni



On Thu, May 8, 2014 at 8:15 AM, Pengcheng Liu wrote:

Hello experts


I have been working with Impala for a year, and the new Parquet format is really exciting.


I have Impala version vcdh5-1.3.0.


I have a data set of about 40 GB in Parquet (the raw data is 500 GB) with 20 partitions, but the partitions are not evenly distributed.


When I set the block size to 1 GB, some of the files are split into multiple blocks because they are larger than 1 GB.
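The splitting described above is plain ceiling division: a file occupies as many blocks as it takes to hold all of its bytes, so any file larger than the block size spans more than one block. A minimal sketch, where `hdfs_block_count` is a hypothetical helper and the 1.5 GiB file is an invented example:

```python
# Sketch: how many HDFS blocks a file of a given size occupies.
# hdfs_block_count is a hypothetical helper; the file size is an example.
GIB = 1024 ** 3


def hdfs_block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to hold file_size bytes (0 for an empty file)."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    return (file_size + block_size - 1) // block_size  # ceiling division


f = 3 * GIB // 2  # a hypothetical 1.5 GiB Parquet file
print(hdfs_block_count(f, 1 * GIB))   # 2: the multi-block case the warning is about
print(hdfs_block_count(f, 10 * GIB))  # 1: the whole file fits in one block
```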


The Impala query works, but it gives me a warning saying it cannot query Parquet files with multiple blocks.


I saw that some folks posted a similar problem here, and one of the responses suggested setting the block size larger than the actual size of the file.


So I went ahead and tried that, using 10 GB as my HDFS block size.


Now my query failed with this error message:


ERROR: Error seeking to 3955895608 in file: hdfs://research-mn00.saas.local:8020/user/tablepar/201309/-r-00106.snappy.parquet

Error(22): Invalid argument

ERROR: Invalid query handle
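The seek offset in this error can be put in perspective with a little arithmetic: it is about 3.96 GB in decimal units (the same figure Pengcheng later gives for his block size) and about 3.68 GiB, and it is larger than the largest signed 32-bit integer. This sketch only does the unit conversions; the thread does not establish what actually caused the `Invalid argument` error.

```python
# Sketch: unit conversions for the offset in "Error seeking to 3955895608".
# This does not diagnose the error; it only expresses the number in GB and GiB.
OFFSET = 3_955_895_608

gb = OFFSET / 10**9   # decimal gigabytes
gib = OFFSET / 2**30  # binary gibibytes
print(f"{gb:.2f} GB, {gib:.2f} GiB")
print("exceeds signed 32-bit max:", OFFSET > 2**31 - 1)
print("fits in unsigned 32-bit:", OFFSET < 2**32)
```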


Is this error due to the large block size I used? Are there any limits on the maximum block size we can create on HDFS?


Thanks

Pengcheng




To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.






--
Best Wishes!

Yours, Zesheng





--
Best Wishes!

Yours, Zesheng


Discussion Overview
group: impala-user
categories: hadoop
posted: May 13, '14 at 2:54p
active: May 14, '14 at 2:54p
posts: 2
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion: Wuzesheng86 (1 post), Pengcheng Liu (1 post)
