Would you mind sharing the following with us?
1. The script that generates the parquet file.
2. The parquet file size and the HDFS block size.
Thanks,
Alan
On Tue, May 20, 2014 at 6:40 AM, Pengcheng Liu wrote:
Hello Alan
Thanks for the answer. I was generating the parquet files from a MapReduce job using the parquet-mr package.
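In case it helps, the relevant part of the job driver looks roughly like this (a sketch, not the exact job: class names, paths, and the mapper/write-support wiring are placeholders, and the parquet-mr package name depends on the version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import parquet.hadoop.ParquetOutputFormat;  // org.apache.parquet.hadoop in newer parquet-mr

    public class ParquetWriteJob {
      public static void main(String[] args) throws Exception {
        final long ONE_GB = 1024L * 1024L * 1024L;

        Configuration conf = new Configuration();
        // HDFS block size for the files this job writes.
        conf.setLong("dfs.block.size", ONE_GB);
        // Parquet row-group ("block") size; keep it <= the HDFS block size above
        // so a row group never straddles two HDFS blocks.
        conf.setInt("parquet.block.size", (int) ONE_GB);
        // The output files are named *.snappy.parquet, so snappy compression is on.
        conf.set("parquet.compression", "SNAPPY");

        Job job = Job.getInstance(conf, "write-parquet");
        job.setJarByClass(ParquetWriteJob.class);
        job.setOutputFormatClass(ParquetOutputFormat.class);
        // ... mapper, reducer, and write-support class wiring as in the real job ...
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }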
When I set the block size to 1GB, the query works fine, but with some warning information saying parquet files don't like multiple blocks.
Then I tried to get rid of the warning by increasing the block size so my big parquet file can live in one block. But now my query fails, and so far I haven't been able to query the new table successfully. I also followed a suggestion here and set the PARQUET_FILE_SIZE parameter to my block size, which doesn't work.
I tried all of this on both versions: 1.3.0 and 1.3.1.
Thanks
Pengcheng
On Mon, May 19, 2014 at 8:37 PM, Alan Choi wrote:
Hi Pengcheng,
If you're generating the parquet file from Impala, then Impala should correctly create one block per file for you. If your data is more than 1GB, Impala should split it into multiple files.
If you're generating the parquet file in Hive, then you need to set "dfs.block.size" to 1GB.
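For example, something along these lines over a HiveServer2 JDBC connection should do it (a sketch only: host, database, and table names are placeholders, and it assumes the Parquet table already exists); the same SET commands work from the Hive CLI as well:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveParquetWrite {
      public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver2-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
          // 1 GB HDFS blocks for the files written by this session's jobs.
          stmt.execute("SET dfs.block.size=1073741824");
          // Keep each Parquet row group within one HDFS block.
          stmt.execute("SET parquet.block.size=1073741824");
          // Placeholder table names; assumes my_parquet_table is a Parquet table.
          stmt.execute("INSERT OVERWRITE TABLE my_parquet_table SELECT * FROM my_source_table");
        }
      }
    }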
Thanks,
Alan
On Mon, May 19, 2014 at 7:02 AM, Pengcheng Liu wrote:
Hello Deepak
When using a 1GB block size, the query works, but there is some warning information in the query results about parquet files not liking multiple blocks.
That is why I tried a larger block size to get rid of the warning, but so far I've been unsuccessful.
Thanks
Pengcheng
On Thu, May 15, 2014 at 3:13 PM, gvr.deepak wrote:
Use Parquet with a GB block size and Snappy compression; hope that works.
Thanks
Deepak Gattala
Sent via the Samsung Galaxy Note® 3, an AT&T 4G LTE smartphone
-------- Original message --------
From: Pengcheng Liu
Date:05/13/2014 7:20 AM (GMT-08:00)
To: impala-user@cloudera.org
Subject: Re: Impala won't work with large parquet files
I have Impala version vcdh5-1.3.0.
But I just noticed my block size is not 4GB, it is 3.96GB. Is this the reason my test failed? Does the block size have to be a multiple of 1MB or 1GB?
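To double-check what actually got written, I have been listing the files with something roughly like this (the partition directory below is a placeholder for the real table location); it prints each file's length and the block size it was created with:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckParquetBlocks {
      public static void main(String[] args) throws Exception {
        // Picks up the cluster's HDFS as the default filesystem from the classpath config.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder partition directory; substitute the real table path.
        Path dir = new Path("/user/tablepar/201309");

        for (FileStatus st : fs.listStatus(dir)) {
          if (!st.getPath().getName().endsWith(".parquet")) continue;
          long len = st.getLen();
          long blockSize = st.getBlockSize();
          // Report how many HDFS blocks each Parquet file actually spans.
          System.out.printf("%s  length=%d  blockSize=%d  blocks=%d%n",
              st.getPath().getName(), len, blockSize,
              (len + blockSize - 1) / blockSize);
        }
        fs.close();
      }
    }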
Thanks
Pengcheng
On Tue, May 13, 2014 at 10:10 AM, Zesheng Wu wrote:
I've tried the option on impala 1.2.4, it does work.
--
Best Wishes!
Yours, Zesheng
2014-05-13 22:07 GMT+08:00 Pengcheng Liu <zenonlpc@gmail.com>:
Hello Zesheng
I tried that; it's still not working. This time, when I use a 4GB block, the query fails without returning any values. Before, when I used a 1GB block size, the query would complete and give me a result, along with some additional error log information.
Thanks
Pengcheng
On Sat, May 10, 2014 at 10:54 PM, Zesheng Wu wrote:
Hi Pengcheng, you can try this one in impala-shell:
set PARQUET_FILE_SIZE=${block_size_you_want_to_set};
--
Best Wishes!
Yours, Zesheng
2014-05-10 4:22 GMT+08:00 Pengcheng Liu <zenonlpc@gmail.com>:
Hello Lenni
I already tried the invalidate metadata command; it doesn't work.
I am writing the parquet files from a MapReduce job, and after the job finishes I online those files through the Impala JDBC API. Then I have to call invalidate metadata to see the table in Impala.
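For reference, the onlining step looks roughly like this (a sketch only: host, port, table name, and the partition spec are placeholders, and I'm assuming the plain HiveServer2 JDBC driver pointed at the impalad HS2 port):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class OnlineParquetFiles {
      public static void main(String[] args) throws Exception {
        // Impala 1.x is commonly reached through the HiveServer2 JDBC driver
        // on the impalad HS2 port (21050 by default).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://impalad-host:21050/default;auth=noSasl");
             Statement stmt = conn.createStatement()) {

          // Point the table at the newly written files
          // (table name and partition column are placeholders).
          stmt.execute("ALTER TABLE tablepar ADD PARTITION (month='201309') "
              + "LOCATION '/user/tablepar/201309'");

          // Refresh cached metadata (including block locations) before querying.
          stmt.execute("INVALIDATE METADATA tablepar");
        }
      }
    }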
I was wondering if there is any configuration setting for Impala or HDFS that controls the maximum block size of a file on HDFS.
Thanks
Pengcheng
On Thu, May 8, 2014 at 3:43 PM, Lenni Kuff wrote:
Hi Pengcheng,
Since Impala caches the table metadata, including block location
information, you will need to run an "invalidate metadata <table name>"
after you change the block size. Can you try running that command and then
re-running your query?
Let me know how this works out. If it resolves the problem we can
look at how to improve the error message in Impala to make it easier to
diagnose.
Thanks,
Lenni
On Thu, May 8, 2014 at 8:15 AM, Pengcheng Liu wrote:
Hello experts
I have been working with Impala for a year, and the new Parquet format is really exciting.
I have Impala version vcdh5-1.3.0.
I have a data set of about 40GB in Parquet (the raw data is 500GB) with 20 partitions, but the data is not evenly distributed across the partitions.
When I set the block size to 1GB, some of the files are split into multiple blocks since they are larger than 1GB.
The Impala query works, but it gives me a warning saying it cannot query parquet files with multiple blocks.
I saw some folks post a similar problem here, and one of the responses was to set the block size larger than the actual size of the file.
So I went ahead and tried that, using 10GB as my HDFS file block size.
Now my query failed with this error message:
ERROR: Error seeking to 3955895608 in file:
hdfs://research-mn00.saas.local:8020/user/tablepar/201309/-r-00106.snappy.parquet
Error(22): Invalid argument
ERROR: Invalid query handle
Is this error due to the large block size I used? Are there any limits on the maximum block size we can create on HDFS?
Thanks
Pengcheng