Hello guys

I tried the new release, Impala CDH5 1.3.1. It is still not working.

Here is the list of parameters I can set from the shell:

         ABORT_ON_DEFAULT_LIMIT_EXCEEDED: 0
         ABORT_ON_ERROR: 0
         ALLOW_UNSUPPORTED_FORMATS: 0
         BATCH_SIZE: 0
         DEBUG_ACTION:
         DEFAULT_ORDER_BY_LIMIT: -1
         DISABLE_CACHED_READS: 0
         DISABLE_CODEGEN: 0
         EXPLAIN_LEVEL: 0
         HBASE_CACHE_BLOCKS: 0
         HBASE_CACHING: 0
         MAX_ERRORS: 0
         MAX_IO_BUFFERS: 0
         MAX_SCAN_RANGE_LENGTH: 0
         MEM_LIMIT: 0
         NUM_NODES: 0
         NUM_SCANNER_THREADS: 0
         PARQUET_COMPRESSION_CODEC: SNAPPY
         PARQUET_FILE_SIZE: 0
         REQUEST_POOL:
         RESERVATION_REQUEST_TIMEOUT: 0
         SUPPORT_START_OVER: false
         SYNC_DDL: 0
         V_CPU_CORES: 0

Apart from PARQUET_FILE_SIZE, what is the meaning of BATCH_SIZE?

After I set PARQUET_FILE_SIZE, what command can I use to get the current
value of the parameter?
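For reference, a sketch of how this can be checked in impala-shell (assuming the 1.x shell behavior, where SET with no arguments prints every query option and its current value, which is how the list above was produced):

```sql
-- In impala-shell, SET with no arguments lists all query options
-- and their current values, including PARQUET_FILE_SIZE:
set;

-- After changing an option, run SET again to confirm the new value:
set PARQUET_FILE_SIZE=1073741824;  -- 1 GB, in bytes
set;
```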

Thanks
Pengcheng


On Tue, May 13, 2014 at 4:05 PM, Pengcheng Liu wrote:

I tried with a 4GB block size again. It failed. The error info below appears
when I run the query "select count(*) from table;":

ERRORS ENCOUNTERED DURING EXECUTION:
Backend 0:Unknown disk id. This will negatively affect performance. Check
your hdfs settings to enable block location metadata.
Backend 1:Unknown disk id. This will negatively affect performance. Check
your hdfs settings to enable block location metadata.
Error seeking to 2299524674 in file:
hdfs://research-mn00.saas.local:8020/user/pcliu/bigtablepar/201402/-r-00060.snappy.parquet
Error(22): Invalid argument
Backend 9:Unknown disk id. This will negatively affect performance. Check
your hdfs settings to enable block location metadata.

ERROR: Error seeking to 2299524674 in file:
hdfs://research-mn00.saas.local:8020/user/pcliu/bigtablepar/201402/-r-00060.snappy.parquet
Error(22): Invalid argument
ERROR: Invalid query handle
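An editorial observation, not from the thread itself: both failing seek offsets, 2299524674 and 3955895608, are just above the signed 32-bit integer range, which would be consistent with the offset being truncated to 32 bits somewhere in the I/O path. A quick arithmetic check:

```python
# The two seek offsets reported in the errors above.
failing_offsets = [2299524674, 3955895608]

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value

for off in failing_offsets:
    # Both offsets exceed INT32_MAX, so neither fits in a
    # signed 32-bit integer without overflow.
    print(off, off > INT32_MAX)  # prints True for both
```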

Thanks
Pengcheng


On Tue, May 13, 2014 at 10:53 AM, wuzesheng86@gmail.com <wuzesheng86@gmail.com> wrote:
I don't think so; I've tried a 256MB block size, and it also works.



Sent from my MiPhone

Pengcheng Liu <zenonlpc@gmail.com> wrote:


I have Impala version cdh5-1.3.0.

But I just noticed my block size is not 4GB; it is 3.96GB. Is this the
reason my test failed? Does the block size have to be a multiple of 1MB or 1GB?

Thanks
Pengcheng

On Tue, May 13, 2014 at 10:10 AM, Zesheng Wu wrote:

I've tried the option on impala 1.2.4, it does work.


2014-05-13 22:07 GMT+08:00 Pengcheng Liu <zenonlpc@gmail.com>:

Hello Zesheng
I tried that; it is still not working. This time, with a 4GB block size, the
query failed without returning any values. Before, with a 1GB block size, the
query would complete and give me a result, along with some additional error
log information.

Thanks
Pengcheng

On Sat, May 10, 2014 at 10:54 PM, Zesheng Wu wrote:

Hi Pengcheng, you can try this one in impala-shell:

set PARQUET_FILE_SIZE=${block_size_you_want_to_set};
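For example, to target roughly 1GB Parquet files (the value appears to be in bytes in this Impala release; 1073741824 bytes = 1 GB):

```sql
-- Set the target Parquet file size to 1 GB (value in bytes):
set PARQUET_FILE_SIZE=1073741824;
```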


2014-05-10 4:22 GMT+08:00 Pengcheng Liu <zenonlpc@gmail.com>:

Hello Lenni
I already tried the invalidate metadata command; it doesn't work.

I write the Parquet files from a MapReduce job, and after the job
finishes I bring those files online through the Impala JDBC API.

Then I have to call invalidate metadata to see the table in Impala.

I was wondering if there are any configuration settings for Impala or
HDFS that control the maximum block size of a file on HDFS.
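On the HDFS side, the default block size for newly written files is controlled by dfs.blocksize in hdfs-site.xml (a standard Hadoop 2.x property; clients can also override it per file at create time). A sketch:

```xml
<!-- hdfs-site.xml: default block size for new files.
     The value is in bytes; Hadoop 2.x also accepts suffixes such as 1g. -->
<property>
  <name>dfs.blocksize</name>
  <value>1073741824</value> <!-- 1 GB -->
</property>
```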

Thanks
Pengcheng

On Thu, May 8, 2014 at 3:43 PM, Lenni Kuff wrote:

Hi Pengcheng,
Since Impala caches the table metadata, including block location
information, you will need to run an "invalidate metadata <table name>"
after you change the block size. Can you try running that command and then
re-running your query?

Let me know how this works out. If it resolves the problem we can
look at how to improve the error message in Impala to make it easier to
diagnose.
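The command Lenni describes, as it would be run from impala-shell (the table name here is a placeholder):

```sql
-- Reload Impala's cached metadata (including block locations)
-- for one table after its files change underneath it:
invalidate metadata my_table;  -- my_table is a placeholder name
```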

Thanks,
Lenni

On Thu, May 8, 2014 at 8:15 AM, Pengcheng Liu wrote:

Hello experts

I have been working with Impala for a year, and the new Parquet
format is really exciting.

I have Impala version cdh5-1.3.0.

I have a data set of about 40GB in Parquet (raw data is 500GB), with
20 partitions, but the partitions are not evenly distributed.

When I set the block size to 1GB, some of the files are split into
multiple blocks since they are larger than 1GB.

The Impala query works, but it gives me a warning saying it cannot
query Parquet files with multiple blocks.

I saw some folks had posted a similar problem here, and one of the
responses was to set the block size larger than the actual size of the file.

So I went ahead and tried that, using 10GB as my HDFS file block size.

Now my query failed with this error message:

ERROR: Error seeking to 3955895608 in file:
hdfs://research-mn00.saas.local:8020/user/tablepar/201309/-r-00106.snappy.parquet


Error(22): Invalid argument
ERROR: Invalid query handle

Is this error due to the large block size I used? Are there any
limits on the maximum block size we can create on HDFS?

Thanks
Pengcheng



To unsubscribe from this group and stop receiving emails from it,
send an email to impala-user+unsubscribe@cloudera.org.


--
Best Wishes!

Yours, Zesheng



--
Best Wishes!

Yours, Zesheng



Discussion Overview
group: impala-user
categories: hadoop
posted: May 13, '14 at 2:54p
active: May 14, '14 at 2:54p
posts: 2
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Wuzesheng86: 1 post; Pengcheng Liu: 1 post
