Hello experts

I have been working with Impala for a year, and the new Parquet format
is really exciting.

I am on Impala version cdh5-1.3.0.

I have a data set of about 40 GB in Parquet (the raw data is 500 GB) with 20
partitions, but the data is not evenly distributed across the partitions.

When I set the block size to 1 GB, some of the files are split into multiple
blocks since they are larger than 1 GB.

The Impala query works, but it gives me a warning saying it cannot query
Parquet files with multiple blocks.

I saw some folks post a similar problem here, and one of the responses was to
set the block size larger than the actual size of the file.

So I went ahead and tried that, using 10 GB as my HDFS file block size.
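
One way to set that is on the job that writes the files, roughly like this
(just a sketch, not my exact command; it assumes the job parses generic
options via ToolRunner, the jar, class, and paths are placeholders, and
10737418240 is 10 GB in bytes):

hadoop jar <my-job.jar> <MainClass> -D dfs.blocksize=10737418240 <input dir> <output dir>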

Now my query failed with this error message:

ERROR: Error seeking to 3955895608 in file:
hdfs://research-mn00.saas.local:8020/user/tablepar/201309/-r-00106.snappy.parquet


Error(22): Invalid argument
ERROR: Invalid query handle

Is this error due to the large block size I used? Is there any limit on
the maximum block size we can create on HDFS?

Thanks
Pengcheng




  • Lenni Kuff at May 8, 2014 at 7:43 pm
    Hi Pengcheng,
    Since Impala caches the table metadata, including block location
    information, you will need to run an "invalidate metadata <table name>"
    after you change the block size. Can you try running that command and then
    re-running your query?
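
    For example, in impala-shell (the table name below is just a placeholder):

    invalidate metadata my_parquet_table;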

    Let me know how this works out. If it resolves the problem we can look at
    how to improve the error message in Impala to make it easier to diagnose.

    Thanks,
    Lenni

  • Pengcheng Liu at May 9, 2014 at 8:22 pm
    Hello Lenni

    I already tried the invalidate metadata command; it doesn't work.

    I am writing the Parquet files from a MapReduce job, and after the job
    finishes I bring those files online through the Impala JDBC API.

    Then I have to call invalidate metadata to see the table in Impala.

    I was wondering if there are any configuration settings for Impala or HDFS
    that control the maximum block size of a file on HDFS.

    Thanks
    Pengcheng

  • Zesheng Wu at May 11, 2014 at 2:54 am
    Hi Pengcheng, you can try this one in impala-shell:

    set PARQUET_FILE_SIZE=${block_size_you_want_to_set};
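
    For example, to cap the Parquet file size at 1 GB (the value is given in
    bytes; pick whatever size fits your data):

    set PARQUET_FILE_SIZE=1073741824;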




    --
    Best Wishes!

    Yours, Zesheng

  • Pengcheng Liu at May 13, 2014 at 2:07 pm
    Hello Zesheng

    I tried that; it is still not working. This time, when I use a 4 GB block
    size, the query fails without returning any values. Before, when I used a
    1 GB block size, the query would complete and give me a result, along with
    some additional error log information.

    Thanks
    Pengcheng

  • Zesheng Wu at May 13, 2014 at 2:10 pm
    I've tried the option on Impala 1.2.4, and it does work.




    --
    Best Wishes!

    Yours, Zesheng

  • Deepak Gattala at May 18, 2014 at 7:22 am
    I also have a similar issue. I am on CDH 4.5, and I am trying to insert
    into a Parquet table from a raw table in Hive.



    [ausgtmhadoop01:21000] > insert INTO gbl_sdr_aud_t.customer_product_append
    SELECT * FROM gbl_sdr_aud_t.customer_product_inc;
    Query: insert INTO gbl_sdr_aud_t.customer_product_append SELECT * FROM
    gbl_sdr_aud_t.customer_product_inc
    Query aborted.

    ERRORS ENCOUNTERED DURING EXECUTION: Backend 1:Failed to close HDFS file:
    hdfs://nameservice1/user/hive/warehouse/gbl_sdr_aud_t.db/customer_product_append/.impala_insert_staging/fe4e8a47162c631b_351ee8923ed672b9//.-122008101973105893-3827752448128611003_1027235387_dir/-122008101973105893-3827752448128611003_1531887124_data.0
    Error(255): Unknown error 255
    Failed to get info on temporary HDFS file:
    hdfs://nameservice1/user/hive/warehouse/gbl_sdr_aud_t.db/customer_product_append/.impala_insert_staging/fe4e8a47162c631b_351ee8923ed672b9//.-122008101973105893-3827752448128611003_1027235387_dir/-122008101973105893-3827752448128611003_1531887124_data.0
    Error(2): No such file or directory
    Backend 4:Error seeking to 7516192768 in file:
    hdfs://nameservice1/user/hive/warehouse/gbl_sdr_aud_t.db/customer_product_inc/part-m-00000

    Error(255): Unknown error 255


