I can't say for sure that RCFile + LZO is supported; I haven't tried it myself.

LZO is special because it requires some installation on each node. See the instructions here:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/ImpalaBeta/0.7/Installing-and-Using-Impala/ciiu_topic_7_2.html

John
On Apr 19, 2013, at 11:56 AM, Masahiro Kiura wrote:

Hi Impala, CDH and Cloudera Manager users,

I'm new to Impala, and I'm trying to measure its query response time. Today, while doing so, I got the following error message from impala-shell.

$ impala-shell -i <host1> -f <SQL file>
Connected to <host1>:21000
Query: select count(distinct column1) from TABLE_A
Query aborted, unable to fetch data

Backend 6:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
Backend 7:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
Backend 8:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
Backend 9:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
Backend 10:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
Backend 11:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
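When many backends fail at once, it can help to group the messages by codec to see whether one codec is failing cluster-wide or only on some nodes. The sketch below assumes the impala-shell output has been saved as plain text in the "Backend N:Unknown Codec: <class>" shape shown above; the helper name `backends_by_unknown_codec` is hypothetical and not part of Impala.

```python
import re
from collections import defaultdict

# Matches lines such as "Backend 6:Unknown Codec: com.hadoop.compression.lzo.LzopCodec".
# Assumption: this is the only shape of codec error we need to recognize.
BACKEND_ERROR = re.compile(r"^\s*Backend (\d+):\s*Unknown Codec:\s*(\S+)\s*$")


def backends_by_unknown_codec(shell_output: str) -> dict[str, list[int]]:
    """Group backend IDs by the codec class each one failed to recognize."""
    failures: dict[str, list[int]] = defaultdict(list)
    for line in shell_output.splitlines():
        match = BACKEND_ERROR.match(line)
        if match:
            backend_id, codec = int(match.group(1)), match.group(2)
            failures[codec].append(backend_id)
    return dict(failures)


if __name__ == "__main__":
    sample = """Query aborted, unable to fetch data

Backend 6:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
Backend 7:Unknown Codec: com.hadoop.compression.lzo.LzopCodec
"""
    for codec, ids in backends_by_unknown_codec(sample).items():
        print(f"{codec}: {len(ids)} backend(s) {ids}")
```

If every failing backend names the same codec, as in the report here, the problem is more likely the format/codec combination than a missing install on one node.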

I populated TABLE_A by loading TABLE_B's records with the following SQL:
$ hive
hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
hive> INSERT INTO TABLE TABLE_A PARTITION(dt='20130420') SELECT <skipped> FROM TABLE_B where dt='20130420';
The load succeeded, because "SELECT count(*) FROM TABLE_A" returned the expected number of records.

The installed versions of CDH and Impala are as follows:
- CDH 4.2 using parcels
- Cloudera Manager 4.5
- Impala 0.7 beta
NOTE: I installed hadoop-lzo and impala-lzo following the instructions at http://www.cloudera.com/content/cloudera-content/cloudera-docs/ImpalaBeta/0.7/Installing-and-Using-Impala/ciiu_topic_7_2.html.

My questions:
1) Does Impala 0.7 support LZO-compressed RCFile?
2) If so, could you please tell me how to configure LZO-compressed RCFile in Impala?

As additional information, querying a Snappy-compressed RCFile table succeeds:
$ hive
hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
hive> INSERT INTO TABLE TABLE_C PARTITION(dt='20130420') SELECT <skipped> FROM TABLE_B where dt='20130420';

$ impala-shell -i <host1> -f <SQL file>
Connected to <host1>:21000
Query: select count(distinct column1) from TABLE_C
This query returned the expected number.
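The two Hive loads in this message differ only in the value of mapred.output.compression.codec. A small sketch that builds the same three statements for either codec makes that explicit; `build_hive_load_script` is a hypothetical helper, the select list stays elided as in the original, and no SQL escaping is done, so inputs are assumed to be trusted.

```python
# Codec class names exactly as used in the Hive sessions above.
LZOP_CODEC = "com.hadoop.compression.lzo.LzopCodec"
SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"


def build_hive_load_script(target: str, source: str, partition_dt: str,
                           select_list: str, codec_class: str) -> str:
    """Return the Hive statements used in this thread to load one partition
    of `target` from `source`, writing compressed output with `codec_class`.

    Sketch only: values are interpolated as-is, with no quoting or escaping.
    """
    return "\n".join([
        "SET hive.exec.compress.output=true;",
        f"SET mapred.output.compression.codec={codec_class};",
        f"INSERT INTO TABLE {target} PARTITION(dt='{partition_dt}') "
        f"SELECT {select_list} FROM {source} WHERE dt='{partition_dt}';",
    ])


if __name__ == "__main__":
    # "<skipped>" mirrors the elided select list in the original message.
    print(build_hive_load_script("TABLE_C", "TABLE_B", "20130420",
                                 "<skipped>", SNAPPY_CODEC))
```

Swapping `SNAPPY_CODEC` for `LZOP_CODEC` reproduces the TABLE_A load, which is the variant Impala 0.7 could not read back.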

Best regards,
Masahiro KIURA


  • Lenni Kuff at Apr 19, 2013 at 8:12 pm
    Hi,
    Currently Impala supports LZO compression only for text files; LZO
    compression is not yet supported for other formats.
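The support rule stated in this reply, together with the Snappy result reported in the question, can be written down as a small lookup. This is a sketch of what this thread says about Impala 0.7 beta only, not an authoritative support matrix; the names `Support` and `check_impala_07` are hypothetical, and format names follow Hive's STORED AS keywords.

```python
from enum import Enum

LZOP_CODEC = "com.hadoop.compression.lzo.LzopCodec"
SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"


class Support(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"
    UNKNOWN = "not covered by this thread"


def check_impala_07(file_format: str, codec_class: str) -> Support:
    """Classify a (file format, codec class) pair using only this thread."""
    fmt = file_format.strip().upper()
    if codec_class == LZOP_CODEC:
        # Per the reply: LZO only for text files, not for other formats.
        return Support.SUPPORTED if fmt == "TEXTFILE" else Support.UNSUPPORTED
    if codec_class == SNAPPY_CODEC and fmt == "RCFILE":
        # Reported working in the original question (TABLE_C).
        return Support.SUPPORTED
    # Anything else is simply not discussed here.
    return Support.UNKNOWN
```

For example, `check_impala_07("RCFile", LZOP_CODEC)` classifies the failing TABLE_A setup as unsupported, while the Snappy-compressed TABLE_C setup comes back as supported.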

    Impala should give a clearer error message, and we also need to improve
    our docs. I have filed https://issues.cloudera.org/browse/IMPALA-290 to
    track improving the error message, and will follow up with our docs team
    to get that cleaned up.

    Thanks,
    Lenni
    Software Engineer - Cloudera

  • Masahiro Kiura at Apr 19, 2013 at 8:31 pm
    Hi lskuff,

    Thank you for your kind reply and for creating the ticket!

    I understand that LZO compression is not currently supported for RCFile.
    I should have read the full Impala 0.7 source on GitHub before sending my
    question. Thank you for following up.

    Best regards,
    Masahiro KIURA

Discussion Overview
group: impala-user
categories: hadoop
posted: Apr 19, '13 at 7:46 pm
active: Apr 19, '13 at 8:31 pm
posts: 3
users: 3
website: cloudera.com
irc: #hadoop