Hi Mark,

the error indicates that the job was not able to find an applicable
compression codec based on the ".lzo" file extension. The job consults the
Hadoop configuration files to determine such codecs.
To run the indexer, you need to follow the steps in:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_lzo.html

In particular, you should make the changes to the core-site.xml
configuration files. Those changes are also required to make the indexer
run.
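
For reference, the core-site.xml change boils down to registering the LZO
codecs, along these lines (a sketch; the exact codec list varies by
install):

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>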

Cheers,

Alex

On Thu, May 9, 2013 at 1:45 PM, Marcel Kornacker wrote:

Mark, you might also want to take a look at our documentation about
how to use lzo-compressed files with Impala:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_lzo.html
On Thu, May 9, 2013 at 4:22 PM, Alex Behm wrote:
Hi Mark,

you can point the LZO indexer to an HDFS directory and the indexer job will
traverse all sub-directories. It will look for files with the ".lzo"
extension and create a ".index" file for each such file.
For example, suppose you had the following HDFS directory structure for
managing your "searches" table partitioned by "day":
/searches/1/file_a.lzo
/searches/1/file_b.lzo
/searches/2/file_a.lzo
/searches/3/file_a.lzo

You can point the LZO indexer to the "/searches/" directory and it will
create appropriate index files resulting in:
/searches/1/file_a.lzo
/searches/1/file_a.index
/searches/1/file_b.lzo
/searches/1/file_b.index
/searches/2/file_a.lzo
/searches/2/file_a.index
/searches/3/file_a.lzo
/searches/3/file_a.index
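
The indexer runs as a MapReduce job; a sketch of the command, assuming the
usual hadoop-lzo jar location (adjust the jar path for your install):

# index every .lzo file under the table directory in one pass
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer /searches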

Hope it helps!

Cheers,

Alex





On Thu, May 9, 2013 at 11:37 AM, Mark wrote:

So I'm using an external table with partitions by day. Will I need to run
this indexer over each partition? I'm guessing so since there is no "real"
table anywhere. Also, where do these index files get stored?

Thanks

On May 8, 2013, at 5:41 PM, Alex Behm wrote:

Sure, you can run the indexer on external tables.

You can follow the steps documented here:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_lzo.html
Basically, you need to install another software package, change a few
configuration options, and then run the indexer on whatever compressed LZO
tables you have.
The indexer is nothing but a Hadoop MapReduce job.

Cheers,

Alex



On Wed, May 8, 2013 at 3:56 PM, Mark wrote:

Thanks. Could you please explain the indexing process a bit further?
Can this be used on an external table?

On May 8, 2013, at 2:50 PM, Alex Behm wrote:

Hi Mark,

the error indicates that the table metadata obtained from the Hive
metastore is inconsistent with the ".lzo" file suffix, i.e., the table
metadata says that the file format is not LZO-compressed text.
When creating the external table, did you use the proper 'stored as'
clause, as follows?

STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
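
For example, a complete statement might look like this (column names and
delimiter are made up for illustration):

CREATE EXTERNAL TABLE searches (ts STRING, query STRING)
PARTITIONED BY (day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/searches';
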
Btw, Impala is able to query unindexed LZO-compressed text files, but
indexing the data is strongly encouraged for performance reasons.

Cheers,

Alex

On Wed, May 8, 2013 at 2:01 AM, Harsh J wrote:

Best to ask Impala questions on the Impala user lists
(impala-user@cloudera.org), which I've added here.

Yes Impala can query LZO tables iff they are also indexed. Have you
installed the gplextras packages required for this functionality, as
documented at
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_lzo.html
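
For package installs, that amounts to something like the following (package
names are assumptions; "impala-lzo" matches the rpm listing later in this
thread, so check the linked doc for your release):

# install the GPL Extras LZO packages for Hadoop and Impala
sudo yum install hadoop-lzo-cdh4 impala-lzo
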
On Wed, May 8, 2013 at 3:19 AM, StaticVoid <static.void.dev@gmail.com> wrote:
Can you use impala on an external table that has LZO compressed files?

Your query has the following error(s):

AnalysisException: Failed to load metadata for table: searches CAUSED BY:
TableLoadingException: Failed to load metadata for table: searches CAUSED
BY: RuntimeException: Compressed file not supported without compression
input format:
hdfs://hadoop-master.mycompany.com:8020/user/root/rails/archive/search/2013/05/06/part-r-00000.lzo
--
Harsh J


  • Alex Behm at May 10, 2013 at 2:23 am
    Here is another idea.
    Since you configured LZO support via CM, maybe you can follow the steps
    here to make sure the "hadoop" command that you are running from the shell
    is picking up the proper configuration.

    https://ccp.cloudera.com/display/express37/Generating+Client+Configuration

    Basically, you can use CM to export its config files to a .zip file. Then
    you unzip those configs on your client (where you want to run the "hadoop"
    command from) and set an environment variable HADOOP_CONF_DIR to point to
    those configs.
    This should ensure that the "hadoop" command is picking up the proper
    configs. Can you try that and see if it resolves the issue?
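
    Concretely, that would be something like the following (file and
    directory names are placeholders):

    # unpack the CM-exported client configs and point the CLI tools at them
    unzip hadoop-clientconfig.zip -d /etc/hadoop-client-conf
    export HADOOP_CONF_DIR=/etc/hadoop-client-conf
    hadoop fs -ls /   # should now pick up the exported configs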

    Let me know if you need help with a specific step in that process.

    Cheers,

    Alex


    On Thu, May 9, 2013 at 5:02 PM, Mark wrote:

    I have made all those changes via CM in Hue and I've confirmed that LZO
    support does indeed work when running tasks through the Oozie workflow
    manager. I've also confirmed that Impala works over LZO (unindexed) files
    via Hue.

    Seems like everything through the traditional command line hasn't been
    configured with LZO support though. This is exactly what I'm running into
    with this:
    https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/hPKf5C-0yaM

    Do I need to configure the cluster via CM and on the filesystem?



  • Alex Behm at May 13, 2013 at 7:59 pm
    Glad to know you got it working, Mark!

    Alex

    On Fri, May 10, 2013 at 10:39 AM, Mark wrote:

    Just wanted to let you know that I ended up copying the configuration
    files to one of the nodes and configured hadoop-conf, hbase-conf and
    hive-conf via alternatives and everything appears to be working!
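
    In case it helps others, on a RHEL-style node that amounts to roughly the
    following (config path and priority are placeholders):

    # register the exported configs and make them the active alternative
    sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.cm 90
    sudo alternatives --set hadoop-conf /etc/hadoop/conf.cm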

    On May 10, 2013, at 8:48 AM, Mark wrote:

    Thanks for the help. I'll try this sometime today and let you know how it
    works out.

    Would you mind clarifying some things for me, though? So it seems
    everything configured via CM doesn't apply when running any command-line
    tools from individual nodes, correct? Is there any reason CM doesn't modify
    the appropriate configuration files on the cluster? Will I need to modify
    each node's local configuration files for these command-line tools to work,
    or will it be sufficient to modify just the one I'm issuing the command
    from? Is there any wrapper script I can use that will effectively use the
    configurations stored in CM?

    Thanks

  • Alex Behm at May 13, 2013 at 8:01 pm
    As far as I know, CM should configure the machines in such a way that your
    shell picks up the proper configuration.
    I'm not an expert on CM, so I've added scm-users@cloudera. They may be able
    to provide you with more details.

    Cheers,

    Alex

  • Mark at May 13, 2013 at 8:33 pm
    Got it. There is a "Deploy Configuration" option in the Cluster actions dropdown that will update the local config files on all of the nodes. Sweet!



  • Alex Behm at May 13, 2013 at 8:34 pm
    Nice! Thanks for letting me know.

    Alex

  • Deepak Gattala at Jul 8, 2013 at 11:40 pm
    Did you make this work? I had the same setup in CM to do the LZO
    compression, and it works great when I query in Hive or in Hue Beeswax. If
    I run the same SQL select in Impala, it throws an error, as you know. Can
    you please walk me through the steps to make this work?

    Thanks a million for your help,
    Deepak Gattala
  • Venkata Gattala at Jul 9, 2013 at 4:57 pm
    I am still getting:

    Your query has the following error(s):

    AnalysisException: Failed to load metadata for table: comb_prod_hier
    CAUSED BY: TableLoadingException: Failed to load metadata for table:
    comb_prod_hier CAUSED BY: RuntimeException: Compressed file not supported
    without compression input format:
    hdfs://nameservice1/user/hive/warehouse/comb_prod_hier/part-m-00000.lzo



    [root@ahad]# rpm -qa | grep impala
    impala-lzo-1.0.1-1.gplextras.p0.84.el5
    impala-1.0.1-1.p0.888.el5
    impala-shell-1.0-1.p0.819.el5
    impala-lzo-debuginfo-1.0.1-1.gplextras.p0.84.el5
    hue-impala-2.2.0+189-1.cdh4.2.0.p0.8.el5



    Please advise.
  • Venkata Gattala at Jul 9, 2013 at 5:01 pm
    Oops, sorry, I had to restart the Impala services and it all worked. Also
    make sure the input and output formats are compatible, as explained by the
    experts in this post.

    Thank you all for your valuable inputs.
    Thanks,
    Deepak Gattala
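
    For reference, restarting via the service scripts looks roughly like this
    (service names assume a package-based install like the rpm listing above):

    # restart the Impala daemons (run on each node)
    sudo service impala-state-store restart
    sudo service impala-server restart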
    On Tuesday, July 9, 2013 11:57:49 AM UTC-5, Venkata Gattala wrote:

    i am still getting


    Your query has the following error(s):

    AnalysisException: Failed to load metadata for table: comb_prod_hier
    CAUSED BY: TableLoadingException: Failed to load metadata for table: comb_prod_hier
    CAUSED BY: RuntimeException: Compressed file not supported without compression input format:
    hdfs://nameservice1/user/hive/warehouse/comb_prod_hier/part-m-00000.lzo



    [root@ahad]# rpm -qa | grep impala
    impala-lzo-1.0.1-1.gplextras.p0.84.el5
    impala-1.0.1-1.p0.888.el5
    impala-shell-1.0-1.p0.819.el5
    impala-lzo-debuginfo-1.0.1-1.gplextras.p0.84.el5
    hue-impala-2.2.0+189-1.cdh4.2.0.p0.8.el5



    Please advise.
    On Monday, July 8, 2013 6:40:13 PM UTC-5, Venkata Gattala wrote:

    Did you make this work? I had the same setup in CM to do the LZO compression, and it works great when I query in Hive or in Hue Beeswax. If I run the same SQL select in Impala it throws an error, as you know. Can you please guide me through the steps to make this work?

    Thanks a million for your help,
    Deepak Gattala
    On May 13, 2013 3:33 PM, "Mark" wrote:

    Got it. There is a "Deploy Configuration" option in the Cluster actions
    dropdown that will update the local config files on all of the nodes. Sweet!



    On May 13, 2013, at 1:01 PM, Alex Behm wrote:

    As far as I know, CM should configure the machines in such a way that your shell picks up the proper configuration.
    I'm not an expert on CM, so I've added scm-users@cloudera. They may be able to provide you with more details.

    Cheers,

    Alex

    On Fri, May 10, 2013 at 8:48 AM, Mark wrote:

    Thanks for the help. I'll try this sometime today and let you know how
    it works out.

    Would you mind clarifying some things for me though? So it seems everything configured via CM doesn't apply when running any command-line tools from individual nodes, correct? Is there any reason CM doesn't modify the appropriate configuration files on the cluster? Will I need to modify each node's local configuration files for these command-line tools to work, or will it be sufficient to modify just the one I'm issuing the command from? Is there any wrapper script I can use that will effectively use the configurations stored in CM?

    Thanks

    On May 9, 2013, at 7:23 PM, Alex Behm wrote:

    Here is another idea.
    Since you configured LZO support via CM, maybe you can follow the steps here to make sure the "hadoop" command that you are running from the shell picks up the proper configuration.


    https://ccp.cloudera.com/display/express37/Generating+Client+Configuration

    Basically, you can use CM to export its config files to a .zip file.
    Then you unzip those configs on your client (where you want to run the
    "hadoop" command from) and set an environment variable HADOOP_CONF_DIR to
    point to those configs.
    This should ensure that the "hadoop" command is picking up the proper
    configs. Can you try that and see if it resolves the issue?
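
    As a concrete sketch, assuming the exported archive is named hadoop-conf.zip:

    # Unpack the CM-exported client configs and point the hadoop CLI at them
    unzip hadoop-conf.zip -d ~/hadoop-client-conf
    export HADOOP_CONF_DIR=~/hadoop-client-conf
    # "hadoop" commands run from this shell now read these configs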

    Let me know if you need help with a specific step in that process.

    Cheers,

    Alex


    On Thu, May 9, 2013 at 5:02 PM, Mark wrote:

    I have made all those changes via the CM in Hue and I've confirmed that LZO support does indeed work when running tasks through the Oozie workflow manager. I've also confirmed that Impala works over LZO (unindexed) files via Hue.

    Seems like everything through the traditional command line hasn't been
    configured with LZO support though. This is exactly what I'm running into
    with this:
    https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/hPKf5C-0yaM

    Do I need to configure the cluster via CM and on the filesystem?



    On May 9, 2013, at 4:55 PM, Alex Behm wrote:

    Hi Mark,

    the error indicates that the job was not able to find an applicable compression codec based on the ".lzo" file extension. The job consults the Hadoop configuration files to determine such codecs.
    To run the indexer, it is required that you follow the steps in:

    http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_lzo.html

    In particular, you should make the changes to the core-site.xml configuration files. Those changes are also required to make the indexer run.
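
    For reference, the piece that maps the ".lzo" extension to a codec is the io.compression.codecs property in core-site.xml. A typical value looks like the following (extend your cluster's existing codec list rather than replacing it):

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>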

    Cheers,

    Alex

