Thanks for the pointer. I'm not specifying encoding anywhere, so I'm pretty
sure it's the default. I'll ask on the Parquet list, but do you have any
idea how to specify plain encoding?
On Friday, April 26, 2013 2:10:55 PM UTC-7, Skye Wanderman-Milne wrote:

It looks like your Parquet file uses the bit-packed data encoding, and
Impala currently only supports the default encoding (sorry for the
confusion, the error message could stand to be a little clearer). Try
changing the encoding and re-running your MR job.

FYI, the count(*) worked because the file scanner doesn't have to decode
the data in order to get the row counts.

Skye


On Thu, Apr 25, 2013 at 8:28 PM, Colin Marc wrote:
I created a parquet file with a mapreduce job and used `create external
table` to add it to impala. I can `select count(*)` from the table just
fine, but when I try to `select * limit 1`, it gives me:

ERROR: File hdfs://.../part-r-00000.parquet uses an unsupported
encoding: 4 for column 0
ERROR: Invalid query handle

Am I doing something wrong when I'm creating the file?


  • John Russell at Apr 26, 2013 at 9:21 pm
    I've clarified this restriction in the 'Parquet' topic in the docs.

    John
  • Skye Wanderman-Milne at Apr 26, 2013 at 9:27 pm
    I do not, sorry. The Parquet list is probably your best bet.

  • Colin Marc at Apr 27, 2013 at 12:01 am
    Following up from the other thread, it looks like this may be a bug in
    Impala. To summarize:

    ColumnMetaData.encodings lists the encodings used in a given column of a
    Parquet file, and includes both the definition-level (dl) and
    repetition-level (rl) encodings as well as the value encodings themselves.
    The dl/rl encoding is BIT_PACKED, and as far as I can tell, Impala expects
    that. However, Impala checks the whole list to ensure that every encoding
    is PLAIN:

    parquet::ColumnChunk& file_data =
        file_metadata_.row_groups[0].columns[col_idx];

    // Check the encodings are supported
    vector<parquet::Encoding::type>& encodings = file_data.meta_data.encodings;
    for (int i = 0; i < encodings.size(); ++i) {
      if (encodings[i] != parquet::Encoding::PLAIN) {
        stringstream ss;
        ss << "File " << stream_->filename() << " uses an unsupported encoding: "
           << encodings[i] << " for column " << col_idx;
        return Status(ss.str());
      }
    }

    By removing the definition level and repetition level encodings from
    ColumnMetadata.encodings (I detail this hack in the other thread),
    recompiling parquet-mr, and rebuilding the file, I can get the query to run
    successfully, but:

    - if I try to do 'select * from parquet_table limit 1', I get an empty
    result
    - if I try 'select foo from parquet_table limit 1', Impala crashes (output
    attached).

    I don't think it's a problem with the file; I can still count(*) just fine,
    and use the file as a source in a map/reduce job.
    On Friday, April 26, 2013 2:56:16 PM UTC-7, Colin Marc wrote:

    Heh - it seems there is some disagreement on what the default encoding is
    =)

    https://groups.google.com/forum/?fromgroups=#!topic/parquet-dev/FjzYFxuEq5E
  • Nong Li at Apr 29, 2013 at 8:56 pm
    Colin,

    This does look like an issue in our implementation. We've identified the
    compatibility issue, and it will be fixed in our next release. Just to
    confirm, can you share the schema for your table?

    Thanks,
    Nong

    On Fri, Apr 26, 2013 at 5:19 PM, Skye Wanderman-Milne wrote:

    Hi Colin,

    Could you send us the impalad log after running these queries?

    Thanks,
    Skye

  • Bewang Tech at May 1, 2013 at 5:51 pm
    I encountered the exact same problem. Is this fixed in 1.0? This issue
    blocks us from running queries.

    I followed Colin's method, but still got the same error.
  • Bewang Tech at May 1, 2013 at 9:27 pm
    I didn't update the parquet-hadoop jar correctly, so the mapreduce job
    still used the old jar without Colin's changes. That is why I got the same
    error.

    Once the jar is updated, I got a new error:

    Query aborted, unable to fetch data

    Backend 3:Corrupt data page
    Backend 5:Corrupt data page

  • Aries lei at May 14, 2013 at 1:54 am
    Hi Nong,

    I'm encountering the same problem as Colin. The Parquet file is generated
    by parquet-pig. The attachment is the impalad log from when I executed the
    query.

    Here's the schema in Pig:

    {ip: chararray, appid: int, releaseversion: chararray,
    commandid: chararray, apn: chararray, resultcode: int, device: chararray,
    sdkversion: chararray, touin: int, tmcost: int}



Discussion Overview
group: impala-user
categories: hadoop
posted: Apr 26, '13 at 9:17p
active: May 14, '13 at 1:54a
posts: 8
users: 6
website: cloudera.com
irc: #hadoop
