I was hitting an error where Impala couldn't detect the rows of a table
whose data is stored in the RCFile format. The new release, Impala
1.1-1.p0.8, seems to have fixed that. However, certain queries that used to
work no longer do. For example, TPC-H query #1 (against a table also stored
in RCFile format):

select L_RETURNFLAG,
        L_LINESTATUS,
        SUM(L_QUANTITY),
        SUM(L_EXTENDEDPRICE),
        SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
        SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)),
        AVG(L_QUANTITY),
        AVG(L_EXTENDEDPRICE),
        AVG(L_DISCOUNT),
        COUNT(1)
FROM lineitem
WHERE L_SHIPDATE<='1998-09-02'
GROUP BY L_RETURNFLAG,
          L_LINESTATUS
ORDER BY L_RETURNFLAG,
          L_LINESTATUS limit 1000

results in the following error:
Backend 25:Memory limit exceeded
Format error in record or block header at end of file.
Format error in record or block header at end of file.
Format error in record or block header at end of file.
Format error in record or block header at end of file.
.....


Did the RCfile format fix cause this?


  • Skye Wanderman-Milne at Jul 30, 2013 at 12:35 am
    Hi Amlesh, can you provide the impalad log?

    Thanks,
    Skye

  • Amlesh Jayakumar at Jul 30, 2013 at 7:17 pm
    GLOG_v is already set to 1 (I'm using Cloudera Manager to maintain my
    cluster of machines, and it keeps this at 1). But have other users
    reported a similar issue? It seems that whenever the number of
    processed rows gets too big, the query exits with the above message.
  • Skye Wanderman-Milne at Jul 30, 2013 at 7:51 pm
    Oh I see, sorry about that. It looks like the log doesn't contain the
    problematic query though, so could you run the query again and send the
    resulting log file? I'm trying to determine what's causing the "Format
    error in record or block header at end of file" messages (it may be related
    to the memory limit being hit, the RCfile bug that was fixed, or something
    we don't know about yet).

    Thanks,
    Skye

  • Amlesh Jayakumar at Jul 31, 2013 at 12:07 am
    Oh, here's the actual issue:
    I0730 17:02:09.135352 3933 thrift-util.cc:97] TSocket::read() recv() <Host: 10.8.177.27 Port: 56201>Connection reset by peer
    I0730 17:02:09.135356 5917 thrift-util.cc:97] TSocket::read() recv() <Host: 10.8.177.27 Port: 44311>Connection reset by peer
    I0730 17:02:09.138114 2250 thrift-util.cc:97] TSocket::read() recv() <Host: 10.8.177.21 Port: 42209>Connection reset by peer
  • Amlesh Jayakumar at Jul 31, 2013 at 1:50 am
    Now the issue is that the daemon doesn't die when running the above query
    (it only dies on unformatted queries), but that was still an issue I
    encountered. In this case, the query just exits with the following message
    and produces one log that seems normal (what I posted above was the result
    of setting GLOG_v to 2 rather than 1):

    Backend 7:Format error in record or block header at end of file.
    Format error in record or block header at end of file.
    Format error in record or block header at end of file.
    Format error in record or block header at end of file.
    First error while processing:
    hdfs://test001:8020/user/hive/warehouse/lineitem/001042_0 at offset:
    135600128
    Format error in record or block header at end of file.
    Format error in record or block header at end of file.
    First error while processing:
    hdfs://test001:8020/user/hive/warehouse/lineitem/000220_0 at offset:
    135804928
    Format error in record or block header at end of file.
    First error while processing:
    hdfs://test001:8020/user/hive/warehouse/lineitem/000929_0 at offset:
    135610368
    First error while processing:
    hdfs://test001:8020/user/hive/warehouse/lineitem/000898_0 at offset:
    134893568
    Format error in record or block header at end of file.

    Since the 'lineitem' table is stored in RCFile format, and since this
    query did succeed on the earlier Impala release, I think the RCFile fix
    might be the cause.
  • Skye Wanderman-Milne at Aug 1, 2013 at 4:55 pm
    Hi Amlesh, thanks for continuing to investigate this. Unfortunately this
    log file doesn't contain the specific error message I'm looking for that
    will identify what the problem is (the message will only appear in the log
    of the impalad that encountered the error, which is not necessarily the
    impalad that you're connecting to). Can you use Cloudera Manager to search
    all Impala logs for "status.cc" and send me anything that comes up?

    Sorry this is so difficult -- we should really print the specific error
    message to the shell in addition to logging it; I'll try to get this in for
    our next release.

    Thanks,
    Skye

    On Tue, Jul 30, 2013 at 7:12 PM, Amlesh Jayakumar wrote:

    Actually, there was a log file I had to hunt down that depicts the issue.
    See attached.
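
Skye's request above (search every Impala log for "status.cc") can also be done by hand when Cloudera Manager's log search is not convenient. The following is only a minimal sketch, not an Impala or Cloudera tool: the function name `find_log_lines` is hypothetical, and the log directory is an assumption that must be replaced with wherever impalad writes its logs on your hosts. Because the message only appears in the log of the impalad that hit the error, it would need to be run on each node.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_log_lines(log_dir: str, needle: str = "status.cc") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for each log line containing needle.

    log_dir is an assumption: pass the directory where impalad writes its
    logs on the host being checked.
    """
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # Log files can contain stray bytes; replace them instead of failing.
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path.name, number, line.rstrip("\n")
```

Iterating over `find_log_lines("<your impalad log directory>")` and printing each tuple gives a list of candidate lines to send along with the failing query. A plain substring match is used on purpose, since the exact wording of the error message is not known in advance.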

Discussion Overview
group: impala-user @
categories: hadoop
posted: Jul 30, '13 at 12:26a
active: Aug 1, '13 at 4:55p
posts: 7
users: 2
website: cloudera.com
irc: #hadoop
