Hi,

I've seen this error in earlier posts but it might be a different one.

I have a single-machine installation (one disk, 8 GB RAM, Core i7, Ubuntu
12.04, Impala 1.1.1-1.p0.17).

I uploaded a text file to HDFS: 6.5 million records, 20 fields, all strings.
The file size is 2.5 GiB. It uses ',' as the field separator and '\\' as the escape character.
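For reference, uploading such a file to HDFS typically looks like this (a sketch; "data.csv" is a placeholder for the actual local file name, and the target directory matches the table location used below):

```shell
# Create the target directory and upload the delimited text file.
hdfs dfs -mkdir -p /tmp/text2
hdfs dfs -put data.csv /tmp/text2/
# Verify the file landed and check its size.
hdfs dfs -ls -h /tmp/text2
```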

Creating the external table works fine:

create external table text2 (a1 string, a2 string, a3 string, a4 string, a5
string, a6 string, a7 string, a8 string, a9 string, a10 string, a11 string,
a12 string, a13 string, a14 string, a15 string, a16 string, a17 string, a18
string, a19 string, a20 string) row format delimited fields terminated by
',' escaped by '\\' stored as textfile location '/tmp/text2';

I can query the table; I tested it with several queries and it looks OK.

Then I created a Parquet table:

create table b1 like text2 stored as parquetfile;

then inserted the records from text2 into b1:

insert into b1 select * from text2 limit 900000;

Query: insert into b1 select * from text2 limit 900000
Unknown Exception : [Errno 104] Connection reset by peer
Query failed

If I insert fewer records (say 700000) it works. Around 900000 the error
happens. I tried to find the exact number of records I can insert, but it
turned out to be non-deterministic.

/var/log/impalad does not show any errors.

Thank you,
Gyorgy




To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.


  • Skye Wanderman-Milne at Nov 20, 2013 at 12:25 am
    Hi Gyorgy,

    It's strange that you're not seeing any errors in the impalad logs. My guess
    is that your glog level is set too low. Are you running Impala without CM?
    If so, try setting the environment variable GLOG_v=1 and running your query
    again. The log should provide some clues as to what's going on.
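Concretely, when starting the daemon by hand that looks something like the following (a sketch; "impalad" below is a placeholder for however the daemon is normally launched on your installation):

```shell
# Raise glog verbosity before starting the daemon (GLOG_v=1 logs more detail).
export GLOG_v=1
# Then restart impalad however it is normally launched; this is a placeholder.
impalad &
```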

    As for the actual problem, I suspect you may be running out of memory. Try
    monitoring impalad's memory usage with top; the log file should also
    confirm this.
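One way to watch the daemon's memory from another terminal (a sketch; assumes the process is named impalad):

```shell
# One-shot snapshot: RSS (resident) and VSZ (virtual) memory in kilobytes.
ps -o pid,rss,vsz,comm -C impalad
# Continuous view, refreshed every 2 seconds:
watch -n 2 'ps -o pid,rss,vsz,comm -C impalad'
```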

    Let me know if you discover anything, and we can try to find a solution or
    workaround from there.

    Skye

  • György Balogh at Nov 20, 2013 at 12:29 pm
    Thank you!

    I could reproduce this problem on two different machines with two different
    data sets. I always insert data from a large text table into a Parquet
    table.

    I tried it with Impala 1.1 and with the 1.2 beta too; the same thing
    happens. So here are the details.

    Last operation:

    [bogyom-hadup1.kurthq.local:21000] > insert into pix_p select * from
    pix_load1;
    Query: insert into pix_p select * from pix_load1
    Unknown Exception : [Errno 104] Connection reset by peer
    Query failed
    [Not connected] >

    I removed all logs beforehand and attached the non-empty ones generated in
    /var/log/impalad during this query.

    I use CM. I have a larger machine now with 32 GB RAM and 6 disks.

    I suspect it is related to the content of the data. I can always load the
    first 10 million records into the Parquet table with no problem, and I can
    load them many times without issue.

    I cannot load the next 6 million records. I can query the source text table
    with no problem (I double-checked; I don't see any format or escaping
    problems, and it is a tab-separated CSV).

    I narrowed it down to roughly the point where the error happens, but it is
    not deterministic, so I cannot point to a single source record.

    htop shows 8 GB memory usage (out of 32) during the insert, which drops to
    6 GB when it fails.

    Please let me know what other information I should collect to help solve
    this.
    Thank you,
    Gyorgy






  • György Balogh at Nov 20, 2013 at 12:35 pm
    I forgot to mention: the GLOG level is set to 1 according to CM.
    Gyorgy

  • György Balogh at Nov 20, 2013 at 12:51 pm
    I ran it with a debug build of Impala, with the same result:

    [bogyom-hadup1.kurthq.local:21000] > insert into pix_p select * from
    pix_load1;
    Query: insert into pix_p select * from pix_load1
    Error communicating with impalad: TSocket read 0 bytes
    Query failed
    [Not connected] >

    I don't know where to look for a core dump file, if there is one.
    Gyorgy
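As a general Linux aside (not Impala-specific), two things determine whether and where a core file appears: the process's core-size ulimit and the kernel's core pattern. A quick check:

```shell
# If this prints 0, core dumps are disabled in this shell's environment.
ulimit -c
# Where the kernel writes core files: a bare name is relative to the
# crashing process's working directory; a leading '|' pipes the dump to a
# handler program (e.g. apport on Ubuntu).
cat /proc/sys/kernel/core_pattern
```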

  • György Balogh at Nov 20, 2013 at 1:13 pm
    Some more investigation: the same thing happens if I insert into a text
    table instead of Parquet.
    Gyorgy
  • Greg Rahn at Nov 20, 2013 at 7:09 pm
    Out of curiosity, can you successfully query 100% of the source table?
    One possibility might be that something in the source file is causing a
    backend crash when reading the source table.

  • Alex Behm at Nov 20, 2013 at 7:23 pm
    Quick follow-up on this.
    If you are running from the Impala shell, you can try:

    set abort_on_error=1;

    That option will abort queries when a parse/conversion error is
    encountered, showing you the reason for the parse error.

    You could then run a query like:
    select * from t order by somecolumn limit 10;

    Note that something like 'select count(*)' will not work, because Impala
    can avoid converting most columns for such queries.
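Put together, a minimal impala-shell session to surface a bad record might look like this (a sketch; the table and column names are taken from the earlier messages in this thread):

```sql
-- Abort on the first parse/conversion error instead of skipping the row.
set abort_on_error=1;
-- The ORDER BY forces every column to be materialized and converted,
-- which a plain count(*) would not.
select * from text2 order by a1 limit 10;
```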

    Hope it helps!

    Cheers,

    Alex


Discussion Overview
group: impala-user
categories: hadoop
posted: Nov 16, '13 at 4:59p
active: Nov 20, '13 at 7:23p
posts: 8
users: 4
website: cloudera.com
irc: #hadoop
