Hi,

Is there any way to troubleshoot this error (a log, etc.)? Or is this
considered normal behavior?

Thanks,

Manuel
On Monday, June 24, 2013 8:20:45 PM UTC-7, Manuel Stopnicki wrote:

Hi,

I executed the following command to populate a Parquet-based table with the
content of a regular text-file-based table (10 million lines).

[localhost.localdomain:21000] > insert into table pflows select * from
flows;
Query: insert into table pflows select * from flows
Unknown Exception : [Errno 104] Connection reset by peer
Query failed

The last lines in the log file are:

I0624 23:17:13.446447 31902 hdfs-table-sink.cc:86] Random seed: 83846446
I0624 23:17:13.517690 31904 plan-fragment-executor.cc:210] Open():
instance_id=f740b9f93ad75f7d:3a4801886800adbb
I0624 23:17:13.518174 31906 coordinator.cc:588] Coordinator waiting for
backends to finish, 1 remaining

Is there any way to troubleshoot this?

Thanks,

Manuel
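[Editor's note for future readers: "[Errno 104] Connection reset by peer" in impala-shell means the TCP peer, the impalad daemon, closed the connection, which usually indicates the daemon exited mid-query. A sketch of a first check, assuming a default Cloudera log layout (the /var/log/impalad path is an assumption; adjust for your install):]

```shell
# Look for the daemon-side story behind the client's [Errno 104]
# (paths are assumptions for a default Cloudera-managed install).
LOG_DIR=/var/log/impalad
if [ -d "$LOG_DIR" ]; then
  ls -ltr "$LOG_DIR" | tail -n 5              # newest log files
  tail -n 50 "$LOG_DIR"/*.ERROR 2>/dev/null   # last errors before the exit
else
  echo "no impalad logs under $LOG_DIR; check the daemon's -log_dir setting"
fi
```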


  • Marcel Kornacker at Jun 27, 2013 at 8:38 pm
    Could you send us the complete log file(s)?
  • Manuel Stopnicki at Jun 28, 2013 at 5:33 am
    Here it is. Let me know if you need anything else.

    For info, the same query with a text-based table works perfectly fine. It
    looks like the issue is with the parquetfile format.

    Thanks,

    Manuel

  • Manuel Stopnicki at Jul 2, 2013 at 3:30 am
    Hi Alan,

    Unfortunately, with a smaller dataset (1M instead of 10M lines) it works
    fine, and I cannot send the larger set, sorry. I tried to enable core
    dumps on the Cloudera virtual machine, but without much success. Are dumps
    enabled by default? If yes, where should I look? If not, could you give me
    directions to enable them?

    Thanks,

    Manuel

    On Mon, Jul 1, 2013 at 3:48 PM, Alan Choi wrote:

    Hi Manuel,

    The log indicates that Impala is in the middle of execution. Does
    Impala crash? If so, do you have a core dump? Or, even better, can you
    repro it with a small data set and share the data set and table definition
    with us?

    Thanks,
    Alan

  • Manuel Stopnicki at Jul 3, 2013 at 2:26 am
    Alan,

    I did follow the instructions but did not have any luck generating a core
    file in the indicated folder.

    Otherwise, what would be the best way to get a CSV file into Impala with
    Parquet storage?

    Thanks,

    Manuel
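[Editor's note: one hedged workaround for the original 10M-row INSERT, a sketch rather than a confirmed fix, is to split the job into slices so each INSERT writes less data at once. The id column and the modulus predicate below are assumptions; substitute any column that partitions flows evenly:]

```shell
# Build one INSERT per slice; pipe each statement to: impala-shell -q "$sql"
for i in 0 1 2 3 4 5 6 7 8 9; do
  sql="INSERT INTO TABLE pflows SELECT * FROM flows WHERE id % 10 = $i"
  echo "$sql"
done
```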

    On Tue, Jul 2, 2013 at 4:21 PM, Alan Choi wrote:


    Hi Manuel,

    Can you try the following to enable core dump?

    1. go to /usr/lib/impala
    2. rm sbin
    3. ln -s sbin-debug sbin

    Restart impala from CM. The core file should be in
    /var/run/cloudera-scm-agent/process/IMPALA-<id>

    Thanks,
    Alan
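[Editor's note: Alan's three steps can be rehearsed safely in a scratch directory before touching the real install. Everything below is a simulation: /usr/lib/impala is only stood in for by a temp directory, and the sbin-retail name is an assumption about the CDH layout.]

```shell
# Simulate the sbin -> sbin-debug swap in a throwaway directory.
IMPALA_HOME=$(mktemp -d)/impala          # stand-in for /usr/lib/impala
mkdir -p "$IMPALA_HOME/sbin-retail" "$IMPALA_HOME/sbin-debug"
cd "$IMPALA_HOME"
ln -s sbin-retail sbin                   # pretend this is the current state
rm sbin                                  # step 2: remove the old symlink only
ln -s sbin-debug sbin                    # step 3: point at the debug binaries
readlink sbin                            # prints: sbin-debug
```

Note that rm removes just the symlink, never the sbin-debug target, which is why the swap is reversible.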

  • Alan Choi at Jul 3, 2013 at 9:08 pm
    Hi Manuel,

    My bad; I gave you the wrong instructions. In CM, go to Impala ->
    Configuration, then search for "core". Enable it and restart.

    Thanks,
    Alan



  • Manuel Stopnicki at Jul 4, 2013 at 8:58 pm
    Alan,

    I just tried with the new settings and am still not able to get core files.
    The process does stop and restart, because I see a different pid before and
    after executing the query, but there is no core (I tried both the normal
    and the debug server). Could this have anything to do with the Cloudera VM?

    Thanks,

    Manuel
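[Editor's note: a quick way to confirm that the daemon really restarted around the failing query; a sketch, with the process name impalad as the only assumption:]

```shell
# Record the pid before and after the query; a change means the daemon died
# and supervisord restarted it.
before=$(pgrep -x impalad || echo none)
# ... run the failing INSERT here ...
after=$(pgrep -x impalad || echo none)
[ "$before" = "$after" ] && echo "same process" || echo "impalad restarted"
```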

  • Vikas Singh at Jul 5, 2013 at 3:34 pm
    Hi Manuel,

    At what location did you look for the core? By default, the core file goes
    into the cwd of the process, which is a subdirectory of
    /var/run/cloudera-scm-agent/process. You can also find the cwd of the
    process by looking at its environment in /proc/<pid>/environ.
    You can also confirm the ulimit of the process to make sure that dumping
    core is enabled.

    - Vikas
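[Editor's note: Vikas's two checks can be scripted. /proc/self is used as a stand-in pid so the snippet is runnable anywhere with a Linux /proc filesystem; substitute the impalad pid on the VM. readlink on /proc/<pid>/cwd reads the working directory directly, complementing the environ approach mentioned above.]

```shell
PID=self                                     # substitute the real impalad pid
grep "Max core file size" /proc/$PID/limits  # neither limit may be 0
readlink /proc/$PID/cwd                      # where a plain "core" would land
```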
  • Manuel Stopnicki at Jul 6, 2013 at 5:47 pm
    Hi Vikas,

    I believe I'm looking at the right place:

    /var/run/cloudera-scm-agent/process/31-impala-IMPALAD

    sudo cat /proc/57055/limits
    Limit                     Soft Limit   Hard Limit   Units
    Max cpu time              unlimited    unlimited    seconds
    Max file size             unlimited    unlimited    bytes
    Max data size             unlimited    unlimited    bytes
    Max stack size            10485760     unlimited    bytes
    Max core file size        unlimited    unlimited    bytes
    Max resident set          unlimited    unlimited    bytes
    Max processes             65536        65536        processes
    Max open files            32768        32768        files
    Max locked memory         65536        65536        bytes
    Max address space         unlimited    unlimited    bytes
    Max file locks            unlimited    unlimited    locks
    Max pending signals       43676        43676        signals
    Max msgqueue size         819200       819200       bytes
    Max nice priority         0            0
    Max realtime priority     0            0
    Max realtime timeout      unlimited    unlimited    us

    This seems to indicate that the core file can be generated. But when I
    reproduce the error (and get a new Impala process), the folder contains only:

    [root@localhost 31-impala-IMPALAD]# ls
    cloudera-monitor.properties hbase-conf impala-conf logs
    hadoop-conf hive-conf impala.keytab log-whitelist.json

    What am I doing wrong?

    Thanks,

    Manuel

  • Vikas Singh at Jul 7, 2013 at 1:35 am
    Hi Manuel,

    Can you check the value of core_pattern to make sure it's not set to dump
    core to a different location (cat /proc/sys/kernel/core_pattern)?

    As the ulimits seem to be set correctly, can you run 'kill -s SIGSEGV <pid>'
    and confirm that a core is generated? That will ensure that the setup is
    correct.

    Please note that each time impalad is killed/crashes, it is restarted
    automatically (by supervisord) and its pwd changes. You can run 'ls -ltr'
    on the '/var/run/cloudera-scm-agent/process' directory to find the latest
    running Impala process.

    - Vikas
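[Editor's note: the kill -s SIGSEGV check can be rehearsed on a throwaway process first. 'sleep' stands in for impalad; whether a core file actually appears still depends on core_pattern and abrt, so only the exit status is checked here.]

```shell
# Rehearse the core-dump path with a disposable process.
ulimit -c unlimited 2>/dev/null || true   # allow core dumps in this shell
sleep 300 & PID=$!
kill -s SIGSEGV "$PID"
status=0; wait "$PID" || status=$?
echo "exit status: $status"               # expect 139 = 128 + SIGSEGV(11)
```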
  • Manuel Stopnicki at Jul 7, 2013 at 4:28 am
    Vikas,

    This is what I get:

    [cloudera@localhost ~]$ cat /proc/sys/kernel/core_pattern
    /usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %e
    Normal?

    Thanks

    Manuel

  • Alan Choi at Jul 10, 2013 at 9:41 pm
    Hi Manuel,

    Any luck creating the core dump?

    Thanks,
    Alan

    On Sun, Jul 7, 2013 at 9:01 PM, Vikas Singh wrote:

    Hi Manuel,

    It seems like you have Red Hat's automated bug reporting tool running
    (the abrtd service). The location of the core file is specified as
    DumpLocation in the /etc/abrt/abrt.conf file; the default is
    /var/spool/abrt/. Can you please check if there are core dumps in that
    directory?

    I don't have a Red Hat system with me right now, so I can't test this, but
    setting "core_pattern" to the normal value of "core" (sudo echo core >
    /proc/sys/kernel/core_pattern) should start generating core in the cwd of
    the process.

    - Vikas
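[Editor's note: one caveat about the command quoted above: 'sudo echo core > /proc/sys/kernel/core_pattern' usually fails, because the redirect is performed by the calling, unprivileged shell before sudo ever runs. A common alternative is to let a privileged process do the write via tee. The demonstration below uses a scratch file instead of the real sysctl so it runs without root:]

```shell
# The privileged form would be:
#   echo core | sudo tee /proc/sys/kernel/core_pattern
# Demonstrated here against a scratch file instead of the real sysctl:
pattern_file=$(mktemp)
echo core | tee "$pattern_file" >/dev/null
cat "$pattern_file"                       # prints: core
```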

Discussion Overview
group: impala-user
categories: hadoop
posted: Jun 27, '13 at 8:33p
active: Jul 10, '13 at 9:41p
posts: 12
users: 4
website: cloudera.com
irc: #hadoop
