Hi,
I'm experiencing a crash when running some simple aggregate Impala queries.
This might turn out to be just a memory allocation problem, but I thought I
would ask.
As an example, I'm running:

select employee_record.zip as zip,
       min(employee_record.yearly_gross) as min_salary,
       max(employee_record.yearly_gross) as max_salary
from employee_record
group by employee_record.zip;

over a test set consisting of 44M records. These are simulated data
approximating gross salary by zip.

This is on a test 3-node cluster on EC2, spun up by Whirr per the
instructions at
http://blog.cloudera.com/blog/2013/02/from-zero-to-impala-in-minutes/

Although I can run a select count against the set successfully, I receive
the following for the aggregate query

Unknown Exception : [Errno 104] Connection reset by peer
Query aborted, unable to fetch data
[Not connected]

when running in impala-shell.

When I examine nohup.out for Impala, I see:

Plan Fragment 1
RANDOM
STREAM DATA SINK
EXCHANGE ID: 2
UNPARTITIONED

AGGREGATE
OUTPUT: MIN(employee_record.yearly_gross),
MAX(employee_record.yearly_gross)
GROUP BY: employee_record.zip
TUPLE IDS: 1
SCAN HDFS table=default.employee_record #partitions=1 size=1.33GB (0)
TUPLE IDS: 0

13/03/14 04:44:10 WARN hdfs.DomainSocketFactory: error creating DomainSocket
java.net.ConnectException: connect(2) error: No such file or directory when
trying to connect to '/var/run/hadoop-hdfs/dn.50010'
at org.apache.hadoop.net.unix.DomainSocket.connect0(Native Method)
at
org.apache.hadoop.net.unix.DomainSocket.connect(DomainSocket.java:234)
at
org.apache.hadoop.hdfs.DomainSocketFactory.create(DomainSocketFactory.java:116)
at
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:947)
at
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:471)
at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:662)
at
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:713)
at
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:123)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGILL (0x4) at pc=0x00007f9d5b6d25ac, pid=31684, tid=140313778509568
#
# JRE version: 6.0_24-b24
# Java VM: OpenJDK 64-Bit Server VM (20.0-b12 mixed mode linux-amd64
compressed oops)
# Derivative: IcedTea6 1.11.3
# Distribution: CentOS release 6.2 (Final), package
rhel-1.48.1.11.3.el6_2-x86_64
# Problematic frame:
# C 0x00007f9d5b6d25ac
#
# An error report file with more information is saved as:
# /home/users/impala/hs_err_pid31684.log

Suggestions?
C

  • Marcel Kornacker at Mar 14, 2013 at 2:19 pm
    You may indeed be running out of memory. In order to verify this,
    could you monitor the memory consumption of the impalad process while
    the query is running?

    We'll be introducing per-query and per-process memory limits in the
    next version (0.7), which we're planning on releasing next week.
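    One way to take that measurement, sketched below, is to sample the
    resident set size (RSS) of the impalad process with ps while the query
    runs. The process name "impalad" and the sampling interval are
    assumptions; this is not Impala tooling, just standard procps usage on
    CentOS 6.

    ```shell
    #!/bin/sh
    # Print the resident set size (RSS) of a process, in kilobytes.
    rss_kb() {
      ps -o rss= -p "$1" | tr -d ' '
    }

    # While the query runs, sample impalad every 5 seconds, e.g.:
    #   while sleep 5; do echo "$(date +%T) $(rss_kb "$(pgrep -o impalad)") kB"; done
    # Demo on the current shell so the function works on any host:
    rss_kb $$
    ```

    If the RSS climbs toward the node's physical memory just before the
    crash, that points at the out-of-memory explanation.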
  • Charles Earl at Mar 14, 2013 at 2:56 pm
    Thanks,
    I'm assuming I'll be able to point to 0.7 next week by editing impala.properties?
    C
  • Lenni Kuff at Mar 14, 2013 at 3:35 pm
    Hi Charles,
    You should be able to update to the latest Impala version by running "yum
    upgrade <package>" on each node. Alternatively, if you re-run the
    zero-to-impala scripts after v0.7 is released, they should automatically
    install the latest Impala version.

    I also think you may be running into a problem with short-circuit reads
    not being enabled properly, based on this error:

    13/03/14 04:44:10 WARN hdfs.DomainSocketFactory: error creating DomainSocket
    java.net.ConnectException: connect(2) error: No such file or directory
    when
    trying to connect to '/var/run/hadoop-hdfs/dn.50010'
    Which CDH and Impala versions are you running? Note that Impala requires
    CDH4.2+ (starting with the Impala v0.6 release). I noticed the
    Zero-to-Impala blog has some incorrect information on the supported
    versions, as well as some updates needed to properly set the
    core/hdfs-site config files. We will be updating the blog post with the
    new information very soon.
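    For reference, a sketch of the hdfs-site.xml properties that enable
    short-circuit reads for Impala on CDH4.2 follows. The property names are
    taken from the CDH4/Impala setup documentation as I recall them, and the
    socket path matches the one in the error above; verify against the docs
    for your exact version, and restart the DataNodes and impalad after
    changing them.

    ```xml
    <!-- hdfs-site.xml on each DataNode: short-circuit reads for Impala -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/run/hadoop-hdfs/dn._PORT</value>
    </property>
    <property>
      <name>dfs.client.file-block-storage-locations.timeout</name>
      <value>3000</value>
    </property>
    <property>
      <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
      <value>true</value>
    </property>
    ```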

    Thanks,
    Lenni

Discussion Overview
group: impala-user
categories: hadoop
posted: Mar 14, '13 at 4:53a
active: Mar 14, '13 at 3:35p
posts: 4
users: 3
website: cloudera.com
irc: #hadoop
