Hi,
I'm experiencing a crash when running some simple aggregate Impala queries.
This might turn out to be just a memory allocation problem, but I thought I
would ask.
As an example, I am running:
select employee_record.zip as zip,
min(employee_record.yearly_gross) as min_salary,
max(employee_record.yearly_gross) as max_salary
from employee_record
group by employee_record.zip;
over a test set consisting of 44M records. These are simulated data
approximating gross salary by zip.
This is on a 3-node test cluster on EC2, spun up by Whirr per the
instructions at
http://blog.cloudera.com/blog/2013/02/from-zero-to-impala-in-minutes/
Although I can run a select count against the set successfully, I receive the
following for the aggregate query:
Unknown Exception : [Errno 104] Connection reset by peer
Query aborted, unable to fetch data
[Not connected]
when running in impala-shell.
When I examine nohup.out for Impala, I see:
Plan Fragment 1
RANDOM
STREAM DATA SINK
EXCHANGE ID: 2
UNPARTITIONED
AGGREGATE
OUTPUT: MIN(employee_record.yearly_gross), MAX(employee_record.yearly_gross)
GROUP BY: employee_record.zip
TUPLE IDS: 1
SCAN HDFS table=default.employee_record #partitions=1 size=1.33GB (0)
TUPLE IDS: 0
13/03/14 04:44:10 WARN hdfs.DomainSocketFactory: error creating DomainSocket
java.net.ConnectException: connect(2) error: No such file or directory when trying to connect to '/var/run/hadoop-hdfs/dn.50010'
at org.apache.hadoop.net.unix.DomainSocket.connect0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.connect(DomainSocket.java:234)
at org.apache.hadoop.hdfs.DomainSocketFactory.create(DomainSocketFactory.java:116)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:947)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:471)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:662)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:713)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:123)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGILL (0x4) at pc=0x00007f9d5b6d25ac, pid=31684, tid=140313778509568
#
# JRE version: 6.0_24-b24
# Java VM: OpenJDK 64-Bit Server VM (20.0-b12 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea6 1.11.3
# Distribution: CentOS release 6.2 (Final), package rhel-1.48.1.11.3.el6_2-x86_64
# Problematic frame:
# C 0x00007f9d5b6d25ac
#
# An error report file with more information is saved as:
# /home/users/impala/hs_err_pid31684.log
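Since SIGILL means the process tried to execute an instruction the CPU does not support, one thing that could be checked on each node is which instruction-set flags the CPU reports. Below is a minimal sketch, assuming a Linux /proc/cpuinfo. The function names are made up for illustration, and the flag list in ASSUMED_REQUIRED is only a guess at what might matter, not something taken from the Impala documentation. I have not confirmed this is the cause.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; all cores normally report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required):
    """Return the required flags the CPU does not report, sorted by name."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


# Hypothetical list: which extensions impalad actually needs is an assumption.
ASSUMED_REQUIRED = ["ssse3", "sse4_1", "sse4_2"]


def check_local(path="/proc/cpuinfo", required=ASSUMED_REQUIRED):
    """Read the local cpuinfo file and return any assumed-required flags it lacks."""
    with open(path) as f:
        return missing_flags(f.read(), required)
```

Calling check_local() on each of the three nodes would return an empty list if every assumed flag is present, or the names of the absent ones otherwise.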
Suggestions?
C