I encountered a problem where an Impala query may return wrong results
when the data contains '\x00'.
I made a small dataset to reproduce the bug. However, when I ran a select
query on it, impala-server crashed instead, impala-shell returned "Error
communicating with impalad: TSocket read 0 bytes", and I had to restart the
I used Hive to create the table; the DDL is:
create table mytest(id int, name string, value double)
ROW FORMAT SERDE
STORED AS INPUTFORMAT
The field terminator is the default '\x01' and the line terminator is '\n'.
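To narrow down which rows and fields of a data file contain '\x00' under these terminators, a small scanner can help. This is a minimal Python sketch, assuming the file is plain delimited text with '\x01' between fields and '\n' between rows (no escaping or quoting); the function name `find_nul_fields` and the sample bytes are hypothetical, not the original dataset.

```python
FIELD_TERMINATOR = b"\x01"  # default Hive field terminator (^A)
LINE_TERMINATOR = b"\n"     # line terminator

def find_nul_fields(data: bytes) -> list[tuple[int, int]]:
    """Return 1-based (line, field) positions of fields containing a NUL byte.

    Hypothetical helper: assumes no escaping or quoting, so every
    field terminator separates fields and every newline separates rows.
    """
    hits = []
    for line_no, line in enumerate(data.split(LINE_TERMINATOR), start=1):
        for field_no, field in enumerate(line.split(FIELD_TERMINATOR), start=1):
            if b"\x00" in field:
                hits.append((line_no, field_no))
    return hits
```

For example, `find_nul_fields(open(path, "rb").read())` lists every field that holds a NUL byte, which shows exactly which rows are worth including in a reproduction.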
The dataset is:
^A means '\x01' and ^@ means '\x00' (copied from vim).
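Because NUL and ^A bytes do not survive copy-and-paste well, a dataset like this can be shared in roughly the caret notation vim displays (^@ for '\x00', ^A for '\x01'). A minimal Python sketch, assuming only ASCII control characters need escaping; the function name `to_caret` is hypothetical, and rendering bytes >= 0x80 as `\xNN` is an arbitrary choice of this sketch, not what vim does.

```python
def to_caret(data: bytes) -> str:
    """Render bytes in caret notation, similar to what vim shows.

    Control bytes 0x00-0x1F (except newline) become '^' + chr(byte + 0x40),
    so 0x00 -> '^@', 0x01 -> '^A', and a tab -> '^I'. 0x7F becomes '^?'.
    Newlines are kept as real line breaks so each row stays on its own line.
    Bytes >= 0x80 are written as '\\xNN' escapes (an assumption of this sketch).
    """
    out = []
    for b in data:  # iterating over bytes yields ints
        if b == 0x0A:
            out.append("\n")
        elif b < 0x20:
            out.append("^" + chr(b + 0x40))
        elif b == 0x7F:
            out.append("^?")
        elif b < 0x80:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)
```

For example, `print(to_caret(open(path, "rb").read()))` prints each row of the file in a form that can be pasted into a mail without losing the control bytes.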
The select query is very simple:
select * from mytest;
The results Hive returned:
hive> select * from mytest;
1 test 12.3
The Impala version is 1.0.0 and Hadoop is CDH4.2.1.
The error log (impala-server.log) is:
# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x00000000009484d9, pid=3775, tid=140010238035712
# JRE version: 7.0_15-b20
# Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64
# Problematic frame:
[error occurred during error reporting (printing problematic frame), id 0xb]
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
[thread 140010295965440 also had an error]
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
However, I don't think OpenJDK is the cause of the bug.