I'm encountering the same issue with the latest Impala release:
Shell version: Impala Shell v1.0 (d1bf0d1) built on Sun Apr 28 15:33:52 PDT 2013
Server version: impalad version 1.0 RELEASE (build d1bf0d1dac339af3692ffa17a5e3fdae0aed751f)
I have a large amount of test data in gzipped text files in S3 that I'm
trying to get into Impala / Parquet for testing. I created an external table
in Hive and loaded it into a partitioned RCFile table, which I can access
and query from Impala without problems. Attempting an INSERT OVERWRITE into
an equivalent table with PARQUETFILE format fails with the shell error:
'Unknown Exception : [Errno 104] Connection reset by peer Query failed'.
The same SELECT (without the INSERT) runs fine in impala-shell.
The impalad.INFO log shows the query making progress, and then it just
stops. The statestored.INFO log shows that it starts getting 'Connection
refused' on port 23000. This is running on AWS.
impalad.INFO log:
INFO0505 19:22:45.016000 Thread-9 com.cloudera.impala.service.JniFrontend]
PLAN FRAGMENT 0
PARTITION: HASH_PARTITIONED: account_id
WRITE TO HDFS table=default.datacube_oct
overwrite=true
partitions: account_id
1:EXCHANGE
tuple ids: 0
PLAN FRAGMENT 1
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 1
HASH_PARTITIONED: account_id
0:SCAN HDFS
table=default.datacube_rc #partitions=253 size=25.74GB
predicates: month(time_id) = 10
tuple ids: 0
...
I0505 19:22:45.173516 24631 plan-fragment-executor.cc:213] Open(): instance_id=d710fdcc1afe4652:b09975f58897eb3e
I0505 19:22:45.173894 24632 coordinator.cc:571] Coordinator waiting for backends to finish, 2 remaining
I0505 19:23:02.175143 24426 progress-updater.cc:55] Query d710fdcc1afe4652:b09975f58897eb3c: 2% Complete (13 out of 649)
I0505 19:23:27.176903 24426 progress-updater.cc:55] Query d710fdcc1afe4652:b09975f58897eb3c: 4% Complete (28 out of 649)
I0505 19:24:07.180315 24426 progress-updater.cc:55] Query d710fdcc1afe4652:b09975f58897eb3c: 6% Complete (41 out of 649)
I0505 19:25:12.185873 24544 progress-updater.cc:55] Query d710fdcc1afe4652:b09975f58897eb3c: 8% Complete (56 out of 649)
I0505 19:25:52.188451 24544 progress-updater.cc:55] Query d710fdcc1afe4652:b09975f58897eb3c: 10% Complete (65 out of 649)
I0505 19:27:07.194136 24426 progress-updater.cc:55] Query d710fdcc1afe4652:b09975f58897eb3c: 12% Complete (79 out of 649)
I0505 19:27:42.196530 24426 progress-updater.cc:55] Query d710fdcc1afe4652:b09975f58897eb3c: 14% Complete (93 out of 649)
<log just ends here>
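To pin down exactly when progress stopped, a small script along these lines can pull the progress entries out of impalad.INFO. This is only a sketch: the regular expression is inferred from the log lines pasted above rather than from any documented Impala log format, and `parse_progress` is a hypothetical helper name.

```python
import re

# Pattern inferred from the progress-updater lines in this post (an assumption,
# not a documented format). \s+ tolerates line wrapping added by mail clients.
PROGRESS_RE = re.compile(
    r"I(\d{4}) (\d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+progress-updater\.cc:\d+\]\s+"
    r"Query\s+(\S+):\s+(\d+)% Complete \((\d+) out of (\d+)\)"
)


def parse_progress(log_text):
    """Return (date, time, query_id, done, total) tuples in log order."""
    return [
        (m.group(1), m.group(2), m.group(3), int(m.group(5)), int(m.group(6)))
        for m in PROGRESS_RE.finditer(log_text)
    ]


# Two entries copied from the log above, including the original line wrapping.
sample = """I0505 19:27:07.194136 24426 progress-updater.cc:55] Query
d710fdcc1afe4652:b09975f58897eb3c: 12% Complete (79 out of 649)
I0505 19:27:42.196530 24426 progress-updater.cc:55] Query
d710fdcc1afe4652:b09975f58897eb3c: 14% Complete (93 out of 649)"""

entries = parse_progress(sample)
last = entries[-1]
print(f"last progress at {last[1]}: {last[3]} out of {last[4]}")
```

The timestamp of the last entry can then be compared against the first 'Connection refused' line in statestored.INFO.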
statestored.INFO log:
I0506 18:22:53.527773 8763 client-cache.cc:98] CreateClient(): adding new client for ip-<ipaddress>.ec2.internal:23000
I0506 18:22:53.528506 8763 thrift-util.cc:85] TSocket::open() connect() <Host: ip-<ipaddress>.ec2.internal Port: 23000>Connection refused
I0506 18:22:53.573663 8763 status.cc:42] Couldn't open transport for ip-<ipaddress>.ec2.internal:23000(connect() failed: Connection refused)
@ 0x545fce impala::Status::Status()
@ 0x520256 impala::ThriftClientImpl::Open()
@ 0x4f1ed0 impala::ClientCacheHelper::CreateClient()
@ 0x4f21ac impala::ClientCacheHelper::ReopenClient()
@ 0x533f3d impala::StateStore::ProcessOneSubscriber()
@ 0x536dc4 impala::StateStore::SubscriberUpdateLoop()
@ 0x5d8953 thread_proxy
@ 0x7fd5857e5e9a start_thread
@ 0x7fd5846eb4bd (unknown)
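'Connection refused' generally means nothing was accepting TCP connections on that port at that moment, so one thing worth ruling out is whether the impalad process is still up and listening. A minimal sketch of such a check follows; `port_is_open` is a hypothetical helper, and the host name you pass in is whatever the statestore was trying to reach (elided above as ip-<ipaddress>.ec2.internal).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, running `port_is_open("<impalad host>", 23000)` from the statestore machine while the insert is running, and again after it fails, would show whether the port goes away at the point the query dies. This only tests reachability; it says nothing about why the process stopped listening.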
On Wednesday, April 24, 2013 3:05:59 AM UTC-7, jaidee...@inmobi.com wrote:
Hi,
I have a large table with 800M records in RCFile format.
I am creating another table with 'STORED as PARQUETFILE' and the same
schema as the first table.
*insert overwrite table pq_network_Fact partition (day_key)*
*select .... from rc_network_fact;*
When I try to insert data into the Parquet table, the query fails after a
while with 'Unknown Exception : [Errno 104] Connection reset by peer
Query failed'.
Any guess as to why this could be happening?
Thanks,
Jaideep