I'm trying out some of our DW queries on Impala 1.0 with CDH 4.1.4.
Performance is good overall, but I'm hitting a strange issue with one
query that returns 5 million rows: Impala reports within a few seconds
that the query is finished, but fetching the rows then takes about 20
minutes. Is this a known issue, and is there a workaround?
Here are some relevant bits from my query log:
Query Timeline: 21m18s
- Start execution: 3.778us (3.778us)
- Planning finished: 58.781ms (58.778ms)
- Rows available: 728.982ms (670.200ms)
- First row fetched: 731.956ms (2.973ms)
- Unregister query: 21m18s (21m17s)
ImpalaServer:
- RowMaterializationTimer: 14s901ms
Execution Profile 8ad5e4df095341d3:9c105c56e6a5bc7c:(Active: 22s241ms, % non-child: 0.00%)
- FinalizationTimer: 0ns
Coordinator Fragment:(Active: 21s548ms, % non-child: 0.00%)
- AverageThreadTokens: 1.00
- RowsProduced: 5.23M (5233538)
CodeGen:(Active: 70.136ms, % non-child: 0.33%)
- CodegenTime: 0ns
- CompileTime: 64.257ms
- LoadTime: 5.879ms
- ModuleFileSize: 73.10 KB
EXCHANGE_NODE (id=4):(Active: 21s543ms, % non-child: 99.98%)
- BytesReceived: 96.19 MB
- ConvertRowBatchTime: 158.656ms
- DataArrivalWaitTime: 21s334ms
- DeserializeRowBatchTimer: 698.871ms
- FirstBatchArrivalWaitTime: 0ns
- MemoryUsed: 0.00
- RowsReturned: 5.23M (5233538)
- RowsReturnedRate: 242.92 K/sec
- SendersBlockedTimer: 18m45s
- SendersBlockedTotalTimer(*): 5h22m
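For scale, here is a rough back-of-the-envelope calculation from the figures above. It is only a sketch: the variable names are mine, the 21m18s total is rounded by the profile, and I'm assuming the EXCHANGE_NODE active time is a fair proxy for how long the coordinator actually spent receiving rows. It works out to about 4,100 rows per second end to end, versus the roughly 243K rows per second the exchange node reports.

```python
# Figures copied from the profile above (rounded as printed there).
rows = 5_233_538             # RowsProduced on the coordinator fragment
wall_clock_s = 21 * 60 + 18  # Query Timeline total: 21m18s
exchange_active_s = 21.543   # EXCHANGE_NODE (id=4) active time: 21s543ms

# Rate seen by the client over the whole query lifetime.
end_to_end_rows_per_s = rows / wall_clock_s

# Rate while the exchange node was active (assumption: this approximates
# the server-side delivery rate, matching RowsReturnedRate in the profile).
exchange_rows_per_s = rows / exchange_active_s

print(f"end-to-end:    {end_to_end_rows_per_s:,.0f} rows/s")
print(f"exchange node: {exchange_rows_per_s:,.0f} rows/s")
print(f"ratio:         {exchange_rows_per_s / end_to_end_rows_per_s:.0f}x")
```

The gap between the two rates is what makes me think the time is going somewhere after the rows reach the coordinator.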
I'm happy to send over more of the query profile off-list if that's
helpful, but I was wondering if there's something obvious just from that
much. I'm just piping the results to /dev/null [1], so I don't think it's
a local disk write issue on the client.
Thanks,
Joe
[1] I'm issuing: time impala-shell -f query1.sql > /dev/null
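If a finer-grained client-side measurement would help, something like the following could separate time-to-first-output from total time. This is a minimal sketch, not something I have run against the cluster: `time_fetch` is a hypothetical helper name, and it assumes impala-shell writes rows to stdout as they arrive rather than buffering everything until the end.

```python
import subprocess
import time


def time_fetch(cmd):
    """Run cmd, discard its stdout, and return
    (seconds_to_first_output, total_seconds, bytes_read)."""
    start = time.monotonic()
    first_output_s = None
    bytes_read = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            # read1 returns as soon as any data is available, so the
            # first-output timestamp is not delayed by a full buffer.
            chunk = proc.stdout.read1(65536)
            if not chunk:
                break
            if first_output_s is None:
                first_output_s = time.monotonic() - start
            bytes_read += len(chunk)
    return first_output_s, time.monotonic() - start, bytes_read


# Example (same command as in [1], minus the shell redirect):
#   first, total, nbytes = time_fetch(["impala-shell", "-f", "query1.sql"])
#   print(f"first output after {first:.1f}s, done after {total:.1f}s, {nbytes} bytes")
```

A long gap between first output and completion would point at the fetch/print loop rather than query execution.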