Thanks, Alan. Until this is resolved, should we expect to pay a performance
penalty on the first query every time net-new data is inserted or loaded
into Impala?

cheers,
Matt
On Tuesday, June 18, 2013 2:23:30 PM UTC-7, Alan wrote:

Hi,

Yes, Impala does take a long time to load such an extremely wide table.
I've filed JIRA IMPALA-428 to track it. Thanks for reporting it!

Thanks,
Alan


On Tue, Jun 18, 2013 at 4:02 AM, Neeraj Chaplot <gee...@gmail.com> wrote:
The query plan is:

Query (id=c41c881772ac72e:9e1630860f637eba):
Summary:
Start Time: 2013-06-18 15:01:56
End Time: 2013-06-18 15:46:13
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 1.0.1 RELEASE (build df844fb967cec8740f08dfb8b21962bc053527ef)
User: root
Default Db: default
Sql Statement: select count(1) from imp_ext_test
Plan:
----------------
PLAN FRAGMENT 0
PARTITION: UNPARTITIONED

3:AGGREGATE
output: SUM(<slot 0>)
group by:
tuple ids: 1
2:EXCHANGE
tuple ids: 1

PLAN FRAGMENT 1
PARTITION: RANDOM

STREAM DATA SINK
EXCHANGE ID: 2
UNPARTITIONED

1:AGGREGATE
output: COUNT(1)
group by:
tuple ids: 1
0:SCAN HDFS
table=default.imp_ext_test #partitions=1 size=695.17MB
tuple ids: 0
----------------
Query Timeline: 44m17s
- Start execution: 2.461ms (2.461ms)
- Planning finished: 44m6s (44m5s)
- Rows available: 44m16s (10s822ms)
- First row fetched: 44m17s (443.984ms)
- Unregister query: 44m17s (2.778ms)
ImpalaServer:
- ClientFetchWaitTimer: 444.987ms
- RowMaterializationTimer: 21.879us
Execution Profile c41c881772ac72e:9e1630860f637eba:(Active: 10s821ms, % non-child: 0.00%)
- FinalizationTimer: 0ns
Coordinator Fragment:(Active: 10s589ms, % non-child: 0.00%)
- AverageThreadTokens: 0.00
- RowsProduced: 1
CodeGen:(Active: 109.22ms, % non-child: 1.03%)
- CodegenTime: 506.933us
- CompileTime: 86.490ms
- LoadTime: 22.531ms
- ModuleFileSize: 74.45 KB
AGGREGATION_NODE (id=3):(Active: 10s589ms, % non-child: 0.05%)
ExecOption: Codegen Enabled
- BuildBuckets: 1.02K (1024)
- BuildTime: 2.957us
- GetResultsTime: 3.760us
- LoadFactor: 0.00
- MemoryUsed: 32.01 KB
- RowsReturned: 1
- RowsReturnedRate: 0
EXCHANGE_NODE (id=2):(Active: 10s585ms, % non-child: 99.96%)
- BytesReceived: 16.00 B
- ConvertRowBatchTime: 3.576us
- DataArrivalWaitTime: 10s585ms
- DeserializeRowBatchTimer: 4.917us
- FirstBatchArrivalWaitTime: 0ns
- MemoryUsed: 0.00
- RowsReturned: 1
- RowsReturnedRate: 0
- SendersBlockedTimer: 0ns
- SendersBlockedTotalTimer(*): 0ns
Averaged Fragment 1:(Active: 10s589ms, % non-child: 0.00%)
split sizes: min: 695.17 MB, max: 695.17 MB, avg: 695.17 MB, stddev: 0.00
completion times: min:10s590ms max:10s590ms mean: 10s590ms stddev:0ns
execution rates: min:65.64 MB/sec max:65.64 MB/sec mean:65.64 MB/sec stddev:0.00 /sec
num instances: 1
- AverageThreadTokens: 10.05
- RowsProduced: 1
CodeGen:(Active: 95.569ms, % non-child: 0.90%)
- CodegenTime: 816.588us
- CompileTime: 88.714ms
- LoadTime: 6.853ms
- ModuleFileSize: 74.45 KB
DataStreamSender (dst_id=2):(Active: 258.881us, % non-child: 0.00%)
- BytesSent: 16.00 B
- NetworkThroughput(*): 77.97 KB/sec
- OverallThroughput: 60.36 KB/sec
- SerializeBatchTime: 28.202us
- ThriftTransmitTime(*): 200.390us
- UncompressedRowBatchSize: 16.00 B
AGGREGATION_NODE (id=1):(Active: 10s589ms, % non-child: 0.05%)
- BuildBuckets: 1.02K (1024)
- BuildTime: 97.70us
- GetResultsTime: 4.189us
- LoadFactor: 0.00
- MemoryUsed: 32.01 KB
- RowsReturned: 1
- RowsReturnedRate: 0
HDFS_SCAN_NODE (id=0):(Active: 10s584ms, % non-child: 99.95%)
- AverageHdfsReadThreadConcurrency: 0.52
- AverageIoMgrQueueCapacity: 244.57
- AverageIoMgrQueueSize: 0.00
- AverageScannerThreadConcurrency: 0.14
- BytesRead: 695.17 MB
- MemoryUsed: 0.00
- NumDisksAccessed: 1
- PerReadThreadRawHdfsThroughput: 118.43 MB/sec
- RowsRead: 10.00K (10000)
- RowsReturned: 10.00K (10000)
- RowsReturnedRate: 944.00 /sec
- ScanRangesComplete: 16
- ScannerThreadsInvoluntaryContextSwitches: 70
- ScannerThreadsTotalWallClockTime: 1m32s
- DelimiterParseTime: 678.996ms
- MaterializeTupleTime(*): 74.412us
- ScannerThreadsSysTime: 12.994ms
- ScannerThreadsUserTime: 713.884ms
- ScannerThreadsVoluntaryContextSwitches: 813
- TotalRawHdfsReadTime(*): 5s869ms
- TotalReadThroughput: 66.21 MB/sec
Fragment 1:
Instance c41c881772ac72e:9e1630860f637ebc (host=impetus-i0060.impetus.co.in:22000):(Active: 10s589ms, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:16/695.17 MB
- AverageThreadTokens: 10.05
- RowsProduced: 1
CodeGen:(Active: 95.569ms, % non-child: 0.90%)
- CodegenTime: 816.588us
- CompileTime: 88.714ms
- LoadTime: 6.853ms
- ModuleFileSize: 74.45 KB
DataStreamSender (dst_id=2):(Active: 258.881us, % non-child: 0.00%)
- BytesSent: 16.00 B
- NetworkThroughput(*): 77.97 KB/sec
- OverallThroughput: 60.36 KB/sec
- SerializeBatchTime: 28.202us
- ThriftTransmitTime(*): 200.390us
- UncompressedRowBatchSize: 16.00 B
AGGREGATION_NODE (id=1):(Active: 10s589ms, % non-child: 0.05%)
ExecOption: Codegen Enabled
- BuildBuckets: 1.02K (1024)
- BuildTime: 97.70us
- GetResultsTime: 4.189us
- LoadFactor: 0.00
- MemoryUsed: 32.01 KB
- RowsReturned: 1
- RowsReturnedRate: 0
HDFS_SCAN_NODE (id=0):(Active: 10s584ms, % non-child: 99.95%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:16/695.17 MB
Hdfs Read Thread Concurrency Bucket: 0:47.62% 1:52.38% 2:0%
File Formats: TEXT/NONE:16
ExecOption: Codegen enabled: 16 out of 16
- AverageHdfsReadThreadConcurrency: 0.52
- AverageIoMgrQueueCapacity: 244.57
- AverageIoMgrQueueSize: 0.00
- AverageScannerThreadConcurrency: 0.14
- BytesRead: 695.17 MB
- MemoryUsed: 0.00
- NumDisksAccessed: 1
- PerReadThreadRawHdfsThroughput: 118.43 MB/sec
- RowsRead: 10.00K (10000)
- RowsReturned: 10.00K (10000)
- RowsReturnedRate: 944.00 /sec
- ScanRangesComplete: 16
- ScannerThreadsInvoluntaryContextSwitches: 70
- ScannerThreadsTotalWallClockTime: 1m32s
- DelimiterParseTime: 678.996ms
- MaterializeTupleTime(*): 74.412us
- ScannerThreadsSysTime: 12.994ms
- ScannerThreadsUserTime: 713.884ms
- ScannerThreadsVoluntaryContextSwitches: 813
- TotalRawHdfsReadTime(*): 5s869ms
- TotalReadThroughput: 66.21 MB/sec
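
The Query Timeline in the profile above already shows where the time goes. As a rough illustration (a sketch only, not an Impala tool), the per-step durations in parentheses can be parsed and compared. The helper names parse_duration and step_durations are hypothetical; the timeline text is copied from the profile.

```python
import re

# Timeline lines copied from the profile above.
TIMELINE = """\
- Start execution: 2.461ms (2.461ms)
- Planning finished: 44m6s (44m5s)
- Rows available: 44m16s (10s822ms)
- First row fetched: 44m17s (443.984ms)
- Unregister query: 44m17s (2.778ms)
"""

# Seconds per unit for duration strings such as "44m5s" or "10s822ms".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
# Captures the step name and the per-step duration inside the parentheses.
_LINE = re.compile(r"-\s*(.+?):\s*\S+\s*\((\S+)\)")


def parse_duration(text):
    """Convert a profile duration string to seconds."""
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in _TOKEN.findall(text))


def step_durations(timeline):
    """Return (step name, seconds spent in that step) for each timeline line."""
    steps = []
    for line in timeline.splitlines():
        match = _LINE.search(line)
        if match:
            steps.append((match.group(1), parse_duration(match.group(2))))
    return steps


steps = step_durations(TIMELINE)
total = sum(seconds for _, seconds in steps)
shares = {name: seconds / total for name, seconds in steps}
for name, seconds in steps:
    print(f"{name:<20} {seconds:10.3f}s {shares[name]:7.2%}")
```

With these numbers, the "Planning finished" step accounts for more than 99% of the elapsed time, while the scan and aggregation take only about 10 seconds.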



On Tue, Jun 18, 2013 at 4:11 PM, <gee...@gmail.com> wrote:

Hi All,

I have an Impala external table with about 10k columns. When I run
select count(1) on that table, it takes more than 10 minutes the first
time. After that, any select or aggregate returns in under a second.

I checked the Impala server logs and found that "catalog.HdfsTable: load table"
is taking a lot of time.

When I refresh the Impala cache and run any query, the first query takes
a long time; from then on, queries are very fast.

Logs:

13/06/18 15:01:56 INFO service.Frontend: analyze query select count(1) from imp_ext_test
13/06/18 15:01:57 INFO catalog.HdfsTable: load table imp_ext_test
13/06/18 15:46:02 INFO catalog.HdfsTable: load partition block md for imp_ext_test
13/06/18 15:46:02 INFO catalog.HdfsTable: loaded partition PartitionBlockMetadata{#blocks=0, #filenames=0, totalStringLen=0}
13/06/18 15:46:02 INFO catalog.HdfsTable: loaded partition PartitionBlockMetadata{#blocks=16, #filenames=6, totalStringLen=420}
13/06/18 15:46:02 INFO catalog.HdfsTable: loaded disk ids for table default.imp_ext_test
13/06/18 15:46:02 INFO catalog.HdfsTable: 1
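
One way to confirm that the stall sits in the "load table" step is to measure the gap between consecutive log timestamps. The following is a minimal sketch, assuming the yy/MM/dd HH:mm:ss prefix seen in the log lines above; the helper names parse_ts and largest_gap are hypothetical, and the sample lines are copied from the logs with wrapped lines rejoined.

```python
from datetime import datetime

# Log lines copied from the post above (wrapped lines rejoined).
LOG_LINES = [
    "13/06/18 15:01:56 INFO service.Frontend: analyze query select count(1) from imp_ext_test",
    "13/06/18 15:01:57 INFO catalog.HdfsTable: load table imp_ext_test",
    "13/06/18 15:46:02 INFO catalog.HdfsTable: load partition block md for imp_ext_test",
    "13/06/18 15:46:02 INFO catalog.HdfsTable: loaded disk ids for table default.imp_ext_test",
]


def parse_ts(line):
    """Parse the leading 'yy/MM/dd HH:mm:ss' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%y/%m/%d %H:%M:%S")


def largest_gap(lines):
    """Return (gap, line) where line is the entry followed by the longest silence."""
    stamped = [(parse_ts(line), line) for line in lines]
    pairs = zip(stamped, stamped[1:])
    (t0, before), (t1, _after) = max(pairs, key=lambda pair: pair[1][0] - pair[0][0])
    return t1 - t0, before


gap, line = largest_gap(LOG_LINES)
print(gap, "after:", line)
```

For these lines, the longest silence is 44 minutes 5 seconds, directly after the "load table imp_ext_test" entry, which matches the planning time reported in the query profile.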

Could you please help me reduce the table load time? My
table has about 10k columns.

Thanks

Discussion Overview
group: impala-user
categories: hadoop
posted: Jun 18, '13 at 10:41a
active: Jun 18, '13 at 11:33p
posts: 4
users: 3
website: cloudera.com
irc: #hadoop