Hi Jason,

Out of the 42sec, we're spending 28sec on query planning. Impala takes
around 1sec to do partition pruning on 10k partitions, so with your 350k
partitions (roughly 35x that), a planning time of 28sec is expected.

In your particular case, we could have improved the planning time by using
binary search or hashing when doing partition pruning. However, that is not
a generally applicable technique.
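For illustration only (this is a minimal sketch, not Impala's planner code), the idea is: with an equality predicate on the partition key, keeping the keys sorted lets pruning locate matching partitions by binary search instead of testing every one of the 350k partitions:

```python
import bisect

def prune_linear(partition_keys, target):
    """O(n) pruning: test the predicate against every partition."""
    return [i for i, k in enumerate(partition_keys) if k == target]

def prune_binary(sorted_keys, target):
    """O(log n) pruning for an equality predicate, assuming the
    partition keys are kept sorted. Only works when the predicate
    maps to a contiguous key range, hence not generally applicable."""
    lo = bisect.bisect_left(sorted_keys, target)
    hi = bisect.bisect_right(sorted_keys, target)
    return list(range(lo, hi))

# Both approaches select the same partitions; only the cost differs.
keys = sorted(f"key-{i:06d}" for i in range(350_000))
assert prune_linear(keys, "key-123456") == prune_binary(keys, "key-123456")
```

The catch, as noted above, is that binary search (or hashing) only helps for predicates that map cleanly onto the sorted or hashed key space, such as equality and simple ranges, which is why it isn't a generally applicable fix.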

After planning, Impala spent another 14sec before returning a row. I'll
need to look into that.

As for the Hive query time, I can't get Hive to return anything within
5sec. How did you measure the run time?

Thanks,
Alan


On Sun, Mar 9, 2014 at 11:24 PM, jason shih wrote:

Hi Lenni,

I forgot to attach the profile; thanks for the reminder. Here is the
query profile:


Query Runtime Profile:
Query (id=9949bdd38c6d40f4:ed2d322e47dc469a):
Summary:
Session ID: e748c0afe749b5b4:32772a0064219795
Session Type: BEESWAX
Start Time: 2014-03-10 14:11:44.770023000
End Time: 2014-03-10 14:12:27.449625000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 1.2.4 DEBUG (build
ac29ae09d66c1244fe2ceb293083723226e66c1a)
User: impala
Network Address: 192.168.10.10:47589
Default Db: POC
Sql Statement: select * from tdp_detail_new where header_oid =
'c4a46eee-e975-4bdc-bd8c-fd990db475f9' limit 10
Plan:
----------------
Estimated Per-Host Requirements: Memory=32.00MB VCores=1
PLAN FRAGMENT 0
PARTITION: UNPARTITIONED
1:EXCHANGE
limit: 10
cardinality: 10
per-host memory: unavailable
tuple ids: 0
PLAN FRAGMENT 1
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 1
------------------------------


UNPARTITIONED
0:SCAN HDFS
table=poc.tdp_detail_new #partitions=1/362645 size=7.12MB
table stats: unavailable
columns missing stats: idx, oid, x_axis, y_axis, seq_id, bin_id,
para_name, para_value
limit: 10
cardinality: unavailable
per-host memory: 32.00MB
tuple ids: 0
----------------
Query Timeline: 42s681ms
- Start execution: 5.495ms (5.495ms)
- Planning finished: 28s412ms (28s406ms)
- Rows available: 42s650ms (14s238ms)
- First row fetched: 42s652ms (2.184ms)
- Unregister query: 42s679ms (26.705ms)
ImpalaServer:
- ClientFetchWaitTimer: 28.400ms
- RowMaterializationTimer: 63.726us
Execution Profile 9949bdd38c6d40f4:ed2d322e47dc469a:(Active: 11s429ms, %
non-child: 0.00%)
Per Node Peak Memory Usage: impala-13:22000(8.33 MB)
- FinalizationTimer: 0ns
Coordinator Fragment:(Active: 20.345us, % non-child: 0.00%)
MemoryUsage(500.0ms): 0.00 , 0.00 , 8.00 KB, 8.00 KB, 8.00 KB, 8.00
KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB,
8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 12.08 KB
ThreadUsage(500.0ms): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1
- AverageThreadTokens: 1.00
- PeakMemoryUsage: 12.08 KB
- PrepareTime: 57.473us
- RowsProduced: 10
- TotalCpuTime: 10s932ms
- TotalNetworkWaitTime: 4.952us
- TotalStorageWaitTime: 0ns
CodeGen:(Active: 77.367ms, % non-child: 100.00%)
- CodegenTime: 0ns
- CompileTime: 70.656ms
- LoadTime: 6.333ms
- ModuleFileSize: 102.83 KB
EXCHANGE_NODE (id=1):(Active: 13.960us, % non-child: 68.62%)
- BytesReceived: 861.00 B
- ConvertRowBatchTime: 6.165us
- DataArrivalWaitTime: 0ns
- DeserializeRowBatchTimer: 36.172us
- FirstBatchArrivalWaitTime: 0ns
- PeakMemoryUsage: 4.08 KB
- RowsReturned: 10
- RowsReturnedRate: 716.33 K/sec
- SendersBlockedTimer: 0ns
- SendersBlockedTotalTimer(*): 0ns
Averaged Fragment 1:(Active: 389.945ms, % non-child: 0.00%)
split sizes: min: 7.12 MB, max: 7.12 MB, avg: 7.12 MB, stddev: 0.00
completion times: min:391.540ms max:391.540ms mean: 391.540ms
stddev:0ns
execution rates: min:18.20 MB/sec max:18.20 MB/sec mean:18.20
MB/sec stddev:0.00 /sec
num instances: 1
- AverageThreadTokens: 1.00
- PeakMemoryUsage: 8.33 MB
- PrepareTime: 2.255ms
- RowsProduced: 10
- TotalCpuTime: 959.264ms
- TotalNetworkWaitTime: 313.755ms
- TotalStorageWaitTime: 16.44ms
CodeGen:(Active: 84.888ms, % non-child: 21.77%)
- CodegenTime: 2.550ms
- CompileTime: 78.299ms
- LoadTime: 6.191ms
- ModuleFileSize: 102.83 KB
DataStreamSender (dst_id=1):(Active: 313.631ms, % non-child: 80.43%)
- BytesSent: 865.00 B
- NetworkThroughput(*): 2.69 KB/sec
- OverallThroughput: 2.69 KB/sec
- PeakMemoryUsage: 856.00 B
- SerializeBatchTime: 77.463us
- ThriftTransmitTime(*): 313.560ms
- UncompressedRowBatchSize: 2.43 KB
HDFS_SCAN_NODE (id=0):(Active: 75.912ms, % non-child: 19.47%)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 0.00
- BytesRead: 7.12 MB
- BytesReadLocal: 7.12 MB
- BytesReadShortCircuit: 0.00
- NumDisksAccessed: 1
- NumScannerThreadsStarted: 1
- PeakMemoryUsage: 8.32 MB
- PerReadThreadRawHdfsThroughput: 559.42 MB/sec
- RowsRead: 53.97K (53966)
- RowsReturned: 10
- RowsReturnedRate: 131.00 /sec
- ScanRangesComplete: 1
- ScannerThreadsInvoluntaryContextSwitches: 1
- ScannerThreadsTotalWallClockTime: 77.205ms
- DelimiterParseTime: 57.309ms
- MaterializeTupleTime(*): 314.721us
- ScannerThreadsSysTime: 0ns
- ScannerThreadsUserTime: 50.0ms
- ScannerThreadsVoluntaryContextSwitches: 5
- TotalRawHdfsReadTime(*): 12.735ms
- TotalReadThroughput: 0.00 /sec
Fragment 1:
Instance 9949bdd38c6d40f4:ed2d322e47dc469c
(host=impala-13:22000):(Active: 389.945ms, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>):
-1:1/7.12 MB
MemoryUsage(500.0ms): 0.00 , 0.00
ThreadUsage(500.0ms): 1, 1
- AverageThreadTokens: 1.00
- PeakMemoryUsage: 8.33 MB
- PrepareTime: 2.255ms
- RowsProduced: 10
- TotalCpuTime: 959.264ms
- TotalNetworkWaitTime: 313.755ms
- TotalStorageWaitTime: 16.44ms
CodeGen:(Active: 84.888ms, % non-child: 21.77%)
- CodegenTime: 2.550ms
- CompileTime: 78.299ms
- LoadTime: 6.191ms
- ModuleFileSize: 102.83 KB
DataStreamSender (dst_id=1):(Active: 313.631ms, % non-child:
80.43%)
- BytesSent: 865.00 B
- NetworkThroughput(*): 2.69 KB/sec
- OverallThroughput: 2.69 KB/sec
- PeakMemoryUsage: 856.00 B
- SerializeBatchTime: 77.463us
- ThriftTransmitTime(*): 313.560ms
- UncompressedRowBatchSize: 2.43 KB
HDFS_SCAN_NODE (id=0):(Active: 75.912ms, % non-child: 19.47%)
ExecOption: Missing Volume Id, Codegen enabled: 1 out of 1
Hdfs split stats (<volume id>:<# splits>/<split lengths>):
-1:1/7.12 MB
Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0%
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 0.00
- BytesRead: 7.12 MB
- BytesReadLocal: 7.12 MB
- BytesReadShortCircuit: 0.00
- NumDisksAccessed: 1
- NumScannerThreadsStarted: 1
- PeakMemoryUsage: 8.32 MB
- PerReadThreadRawHdfsThroughput: 559.42 MB/sec
- RowsRead: 53.97K (53966)
- RowsReturned: 10
- RowsReturnedRate: 131.00 /sec
- ScanRangesComplete: 1
- ScannerThreadsInvoluntaryContextSwitches: 1
- ScannerThreadsTotalWallClockTime: 77.205ms
- DelimiterParseTime: 57.309ms
- MaterializeTupleTime(*): 314.721us
- ScannerThreadsSysTime: 0ns
- ScannerThreadsUserTime: 50.0ms
- ScannerThreadsVoluntaryContextSwitches: 5
- TotalRawHdfsReadTime(*): 12.735ms
- TotalReadThroughput: 0.00 /sec



Cheers,
Jason



On Mon, Mar 10, 2014 at 1:21 PM, Lenni Kuff wrote:

Hi Jason,
Could you attach a query profile for the slow query?

Thanks,
Lenni

On Sun, Mar 9, 2014 at 7:14 PM, jason shih wrote:

Dear all,

we upgraded to 1.2.4 last Saturday, and we're glad to see that the previous
partition issue has been resolved.

one of our tables actually has more than 350k partitions, so before 1.2.4 we
used to query it directly from Hive. After upgrading to 1.2.4, we can now
query all created partitions from Impala, but we notice the response time is
much longer than through Hive.

the table is around 1.5TB with more than 350k partitions. The query itself
is a simple select limited to the first 10 rows:
- response time from Impala is ~42 sec
- response time from Hive is ~1.2 sec

However, for another table containing only 15k partitions, the response time
is much the same as through Hive.

Is there any chance we could improve the response time when querying from
Impala? We hope to consolidate all queries through the same DSN.

thanks

Cheers,
Jason

To unsubscribe from this group and stop receiving emails from it, send
an email to impala-user+unsubscribe@cloudera.org.


--

Jason Shih


Discussion Overview
group: impala-user @ hadoop (cloudera.com)
posted: Mar 10, 2014; active: Mar 16, 2014
posts: 3; users: 3
