Hi All,

I have an Impala external table with about 10k columns. When I run
select count(1) on that table, it takes more than 10 minutes the first
time. After that, any select or aggregate returns in under a second.

I checked the Impala server logs and found that "catalog.HdfsTable: load table" is
taking a lot of time.

When I refresh the Impala cache and run any query, the first query takes a
long time; from then on, queries are very fast.

Logs :

13/06/18 15:01:56 INFO service.Frontend: analyze query select count(1) from imp_ext_test
13/06/18 15:01:57 INFO catalog.HdfsTable: load table imp_ext_test
13/06/18 15:46:02 INFO catalog.HdfsTable: load partition block md for imp_ext_test
13/06/18 15:46:02 INFO catalog.HdfsTable: loaded partition PartitionBlockMetadata{#blocks=0, #filenames=0, totalStringLen=0}
13/06/18 15:46:02 INFO catalog.HdfsTable: loaded partition PartitionBlockMetadata{#blocks=16, #filenames=6, totalStringLen=420}
13/06/18 15:46:02 INFO catalog.HdfsTable: loaded disk ids for table default.imp_ext_test
13/06/18 15:46:02 INFO catalog.HdfsTable: 1
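
To put a number on the gap in the log excerpt above, here is a minimal Python sketch that subtracts the timestamp of the "load table" line from that of the next line. It assumes the `yy/MM/dd HH:mm:ss` prefix seen in these logs; the helper names `parse_log_time` and `gap_seconds` are hypothetical and not part of Impala.

```python
from datetime import datetime

# Two lines copied from the log excerpt above (wrapped lines rejoined).
LOG_LINES = [
    "13/06/18 15:01:57 INFO catalog.HdfsTable: load table imp_ext_test",
    "13/06/18 15:46:02 INFO catalog.HdfsTable: load partition block md for imp_ext_test",
]


def parse_log_time(line):
    """Parse the leading 17-character 'yy/MM/dd HH:mm:ss' timestamp (hypothetical helper)."""
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")


def gap_seconds(first_line, second_line):
    """Seconds elapsed between two log lines."""
    return (parse_log_time(second_line) - parse_log_time(first_line)).total_seconds()


load_table_gap = gap_seconds(LOG_LINES[0], LOG_LINES[1])
minutes, seconds = divmod(int(load_table_gap), 60)
print(f"load table -> load partition block md: {minutes}m{seconds}s")
```

Nearly all of the wait falls between these two log lines, that is, before any partition block metadata is loaded.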

Could you please help me reduce the load table time? My
table has about 10k columns.

Thanks


  • Neeraj Chaplot at Jun 18, 2013 at 11:08 am
    The query plan is :

    Query (id=c41c881772ac72e:9e1630860f637eba):
       Summary:
         Start Time: 2013-06-18 15:01:56
         End Time: 2013-06-18 15:46:13
         Query Type: QUERY
         Query State: FINISHED
         Query Status: OK
         Impala Version: impalad version 1.0.1 RELEASE (build df844fb967cec8740f08dfb8b21962bc053527ef)
         User: root
         Default Db: default
         Sql Statement: select count(1) from imp_ext_test
         Plan:
    ----------------
    PLAN FRAGMENT 0
       PARTITION: UNPARTITIONED

       3:AGGREGATE
    output: SUM(<slot 0>)
    group by:
    tuple ids: 1
       2:EXCHANGE
          tuple ids: 1

    PLAN FRAGMENT 1
       PARTITION: RANDOM

       STREAM DATA SINK
         EXCHANGE ID: 2
         UNPARTITIONED

       1:AGGREGATE
    output: COUNT(1)
    group by:
    tuple ids: 1
       0:SCAN HDFS
          table=default.imp_ext_test #partitions=1 size=695.17MB
          tuple ids: 0
    ----------------
         Query Timeline: 44m17s
            - Start execution: 2.461ms (2.461ms)
            - Planning finished: 44m6s (44m5s)
            - Rows available: 44m16s (10s822ms)
            - First row fetched: 44m17s (443.984ms)
            - Unregister query: 44m17s (2.778ms)
       ImpalaServer:
          - ClientFetchWaitTimer: 444.987ms
          - RowMaterializationTimer: 21.879us
       Execution Profile c41c881772ac72e:9e1630860f637eba:(Active: 10s821ms, % non-child: 0.00%)
          - FinalizationTimer: 0ns
         Coordinator Fragment:(Active: 10s589ms, % non-child: 0.00%)
            - AverageThreadTokens: 0.00
            - RowsProduced: 1
           CodeGen:(Active: 109.22ms, % non-child: 1.03%)
              - CodegenTime: 506.933us
              - CompileTime: 86.490ms
              - LoadTime: 22.531ms
              - ModuleFileSize: 74.45 KB
           AGGREGATION_NODE (id=3):(Active: 10s589ms, % non-child: 0.05%)
             ExecOption: Codegen Enabled
              - BuildBuckets: 1.02K (1024)
              - BuildTime: 2.957us
              - GetResultsTime: 3.760us
              - LoadFactor: 0.00
              - MemoryUsed: 32.01 KB
              - RowsReturned: 1
              - RowsReturnedRate: 0
           EXCHANGE_NODE (id=2):(Active: 10s585ms, % non-child: 99.96%)
              - BytesReceived: 16.00 B
              - ConvertRowBatchTime: 3.576us
              - DataArrivalWaitTime: 10s585ms
              - DeserializeRowBatchTimer: 4.917us
              - FirstBatchArrivalWaitTime: 0ns
              - MemoryUsed: 0.00
              - RowsReturned: 1
              - RowsReturnedRate: 0
              - SendersBlockedTimer: 0ns
              - SendersBlockedTotalTimer(*): 0ns
         Averaged Fragment 1:(Active: 10s589ms, % non-child: 0.00%)
           split sizes: min: 695.17 MB, max: 695.17 MB, avg: 695.17 MB, stddev: 0.00
           completion times: min:10s590ms max:10s590ms mean: 10s590ms stddev:0ns
           execution rates: min:65.64 MB/sec max:65.64 MB/sec mean:65.64 MB/sec stddev:0.00 /sec
           num instances: 1
            - AverageThreadTokens: 10.05
            - RowsProduced: 1
           CodeGen:(Active: 95.569ms, % non-child: 0.90%)
              - CodegenTime: 816.588us
              - CompileTime: 88.714ms
              - LoadTime: 6.853ms
              - ModuleFileSize: 74.45 KB
           DataStreamSender (dst_id=2):(Active: 258.881us, % non-child: 0.00%)
              - BytesSent: 16.00 B
              - NetworkThroughput(*): 77.97 KB/sec
              - OverallThroughput: 60.36 KB/sec
              - SerializeBatchTime: 28.202us
              - ThriftTransmitTime(*): 200.390us
              - UncompressedRowBatchSize: 16.00 B
           AGGREGATION_NODE (id=1):(Active: 10s589ms, % non-child: 0.05%)
              - BuildBuckets: 1.02K (1024)
              - BuildTime: 97.70us
              - GetResultsTime: 4.189us
              - LoadFactor: 0.00
              - MemoryUsed: 32.01 KB
              - RowsReturned: 1
              - RowsReturnedRate: 0
           HDFS_SCAN_NODE (id=0):(Active: 10s584ms, % non-child: 99.95%)
              - AverageHdfsReadThreadConcurrency: 0.52
              - AverageIoMgrQueueCapacity: 244.57
              - AverageIoMgrQueueSize: 0.00
              - AverageScannerThreadConcurrency: 0.14
              - BytesRead: 695.17 MB
              - MemoryUsed: 0.00
              - NumDisksAccessed: 1
              - PerReadThreadRawHdfsThroughput: 118.43 MB/sec
              - RowsRead: 10.00K (10000)
              - RowsReturned: 10.00K (10000)
              - RowsReturnedRate: 944.00 /sec
              - ScanRangesComplete: 16
              - ScannerThreadsInvoluntaryContextSwitches: 70
              - ScannerThreadsTotalWallClockTime: 1m32s
                - DelimiterParseTime: 678.996ms
                - MaterializeTupleTime(*): 74.412us
                - ScannerThreadsSysTime: 12.994ms
                - ScannerThreadsUserTime: 713.884ms
              - ScannerThreadsVoluntaryContextSwitches: 813
              - TotalRawHdfsReadTime(*): 5s869ms
              - TotalReadThroughput: 66.21 MB/sec
         Fragment 1:
           Instance c41c881772ac72e:9e1630860f637ebc (host=impetus-i0060.impetus.co.in:22000):(Active: 10s589ms, % non-child: 0.00%)
             Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:16/695.17 MB
              - AverageThreadTokens: 10.05
              - RowsProduced: 1
             CodeGen:(Active: 95.569ms, % non-child: 0.90%)
                - CodegenTime: 816.588us
                - CompileTime: 88.714ms
                - LoadTime: 6.853ms
                - ModuleFileSize: 74.45 KB
             DataStreamSender (dst_id=2):(Active: 258.881us, % non-child: 0.00%)
                - BytesSent: 16.00 B
                - NetworkThroughput(*): 77.97 KB/sec
                - OverallThroughput: 60.36 KB/sec
                - SerializeBatchTime: 28.202us
                - ThriftTransmitTime(*): 200.390us
                - UncompressedRowBatchSize: 16.00 B
             AGGREGATION_NODE (id=1):(Active: 10s589ms, % non-child: 0.05%)
               ExecOption: Codegen Enabled
                - BuildBuckets: 1.02K (1024)
                - BuildTime: 97.70us
                - GetResultsTime: 4.189us
                - LoadFactor: 0.00
                - MemoryUsed: 32.01 KB
                - RowsReturned: 1
                - RowsReturnedRate: 0
             HDFS_SCAN_NODE (id=0):(Active: 10s584ms, % non-child: 99.95%)
               Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:16/695.17 MB
               Hdfs Read Thread Concurrency Bucket: 0:47.62% 1:52.38% 2:0%
               File Formats: TEXT/NONE:16
               ExecOption: Codegen enabled: 16 out of 16
                - AverageHdfsReadThreadConcurrency: 0.52
                - AverageIoMgrQueueCapacity: 244.57
                - AverageIoMgrQueueSize: 0.00
                - AverageScannerThreadConcurrency: 0.14
                - BytesRead: 695.17 MB
                - MemoryUsed: 0.00
                - NumDisksAccessed: 1
                - PerReadThreadRawHdfsThroughput: 118.43 MB/sec
                - RowsRead: 10.00K (10000)
                - RowsReturned: 10.00K (10000)
                - RowsReturnedRate: 944.00 /sec
                - ScanRangesComplete: 16
                - ScannerThreadsInvoluntaryContextSwitches: 70
                - ScannerThreadsTotalWallClockTime: 1m32s
                  - DelimiterParseTime: 678.996ms
                  - MaterializeTupleTime(*): 74.412us
                  - ScannerThreadsSysTime: 12.994ms
                  - ScannerThreadsUserTime: 713.884ms
                - ScannerThreadsVoluntaryContextSwitches: 813
                - TotalRawHdfsReadTime(*): 5s869ms
                - TotalReadThroughput: 66.21 MB/sec
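
As a rough reading of the Query Timeline in the profile above, the sketch below converts the per-step durations (the parenthesised values) to seconds and computes what share of the total went to planning. The duration parser `to_seconds` and the unit table are hypothetical helpers written for this illustration, not an Impala API, and they only cover the unit suffixes that appear in this profile.

```python
import re

# Per-step durations copied from the parenthesised values in the Query Timeline above.
TIMELINE = {
    "Start execution": "2.461ms",
    "Planning finished": "44m5s",
    "Rows available": "10s822ms",
    "First row fetched": "443.984ms",
    "Unregister query": "2.778ms",
}

# Seconds per unit suffix; an assumption based on the suffixes seen in this profile.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def to_seconds(text):
    """Convert a duration such as '44m5s' or '10s822ms' to seconds (hypothetical helper)."""
    # Two-letter units are listed first so 'ms' is not read as 'm' followed by 's'.
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)", text)
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


step_seconds = {name: to_seconds(value) for name, value in TIMELINE.items()}
total_seconds = sum(step_seconds.values())
planning_share = step_seconds["Planning finished"] / total_seconds

print(f"total: {total_seconds:.1f}s, planning share: {planning_share:.1%}")
```

On these numbers planning accounts for well over 99% of the elapsed time, while the scan and aggregation finish in about 10 seconds. That is consistent with the time being spent on table loading rather than on query execution.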



  • Alan Choi at Jun 18, 2013 at 9:52 pm
    Hi,

    Yes, Impala does take a long time to load such an extremely wide table.
    I've filed JIRA IMPALA-428 to track it. Thanks for reporting it!

    Thanks,
    Alan

  • matt Lieber at Jun 18, 2013 at 11:33 pm
    Thanks Alan. Until this is resolved, should we expect to pay this
    performance penalty on the first query each time net-new data is
    inserted or loaded into Impala?

    cheers,
    Matt

Discussion Overview
group: impala-user @
categories: hadoop
posted: Jun 18, '13 at 10:41a
active: Jun 18, '13 at 11:33p
posts: 4
users: 3
website: cloudera.com
irc: #hadoop
