FAQ
I've run some tests to find out the effect of the -num_threads_per_disk
option.
Contrary to my expectations, adding more threads per disk decreases the
overall query performance slightly rather than improving it.

a. -num_threads_per_disk=1 : HDFS_SCAN_NODE takes 12s231ms (without OS caching)
b. -num_threads_per_disk=24 : HDFS_SCAN_NODE takes 13s954ms (without OS caching)

My cluster consists of 8 DataNodes, and each node is equipped with a 24-core
CPU, 64 GB of memory, and 6 disks.
The par_test1 table contains 40M rows in 112 Parquet files; each Parquet file
is about 200 MB.

I think the performance decrease has two main causes: first, concurrent reads
from different read threads increase the number of seek operations needed to
read the file blocks, which adds overhead and can slow down the HDFS_SCAN
operation; and second, many read threads increase the cost of context switches.
I don't know if my interpretation is correct. If I am missing something,
please correct me.

The relevant parts of the query profiles for each condition are shown below.

1. set -num_threads_per_disk=1 and query "select <24 columns> from
par_test1 where col2='GXG201' limit 100;"

     Averaged Fragment 1:(Active: 12s232ms, % non-child: 0.00%)
       split sizes: min: 1.89 GB, max: 2.54 GB, avg: 2.39 GB, stddev: 204.96 MB
       completion times: min:9s556ms max:18s748ms mean: 12s585ms stddev:3s270ms
       execution rates: min:111.27 MB/sec max:266.61 MB/sec mean:207.75 MB/sec stddev:52.87 MB/sec
       num instances: 8
        - AverageThreadTokens: 6.99
        - RowsProduced: 0
       CodeGen:(Active: 483.932ms, % non-child: 5.90%)
          - CodegenTime: 43.720ms
          - CompileTime: 241.83ms
          - LoadTime: 242.846ms
          - ModuleFileSize: 73.10 KB
       DataStreamSender (dst_id=1):(Active: 20.838us, % non-child: 0.00%)
          - BytesSent: 0.00
          - NetworkThroughput(*): 0.00 /sec
          - OverallThroughput: 0.00 /sec
          - SerializeBatchTime: 0ns
          - ThriftTransmitTime(*): 0ns
          - UncompressedRowBatchSize: 0.00
       HDFS_SCAN_NODE (id=0):(Active: *12s231ms*, % non-child: 99.99%)
          - AverageHdfsReadThreadConcurrency: *1.96*  // not I/O bound
            - HdfsReadThreadConcurrencyCountPercentage=0: 26.50
            - HdfsReadThreadConcurrencyCountPercentage=1: 24.29
            - HdfsReadThreadConcurrencyCountPercentage=2: 13.58
            - HdfsReadThreadConcurrencyCountPercentage=3: 9.77
            - HdfsReadThreadConcurrencyCountPercentage=4: 16.08
            - HdfsReadThreadConcurrencyCountPercentage=5: 7.58
            - HdfsReadThreadConcurrencyCountPercentage=6: 2.20
            - HdfsReadThreadConcurrencyCountPercentage=7: 0.00
            - HdfsReadThreadConcurrencyCountPercentage=8: 0.00
          - AverageIoMgrQueueCapcity: 246.08  // Is this the queue that holds the ScanRange objects?
          - AverageIoMgrQueueSize: 0.61
          - AverageScannerThreadConcurrency: 4.62
          - BytesRead: 2.39 GB
          - DecompressionTime: 14s205ms
          - MemoryUsed: 0.00
          - NumColumns: 24
          - NumDisksAccessed: 6
          - PerReadThreadRawHdfsThroughput: 106.12 MB/sec
          - RowsRead: 51.22M (51221152)
          - RowsReturned: 0
          - RowsReturnedRate: 0
          - ScanRangesComplete: 14
          - ScannerThreadsInvoluntaryContextSwitches: 1.10K (1098)
          - ScannerThreadsTotalWallClockTime: 1m35s
            - MaterializeTupleTime(*): 39s836ms
            - ScannerThreadsSysTime: 514.290ms
            - ScannerThreadsUserTime: 53s864ms
          - ScannerThreadsVoluntaryContextSwitches: 2.40K (2396)
          - TotalRawHdfsReadTime(*): *23s547ms*
          - TotalReadThroughput: 203.31 MB/sec


2. set -num_threads_per_disk=24 and query "select <24 columns> from
par_test1 where col2='GXG201' limit 100;"

     Averaged Fragment 1:(Active: 13s956ms, % non-child: 0.00%)
       split sizes: min: 1.89 GB, max: 2.54 GB, avg: 2.39 GB, stddev: 204.96 MB
       completion times: min:11s052ms max:17s233ms mean: 14s310ms stddev:2s086ms
       execution rates: min:116.85 MB/sec max:230.53 MB/sec mean:175.80 MB/sec stddev:34.31 MB/sec
       num instances: 8
        - AverageThreadTokens: 7.67
        - RowsProduced: 0
       CodeGen:(Active: 269.13ms, % non-child: 1.83%)
          - CodegenTime: 16.262ms
          - CompileTime: 229.842ms
          - LoadTime: 39.169ms
          - ModuleFileSize: 73.10 KB
       DataStreamSender (dst_id=1):(Active: 18.305us, % non-child: 0.00%)
          - BytesSent: 0.00
          - NetworkThroughput(*): 0.00 /sec
          - OverallThroughput: 0.00 /sec
          - SerializeBatchTime: 0ns
          - ThriftTransmitTime(*): 0ns
          - UncompressedRowBatchSize: 0.00
       HDFS_SCAN_NODE (id=0):(Active: *13s954ms*, % non-child: 99.99%)
          - AverageHdfsReadThreadConcurrency: *13.65*  // too many HDFS read threads
            - HdfsReadThreadConcurrencyCountPercentage=0: 20.94
            - HdfsReadThreadConcurrencyCountPercentage=1: 9.44
            - HdfsReadThreadConcurrencyCountPercentage=2: 3.43
            - HdfsReadThreadConcurrencyCountPercentage=3: 5.31
            - HdfsReadThreadConcurrencyCountPercentage=4: 0.46
            - HdfsReadThreadConcurrencyCountPercentage=5: 1.56
            - HdfsReadThreadConcurrencyCountPercentage=6: 2.31
            - HdfsReadThreadConcurrencyCountPercentage=7: 1.30
            - HdfsReadThreadConcurrencyCountPercentage=8: 55.23
          - AverageIoMgrQueueCapcity: 256.00
          - AverageIoMgrQueueSize: 0.27
          - AverageScannerThreadConcurrency: 4.31
          - BytesRead: 2.39 GB
          - DecompressionTime: 14s034ms
          - MemoryUsed: 0.00
          - NumColumns: 24
          - NumDisksAccessed: 6
          - PerReadThreadRawHdfsThroughput: 21.64 MB/sec
          - RowsRead: 51.22M (51221152)
          - RowsReturned: 0
          - RowsReturnedRate: 0
          - ScanRangesComplete: 14
          - ScannerThreadsInvoluntaryContextSwitches: 729
          - ScannerThreadsTotalWallClockTime: 1m55s
            - MaterializeTupleTime(*): 45s059ms
            - ScannerThreadsSysTime: 126.847ms
            - ScannerThreadsUserTime: 59s447ms
          - ScannerThreadsVoluntaryContextSwitches: 431
          - TotalRawHdfsReadTime(*): *3m5s*  // much higher than in the previous run
          - TotalReadThroughput: 172.11 MB/sec
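
As a rough back-of-envelope check, here is a small Python sketch that only
re-uses the counters copied from the two profiles above (per averaged fragment
instance); nothing in it is newly measured:

    # Back-of-envelope comparison built from the two profiles above.
    runs = {
        "num_threads_per_disk=1": {
            "raw_read_s": 23.547,        # TotalRawHdfsReadTime: 23s547ms
            "per_thread_mb_s": 106.12,   # PerReadThreadRawHdfsThroughput
            "total_mb_s": 203.31,        # TotalReadThroughput
        },
        "num_threads_per_disk=24": {
            "raw_read_s": 3 * 60 + 5,    # TotalRawHdfsReadTime: 3m5s
            "per_thread_mb_s": 21.64,
            "total_mb_s": 172.11,
        },
    }

    ratio = (runs["num_threads_per_disk=24"]["raw_read_s"]
             / runs["num_threads_per_disk=1"]["raw_read_s"])
    print(f"aggregate read-thread time grows ~{ratio:.1f}x for the same 2.39 GB")
    for name, r in runs.items():
        print(f"{name}: {r['per_thread_mb_s']} MB/s per read thread, "
              f"{r['total_mb_s']} MB/s total")
    # With 24 threads per disk, each read thread is roughly 5x slower and the
    # disks deliver less overall throughput, which fits the seek-overhead
    # interpretation rather than an I/O bottleneck.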


I would be grateful if anybody could give me some examples of when the
-num_threads_per_disk option can be used to improve query performance.

Thanks in advance.


  • Jung-Yup Lee at May 28, 2013 at 8:56 pm
    Thanks for replying, Nong.
    You're right: with SSDs, if we set the number of readers to the number of
    CPU cores, we would expect to see maximum read performance.
    As you know, SSDs handle random reads quickly, since there is no
    read/write head to move.

    On Wednesday, May 29, 2013 12:49:54 AM UTC+9, Nong wrote:

    Hi,

    This option is an internal/debug option we used for development. You'll
    likely see it removed in future releases, and Impala will determine the
    optimal value based on the system setup.

    Increasing the number of concurrent readers on a single disk does not
    improve throughput, as you have seen. Issuing multiple IO requests
    increases the number of seeks, and the amount of work that can actually
    be parallelized is minimal. I would expect to see improved performance on
    SSDs, but again, Impala can determine that on startup.

    Thanks
    Nong
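
    To illustrate Nong's point with a toy Python model (the bandwidth, seek
    time, and request size below are assumptions for illustration, not
    Impala's actual I/O manager parameters):

        # Toy model: interleaved readers on one spinning disk with an
        # assumed ~150 MB/s sequential rate and ~8 ms average seek,
        # issuing 8 MB requests.
        def effective_throughput_mb_s(readers, request_mb=8.0,
                                      seq_mb_s=150.0, seek_ms=8.0):
            transfer_s = request_mb / seq_mb_s
            # With more than one reader, switching between their file
            # offsets costs roughly one seek per request.
            seek_s = seek_ms / 1000.0 if readers > 1 else 0.0
            return request_mb / (transfer_s + seek_s)

        for n in (1, 2, 6, 24):
            print(n, "reader(s):", round(effective_throughput_mb_s(n)), "MB/s")
        # One reader keeps the full sequential rate; any interleaving adds
        # seeks, and extra readers never add bandwidth on a single spindle.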


  • Franck Gallos at May 30, 2013 at 12:06 pm
    Hi Lenni,

    Thank you very much.
    For future readers, here is one result of the quick search you suggested:
    http://grokbase.com/t/hadoop/hdfs-user/132bewmh3f/mutiple-dfs-data-dir-vs-raid0


    best regards

    Franck


    2013/5/29 Lenni Kuff <lskuff@cloudera.com>
    Hi Franck,
    As Nong mentioned, you should not need to explicitly set the
    --num_threads_per_disk option, since Impala detects the number of disks and
    sets it to an appropriate internal value for you. That being said, you
    should see improved performance if you replace the RAID configuration with
    separate disks and stripe the HDFS data directories across them. In general,
    we do not recommend using RAID with HDFS for a number of reasons (a quick
    search for "hdfs raid0" will give you more info if you are interested). In
    short, Impala (and HDFS) will also be better able to parallelize its I/O
    outside of a RAID configuration.

    Once you make the change, Impala will detect that you have more disks
    without your needing to set the --num_threads_per_disk flag.

    Thanks,
    Lenni

    On Wed, May 29, 2013 at 5:28 AM, Franck Gallos wrote:

    Hi,

    Very interesting thought.
    My DataNodes have their HDFS storage on 6 disks configured as RAID 0
    (striping). I'm wondering: if I replace the RAID 0 array with 6 separate
    disks and specify /dev/sdb1, sdb2, ...,sd6 in the Hadoop configuration, can
    I set the --num_threads_per_disk option to 6?
    In your opinion, is that the best way to optimize reads and hardware
    utilization?

    Regards,

    Franck


  • Lenni Kuff at May 30, 2013 at 3:00 pm
    Thanks for posting the link. There is also a good explanation in Tom
    White's book, Hadoop: The Definitive Guide:

    "HDFS clusters do not benefit from using RAID (Redundant Array of
    Independent Disks) for datanode storage (although RAID is recommended for
    the namenode’s disks, to protect against corruption of its metadata). The
    redundancy that RAID provides is not needed, since HDFS handles it by
    replication between nodes.

    Furthermore, RAID striping (RAID 0), which is commonly used to increase
    performance, turns out to be slower than the JBOD (Just a Bunch Of Disks)
    configuration used by HDFS, which round-robins HDFS blocks between all
    disks. The reason for this is that RAID 0 read and write operations are
    limited by the speed of the slowest disk in the RAID array. In JBOD, disk
    operations are independent, so the average speed of operations is greater
    than that of the slowest disk. Disk performance often shows considerable
    variation in practice, even for disks of the same model. In some
    benchmarking carried out on a Yahoo! cluster (
    http://markmail.org/message/xmzc45zi25htr7ry), JBOD performed 10% faster
    than RAID 0 in one test (Gridmix), and 30% better in another (HDFS write
    throughput).

    Finally, if a disk fails in a JBOD configuration, HDFS can continue to
    operate without the failed disk, whereas with RAID, failure of a single
    disk causes the whole array (and hence the node) to become unavailable."

    Thanks,
    Lenni
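
    A minimal numeric sketch of the slowest-disk effect described in the
    quoted passage (the per-disk speeds are assumed for illustration, not
    measured):

        import random

        # Six disks whose sequential speeds vary around 120 MB/s because of
        # ordinary per-drive variation (assumed values).
        random.seed(42)
        disk_speeds = [random.gauss(120, 15) for _ in range(6)]  # MB/s

        # RAID 0 stripes every block across all members, so each operation
        # finishes at the pace of the slowest disk.
        raid0_aggregate = len(disk_speeds) * min(disk_speeds)

        # JBOD round-robins whole HDFS blocks onto individual disks, so the
        # disks work independently and their speeds simply add up.
        jbod_aggregate = sum(disk_speeds)

        print("per-disk MB/s        :", [round(s) for s in disk_speeds])
        print("RAID 0 aggregate MB/s:", round(raid0_aggregate))
        print("JBOD aggregate MB/s  :", round(jbod_aggregate))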


Discussion Overview
group: impala-user
category: hadoop
posted: May 27, 2013 at 9:48 PM
active: May 30, 2013 at 3:00 PM
posts: 4
users: 3
website: cloudera.com
irc: #hadoop
