I'm trying out some of our DW queries on Impala 1.0 with CDH 4.1.4. I'm
seeing some good performance, although I'm hitting a strange issue with one
query that is returning 5 million rows -- Impala tells me within a few
seconds that the query is finished, but then it takes 20 minutes to fetch
the rows. Is this a known issue with a work-around?

Here are some relevant bits from my query log:

     Query Timeline: 21m18s
        - Start execution: 3.778us (3.778us)
        - Planning finished: 58.781ms (58.778ms)
        - Rows available: 728.982ms (670.200ms)
        - First row fetched: 731.956ms (2.973ms)
        - Unregister query: 21m18s (21m17s)
   ImpalaServer:
      - RowMaterializationTimer: 14s901ms
   Execution Profile 8ad5e4df095341d3:9c105c56e6a5bc7c:(Active: 22s241ms, % non-child: 0.00%)
      - FinalizationTimer: 0ns
     Coordinator Fragment:(Active: 21s548ms, % non-child: 0.00%)
        - AverageThreadTokens: 1.00
        - RowsProduced: 5.23M (5233538)
       CodeGen:(Active: 70.136ms, % non-child: 0.33%)
          - CodegenTime: 0ns
          - CompileTime: 64.257ms
          - LoadTime: 5.879ms
          - ModuleFileSize: 73.10 KB
       EXCHANGE_NODE (id=4):(Active: 21s543ms, % non-child: 99.98%)
          - BytesReceived: 96.19 MB
          - ConvertRowBatchTime: 158.656ms
          - DataArrivalWaitTime: 21s334ms
          - DeserializeRowBatchTimer: 698.871ms
          - FirstBatchArrivalWaitTime: 0ns
          - MemoryUsed: 0.00
          - RowsReturned: 5.23M (5233538)
          - RowsReturnedRate: 242.92 K/sec
          - SendersBlockedTimer: 18m45s
          - SendersBlockedTotalTimer(*): 5h22m

I'm happy to send over more of the query profile off-list if that's
helpful, but I was wondering if there's something obvious just from that
much. I'm just piping the results to /dev/null [1], so I don't think it's
a local disk write issue on the client.

Thanks,
Joe

[1] I'm issuing: time impala-shell -f query1.sql > /dev/null
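A back-of-envelope reading of the profile above supports the suspicion that the client, not the server, is the bottleneck. The sketch below uses Python; the constant names are ours, and the values are copied from the profile. It divides the 5,233,538 rows by two intervals: the 21m17s between the first row being fetched and the query being unregistered, and the 21s543ms that EXCHANGE_NODE (id=4) was active. The first works out to roughly 4,100 rows/s. The second works out to roughly 243K rows/s, close to the reported RowsReturnedRate of 242.92 K/sec.

```python
# Numbers copied from the query profile in the message above.
ROWS = 5_233_538               # RowsProduced / RowsReturned
FETCH_PHASE_S = 21 * 60 + 17   # "Unregister query: 21m18s (21m17s)": time after the first row was fetched
EXCHANGE_ACTIVE_S = 21.543     # EXCHANGE_NODE (id=4) Active: 21s543ms


def rows_per_second(rows: int, seconds: float) -> float:
    """Average throughput over an interval."""
    return rows / seconds


client_rate = rows_per_second(ROWS, FETCH_PHASE_S)
server_rate = rows_per_second(ROWS, EXCHANGE_ACTIVE_S)

print(f"client fetch phase: {client_rate:,.0f} rows/s")
print(f"exchange node:      {server_rate:,.0f} rows/s")
print(f"server side is ~{server_rate / client_rate:.0f}x faster")
```

The large SendersBlockedTimer (18m45s) fits the same picture. As we read that counter (our interpretation, not stated in the thread), it accumulates time the sending fragments spent blocked waiting for the coordinator to drain rows. That is what you would expect if the client consumes results slowly.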

  • Alan at May 16, 2013 at 8:22 pm
    Hi Joe,

    It's very likely that the Impala shell is taking too much time to consume
    and pretty-print the output, even though you are redirecting it to
    /dev/null. That is not an indication of a performance problem on the
    Impala server.

    Can you share with me what you're trying to accomplish? Are you trying to
    fetch all 5 million rows and put them in a local file, or are you simply
    trying to run a query benchmark?

    Thanks,
    Alan
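Alan's explanation can be illustrated with a simplified stand-in. This is not impala-shell's actual code, and `render_pretty` and `write_delimited` are hypothetical names. A bordered table has to see every row before it can size its columns. So the sketch formats every cell and builds the whole table in memory before a single byte is written, and redirecting stdout to /dev/null only makes that final write cheap. A delimited writer, by contrast, emits each row as it arrives.

```python
import io
from typing import Iterable, Sequence, TextIO


def render_pretty(rows: Sequence[Sequence[object]]) -> str:
    """Bordered table. Must see every row before it can size the columns."""
    if not rows:
        return ""
    cells = [[str(v) for v in row] for row in rows]  # every cell formatted up front
    widths = [max(len(row[i]) for row in cells) for i in range(len(cells[0]))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [border]
    for row in cells:
        padded = (cell.ljust(w) for cell, w in zip(row, widths))
        lines.append("| " + " | ".join(padded) + " |")
    lines.append(border)
    return "\n".join(lines)  # the whole table exists in memory before any output


def write_delimited(rows: Iterable[Sequence[object]], out: TextIO, sep: str = "\t") -> int:
    """Write one line per row as soon as it arrives; returns the row count."""
    count = 0
    for row in rows:
        out.write(sep.join(str(v) for v in row) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = [(1, "a"), (22, "bb")]
    print(render_pretty(sample))
    buf = io.StringIO()
    write_delimited(sample, buf)
    print(buf.getvalue(), end="")
```

For 5 million rows, the pretty version does all of its per-cell string work and buffering before producing output. That cost is paid regardless of where stdout points.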
  • Joe Crobak at May 16, 2013 at 10:41 pm

    On Thu, May 16, 2013 at 4:22 PM, Alan wrote:

    > It's very likely that the Impala shell is taking too much time to consume
    > and pretty print the output, even though you are directing the output to
    > dev null. It's not an indication of performance problem on the Impala
    > server.

    OK, thanks for the info. What's a reasonably sized result set for
    impala-shell? When I run this same query through Vertica's shell and
    redirect the output to /dev/null, it finishes in 30s.

    > Can you share with me what you're trying to accomplish? Are you trying to
    > fetch all 5millions rows and put it in a local file, or you're simply
    > trying to do some query benchmark?

    I'm trying to do some benchmarking. The application that triggers this type
    of query would communicate with Impala via JDBC. Is there a better way to
    benchmark that isn't too complicated?

  • Henry Robinson at May 16, 2013 at 10:50 pm
    Hi Joe -

    If you're comfortable editing the Python shell source, we can set up a
    slightly more realistic benchmarking environment.

    The problem with the shell is the call to "print table" on line ~530 of
    impala_shell.py, which renders the results as a 'pretty' structured table.
    If you comment that call out, you'll get a good measurement of how long
    the query takes, including the time taken to fetch the rows, without the
    expensive rendering step in the client.

    In an upcoming release, we will have a better way of configuring the shell
    to suppress writing the output to the console.

    Let me know if you have any questions -

    Henry

    --
    Henry Robinson
    Software Engineer
    Cloudera
    415-994-6679
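On Joe's question about a simple benchmark, one approach in the spirit of Henry's suggestion is to time the query through a database driver, fetch every row in batches, and throw the rows away so that no rendering cost is included. The sketch below assumes a Python DB-API 2.0 connection. `time_query` is a hypothetical helper, and the demo uses the standard library's sqlite3 module only so that it runs anywhere. Pointing it at Impala would need a DB-API driver for Impala, which this thread does not cover. A Python client is also only an approximation of the JDBC path Joe's application would use.

```python
import sqlite3
import time
from typing import Any, Tuple


def time_query(conn: Any, sql: str, batch_size: int = 10_000) -> Tuple[int, float, float]:
    """Execute `sql` on a DB-API 2.0 connection and fetch every row without rendering it.

    Returns (row_count, seconds_to_first_batch, total_seconds).
    """
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(sql)
    batch = cur.fetchmany(batch_size)
    first_batch_s = time.perf_counter() - start
    rows = 0
    while batch:
        rows += len(batch)  # count the rows, then drop them; nothing is formatted
        batch = cur.fetchmany(batch_size)
    total_s = time.perf_counter() - start
    cur.close()
    return rows, first_batch_s, total_s


if __name__ == "__main__":
    # Demo against an in-memory SQLite database so the sketch runs as-is.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ((i, f"row{i}") for i in range(100_000)))
    n, first_s, total_s = time_query(conn, "SELECT id, name FROM t")
    print(f"{n} rows, first batch after {first_s:.3f}s, total {total_s:.3f}s")
```

`batch_size` trades the number of round trips against client memory. The time to the first batch loosely corresponds to the profile's "First row fetched" mark, and the total to the whole fetch phase.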
  • Joe Crobak at May 17, 2013 at 4:01 pm
    Great, thanks Henry.

Discussion Overview
group: impala-user
categories: hadoop
posted: May 14, '13 at 8:58p
active: May 17, '13 at 4:01p
posts: 5
users: 3
website: cloudera.com
irc: #hadoop