Comparison between Hive and Impala

Lyrebird1999
Dec 28, 2012 at 3:55 pm
My test environment:
node1: datanode (impalad), VM, 4 CPUs, 4 GB memory
node2: datanode (impalad), VM, 4 CPUs, 4 GB memory
node3: datanode (impalad), VM, 4 CPUs, 8 GB memory
node4: namenode (statestored)


My SQL query looks like this:
SELECT avg(ss_quantity) agg1,
       avg(ss_list_price) agg2,
       avg(ss_coupon_amt) agg3,
       avg(ss_sales_price) agg4
FROM store_sales


The table store_sales is stored as a text file, 39 GB in size.

The log shows that node1 takes 3m54s to finish execution, but node2 takes
10m2s and node3 takes 10m8s.

I have pasted the log from node3 (the coordinator). Could anyone tell me why
it takes such a long time?

3 responses

  • Lyrebird1999 at Dec 30, 2012 at 2:00 am
    In this case, Hive takes 5 minutes, which is less than Impala.
    In the log file, the HDFS scan on one datanode is much faster than on the
    other two.
    Could anyone tell me why?

  • Marcel Kornacker at Dec 31, 2012 at 7:25 pm

    On Sat, Dec 29, 2012 at 6:00 PM, lyrebird1999 wrote:
    > In this case, Hive takes 5 minutes, which is less than Impala.
    > In the log file, the HDFS scan on one datanode is much faster than on
    > the other two.
    > Could anyone tell me why?

    It's impossible to tell what's going on, given only the profile. What
    else were you running on the machine? In particular, since you were
    running this in a VM, were there other VMs accessing the same disk
    while you were running this?

    One scan node sees this:
    - PerDiskReadThroughput: 56.53 MB/sec

    The others see this:
    - PerDiskReadThroughput: 21.62 MB/sec

    While the second figure is much lower than the first, even 50 MB/sec is
    low (~50% of what we'd expect). I suggest re-running this directly on the
    hardware, without any VMs.

    Marcel

  • Lyrebird1999 at Dec 31, 2012 at 5:24 am
    Why has nobody replied to my problem? Please help!
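
As a rough sanity check on the two PerDiskReadThroughput figures quoted in the thread (this is not part of the original replies), the sketch below estimates how long each node would need just to read its share of the table. It assumes, without confirmation from the thread, that the 39 GB file is split evenly across the three impalad nodes, that each node scans from a single disk, and that 1 GB = 1024 MB. The function names are made up for illustration.

```python
# Back-of-the-envelope estimate of per-node scan time from disk throughput.
# Assumptions (NOT stated in the thread): even split of the 39 GB table over
# three impalad nodes, one disk per node, 1 GB = 1024 MB.

TABLE_SIZE_MB = 39 * 1024
NUM_NODES = 3


def estimated_scan_seconds(throughput_mb_per_sec: float) -> float:
    """Seconds needed to read one node's share at the given throughput."""
    per_node_mb = TABLE_SIZE_MB / NUM_NODES
    return per_node_mb / throughput_mb_per_sec


def format_duration(seconds: float) -> str:
    """Format seconds in the same XmYs style used in the thread."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs}s"


if __name__ == "__main__":
    # Throughput values are the ones quoted from the query profile.
    for label, throughput in [("fast node", 56.53), ("slow nodes", 21.62)]:
        print(label, format_duration(estimated_scan_seconds(throughput)))
```

Under these assumptions the estimates come out at about 3m55s and 10m16s, close to the reported 3m54s for node1 and 10m2s to 10m8s for node2 and node3. That would be consistent with the query being bound by disk read throughput on every node, though it does not explain why two of the VMs read so much more slowly.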

