FAQ
Could you provide the query profiles for the queries you're testing? These
will help identify where the bottleneck is. (Query profiles can be accessed
via the impalad web interface at http://your-impalad-server:25000/queries.)

On Fri, Mar 8, 2013 at 6:44 AM, Sékine Coulibaly wrote:

I've set up a bench platform on a physical machine. I'm running CDH 4.2.0 and
Impala 0.6 on an IBM x3650 M3 (8 HT cores, 1.6GHz, 32GB RAM, RAID 5 with
6x150GB SAS drives), running Red Hat 6.3.
I'm running Impala tests on it and the results are somewhat disappointing.

I'm using a table with the following schema:

delivery_srv string
media_type string
server_name string
error_code string
dte string
count_sessions int
cmd string
status_code string
client_ip string
client_type string
session_id string
asset_name string
count_sent int
indicator int
interface string
server_ip string
customer_id string

I've loaded the content into this table through Hive as follows:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

create table logs_text (
  delivery_pop STRING, media_library STRING, videoserver STRING,
  severity STRING, dte STRING, nb_sessions INT, cmd STRING,
  status_code STRING, client_ip STRING, browser_type STRING,
  session_id STRING, content_name STRING, nb_bytes_sent INT,
  progress INT, interface STRING, ip_port STRING, mac STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
load data local inpath '/tmp/logs/*.txt' into table logs_text;

create table logs_sf_snappy (
  delivery_srv string, media_type string, server_name string,
  error_code string, dte string, count_sessions int, cmd string,
  status_code string, client_ip string, client_type string,
  session_id string, asset_name string, count_sent int,
  indicator int, interface string, server_ip string, customer_id string)
STORED AS SEQUENCEFILE;
insert overwrite table logs_sf_snappy select * from logs_text;

The log files contain around 37 million rows.

Here are some results:
--> Impala:
Req#1
SELECT COUNT(1) FROM logs_sf_snappy;
Query time: 2.83 sec (counted 37 million rows)
Req#2
SELECT COUNT(1), asset_name FROM logs_sf_snappy GROUP BY asset_name LIMIT 5000;
Query time: 6.43 sec (grouped 37 million rows)

--> PostgreSQL 9.2 (same hardware):
Req#1
Query time: 8.3 sec

Req#2
Query time: 20 sec

These are basically the kinds of queries I'll run for my use case.

Although Impala is around 3x faster than PostgreSQL 9.2, I was expecting a
performance difference of at least one order of magnitude (in other words,
around 1 sec for Req#1 and 2 sec for Req#2), and I'm wondering if my setup
is wrong.
I've set up short-circuit reads and I'm using a pseudo-distributed
configuration.

I'm not sure I'm getting the most out of this very basic Impala setup, and
I'd appreciate it if anyone could provide a basis for comparison and/or help
point out configuration items that may be wrong.

iotop shows very light hard drive activity at the start of the query, so
disk I/O is most likely not the bottleneck. Of course, running these
queries on the logs_text table takes much longer (e.g. 42 sec for Req#2).
Regarding CPU, the load average remains quite low (around 0.35) during
query processing, so I'm quite puzzled :(


Any help appreciated!

Thank you


  • Greg Rahn at Mar 8, 2013 at 9:52 pm
    How many megabytes in size is the 37-million-row file?
    Does the PostgreSQL table have any indexes?
  • Sékine Coulibaly at Mar 11, 2013 at 4:21 pm
    About 1.9 GB Snappy-compressed; the raw data is around 9.5 GB.
    Yes, the pg table has indexes, but not on the columns I used for my Impala
    queries (in other words, not on asset_name).


  • Nong Li at Mar 12, 2013 at 1:13 am
    Based on the runtime profile, Impala is running as fast as the hardware
    allows.

    The runtime profile indicates that your DN is set up with 1 disk and we are
    getting ~85MB/sec from that disk. This sounds reasonable to me. You are
    reading about 2GB of data from it, so this ends up taking ~25 seconds.
    With this setup, Impala is completely bottlenecked on I/O.

    Are you expecting more I/O throughput? I see you have multiple disks, but
    the throughput does not reflect that. I'd start by investigating the I/O
    setup.

    Here are the relevant counters from the runtime profile.

      Fragment 1:
           Instance 46fa998af31342a5:92e5c5ed0f0c9221:(26s138ms 0.00%)
             Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:43/2.20B   <-- One disk
             HDFS_SCAN_NODE (id=0):(25s880ms 99.01%)
               File Formats: SEQUENCE_FILE/SNAPPY_BLOCKED:43
                - BytesRead: 2.05 GB
                - PerDiskReadThroughput: 82.83 MB/sec   <-- Disk throughput

  • Sékine Coulibaly at Mar 12, 2013 at 4:36 pm
    Yes, there are multiple disks, but it's a RAID 5 array, so I guess it's
    seen by the system as one disk.
    Since reducing the volume of data to crunch seems the only possible
    enhancement given the query type, the RC file format could be a good fit.
    I expect to load as little as 200MB and so get a performance boost of
    around 10x. However, my last RC format tests were not very successful...
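
    For reference, a minimal sketch of such an RC format load, mirroring the
    Snappy SequenceFile load shown earlier in the thread (the table name
    logs_rc_snappy and the hive -e invocation are illustrative, not taken from
    the thread):

    # load the same data into an RCFile table, keeping the Snappy settings
    hive -e "
      SET hive.exec.compress.output=true;
      SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
      SET mapred.output.compression.type=BLOCK;
      create table logs_rc_snappy (
        delivery_srv string, media_type string, server_name string,
        error_code string, dte string, count_sessions int, cmd string,
        status_code string, client_ip string, client_type string,
        session_id string, asset_name string, count_sent int,
        indicator int, interface string, server_ip string, customer_id string)
      STORED AS RCFILE;
      insert overwrite table logs_rc_snappy select * from logs_text;"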



  • Marcel Kornacker at Mar 13, 2013 at 4:48 pm
    Sekine, could you run this:
    dd if=/dev/zero of=<file path on your raid drive> bs=1024 count=3145728
    This writes a ~3 GB file and will tell you what your raw write throughput
    on that drive is.

    Then copy that file into hdfs:
    hadoop fs -put <file path> /tmp
    Then measure how long it takes to read it:
    time hadoop fs -cat /tmp/<file name> > /dev/null

    How does your hdfs read throughput compare with the raw throughput of
    that drive?
  • Todd Lipcon at Mar 13, 2013 at 4:57 pm
    Depending on the amount of RAM on the machine, you probably should actually
    use a larger file for this test. To see actual disk throughput I usually
    like to use a file at least 2x RAM, preferably 10x RAM if there is free
    disk space on the node.

    eg use dd if=/dev/zero of=/path/to/file bs=1M count=100000
    (creates a 100G file)

    The test will take longer but will be more accurate.

    Alternatively, you can drop the buffer cache completely before running the
    second test: echo 1 | sudo tee /proc/sys/vm/drop_caches
    but the downside of this is that your JVM startup time will be especially
    slow as well, given it will have to seek around disk to load shared
    libraries, etc.

    Doing something in the middle (eg 10G and drop_caches) is probably pretty
    accurate.

    -Todd
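
    Putting Marcel's and Todd's suggestions together, a minimal sketch of the
    "10G plus drop_caches" middle ground might look like this (the /data/ddtest
    path is illustrative):

    # write a ~10 GB test file onto the RAID volume; dd reports the write rate
    dd if=/dev/zero of=/data/ddtest bs=1M count=10000
    # copy it into HDFS
    hadoop fs -put /data/ddtest /tmp/ddtest
    # drop the OS buffer cache so the next read actually hits the disks
    echo 1 | sudo tee /proc/sys/vm/drop_caches
    # time the HDFS read; file size divided by elapsed time gives MB/sec
    time hadoop fs -cat /tmp/ddtest > /dev/null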

  • Greg Rahn at Mar 13, 2013 at 6:22 pm
    Another option is just to use direct I/O via the dd option "oflag=direct"
    (or iflag=direct for direct path reading)
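
    As a minimal sketch, the direct-I/O variant of the same dd tests could be
    (file path and size are illustrative):

    # write test, bypassing the page cache
    dd if=/dev/zero of=/data/ddtest bs=1M count=10000 oflag=direct
    # read test, bypassing the page cache
    dd if=/data/ddtest of=/dev/null bs=1M iflag=direct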

  • Sékine Coulibaly at Mar 20, 2013 at 10:08 am
    I set up the HDFS data on the RAID array (disregard what I said previously;
    those tests were done on a single-drive HDFS). Here are the results:

    A raw 100GB file write shows:
    real 11m20
    user 46sec
    sys 1m12

    The HDFS read takes 11 minutes (so around 150MB/s).
    The query execution times are identical to when they were run on the
    non-RAID disk, which makes me think that the system is not I/O bound (I
    would expect more throughput from the RAID array than from a single
    drive).

    When looking at the profile, I can see that most of the time is spent in:
      AGGREGATION_NODE (id=1):(5s894ms 94.81%)
      - BuildTime: 5s547ms

    Not sure I can improve BuildTime from my end. Is there any room for
    improvement here?

    Here is the profile:

    Query (id=d29b07c309d34f27:9ea3d18614178baa):
        - PlanningTime: 7ms
       Query d29b07c309d34f27:9ea3d18614178baa:(260ms 0.00%)
         Aggregate Profile:
         Coordinator Fragment:(5s903ms 0.00%)
            - RowsProduced: 5.00K
           CodeGen:
              - CodegenTime: 2ms
              - CompileTime: 118ms
              - LoadTime: 4ms
              - ModuleFileSize: 44.61 KB
           AGGREGATION_NODE (id=3):(5s906ms 0.59%)
              - BuildBuckets: 32.77K
              - BuildTime: 10ms
              - GetResultsTime: 541K clock cycles
              - MemoryUsed: 2.79 MB
              - RowsReturned: 5.00K
              - RowsReturnedRate: 846.00 /sec
           EXCHANGE_NODE (id=2):(5s871ms 99.46%)
              - BytesReceived: 1.75 MB
              - ConvertRowBatchTime: 1010K clock cycles
              - DeserializeRowBatchTimer: 1ms
              - MemoryUsed: 0.00
              - RowsReturned: 28.37K
              - RowsReturnedRate: 4.83 K/sec
         Averaged Fragment 1:(5s891ms 0.00%)
           completion times: min:5s888ms max:5s888ms mean: 5s888ms stddev:0
           execution rates: min:356.10 MB/sec max:356.10 MB/sec mean:356.10
    MB/sec stddev:0.00 /sec
           split sizes: min: 2.05 GB, max: 2.05 GB, avg: 2.05 GB, stddev: 0.00
            - RowsProduced: 28.37K
           CodeGen:
              - CodegenTime: 4ms
              - CompileTime: 122ms
              - LoadTime: 5ms
              - ModuleFileSize: 44.61 KB
           DataStreamSender:
              - BytesSent: 1.75 MB
              - DataSinkTime: 10ms
              - SerializeBatchTime: 3ms
              - ThriftTransmitTime: 11ms
           AGGREGATION_NODE (id=1):(5s894ms 94.81%)
              - BuildBuckets: 32.77K
              - BuildTime: 5s547ms
              - GetResultsTime: 1ms
              - MemoryUsed: 2.79 MB
              - RowsReturned: 28.37K
              - RowsReturnedRate: 4.81 K/sec
           HDFS_SCAN_NODE (id=0):(309ms 5.25%)
              - BytesRead: 2.05 GB
              - DecompressionTime: 11s149ms
              - MaterializeTupleTime: 4s297ms
              - MemoryUsed: 0.00
              - PerDiskReadThroughput: 2.05 GB/sec
              - RowsReturned: 37.47M
              - RowsReturnedRate: 121.12 M/sec
              - ScanRangesComplete: 43
              - ScannerThreadsReadTime: 996ms
              - TotalReadThroughput: 329.91 MB/sec
         Fragment 1:
           Instance d29b07c309d34f27:9ea3d18614178bac:(5s891ms 0.00%)
             Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:43/2.20B
              - RowsProduced: 28.37K
             CodeGen:
                - CodegenTime: 4ms
                - CompileTime: 122ms
                - LoadTime: 5ms
                - ModuleFileSize: 44.61 KB
             DataStreamSender:
                - BytesSent: 1.75 MB
                - DataSinkTime: 10ms
                - SerializeBatchTime: 3ms
                - ThriftTransmitTime: 11ms
             AGGREGATION_NODE (id=1):(5s894ms 94.81%)
                - BuildBuckets: 32.77K
                - BuildTime: 5s547ms
                - GetResultsTime: 1ms
                - MemoryUsed: 2.79 MB
                - RowsReturned: 28.37K
                - RowsReturnedRate: 4.81 K/sec
             HDFS_SCAN_NODE (id=0):(309ms 5.25%)
               File Formats: SEQUENCE_FILE/SNAPPY_BLOCKED:43
                - BytesRead: 2.05 GB
                - DecompressionTime: 11s149ms
                - MaterializeTupleTime: 4s297ms
                - MemoryUsed: 0.00
                - PerDiskReadThroughput: 2.05 GB/sec
                - RowsReturned: 37.47M
                - RowsReturnedRate: 121.12 M/sec
                - ScanRangesComplete: 43
                - ScannerThreadsReadTime: 996ms
                - TotalReadThroughput: 329.91 MB/sec

    Thank you!

    Sekine


  • Marcel Kornacker at Mar 22, 2013 at 3:25 pm

    Hi Sekine,

    unfortunately there's no way at the moment to make this query run
    faster. The reason is that aggregation is single-threaded within a
    single plan fragment executing on an impalad.

    However, we are planning on adding multi-threaded join and aggregation
    execution sometime after the GA release, although we can't promise any
    specific dates.

    Marcel
    Here is the profiling :

    Query (id=d29b07c309d34f27:9ea3d18614178baa):
    - PlanningTime: 7ms
    Query d29b07c309d34f27:9ea3d18614178baa:(260ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(5s903ms 0.00%)
    - RowsProduced: 5.00K
    CodeGen:
    - CodegenTime: 2ms
    - CompileTime: 118ms
    - LoadTime: 4ms
    - ModuleFileSize: 44.61 KB
    AGGREGATION_NODE (id=3):(5s906ms 0.59%)
    - BuildBuckets: 32.77K
    - BuildTime: 10ms
    - GetResultsTime: 541K clock cycles
    - MemoryUsed: 2.79 MB
    - RowsReturned: 5.00K
    - RowsReturnedRate: 846.00 /sec
    EXCHANGE_NODE (id=2):(5s871ms 99.46%)
    - BytesReceived: 1.75 MB
    - ConvertRowBatchTime: 1010K clock cycles
    - DeserializeRowBatchTimer: 1ms
    - MemoryUsed: 0.00
    - RowsReturned: 28.37K
    - RowsReturnedRate: 4.83 K/sec
    Averaged Fragment 1:(5s891ms 0.00%)
    completion times: min:5s888ms max:5s888ms mean: 5s888ms stddev:0
    execution rates: min:356.10 MB/sec max:356.10 MB/sec mean:356.10
    MB/sec stddev:0.00 /sec
    split sizes: min: 2.05 GB, max: 2.05 GB, avg: 2.05 GB, stddev: 0.00
    - RowsProduced: 28.37K
    CodeGen:
    - CodegenTime: 4ms
    - CompileTime: 122ms
    - LoadTime: 5ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 1.75 MB
    - DataSinkTime: 10ms
    - SerializeBatchTime: 3ms
    - ThriftTransmitTime: 11ms
    AGGREGATION_NODE (id=1):(5s894ms 94.81%)
    - BuildBuckets: 32.77K
    - BuildTime: 5s547ms
    - GetResultsTime: 1ms
    - MemoryUsed: 2.79 MB
    - RowsReturned: 28.37K
    - RowsReturnedRate: 4.81 K/sec
    HDFS_SCAN_NODE (id=0):(309ms 5.25%)
    - BytesRead: 2.05 GB
    - DecompressionTime: 11s149ms
    - MaterializeTupleTime: 4s297ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 2.05 GB/sec
    - RowsReturned: 37.47M
    - RowsReturnedRate: 121.12 M/sec
    - ScanRangesComplete: 43
    - ScannerThreadsReadTime: 996ms
    - TotalReadThroughput: 329.91 MB/sec
    Fragment 1:
    Instance d29b07c309d34f27:9ea3d18614178bac:(5s891ms 0.00%)
    Hdfs split stats (:<# splits>/): 0:43/2.20B
    - RowsProduced: 28.37K
    CodeGen:
    - CodegenTime: 4ms
    - CompileTime: 122ms
    - LoadTime: 5ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 1.75 MB
    - DataSinkTime: 10ms
    - SerializeBatchTime: 3ms
    - ThriftTransmitTime: 11ms
    AGGREGATION_NODE (id=1):(5s894ms 94.81%)
    - BuildBuckets: 32.77K
    - BuildTime: 5s547ms
    - GetResultsTime: 1ms
    - MemoryUsed: 2.79 MB
    - RowsReturned: 28.37K
    - RowsReturnedRate: 4.81 K/sec
    HDFS_SCAN_NODE (id=0):(309ms 5.25%)
    File Formats: SEQUENCE_FILE/SNAPPY_BLOCKED:43
    - BytesRead: 2.05 GB
    - DecompressionTime: 11s149ms
    - MaterializeTupleTime: 4s297ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 2.05 GB/sec
    - RowsReturned: 37.47M
    - RowsReturnedRate: 121.12 M/sec
    - ScanRangesComplete: 43
    - ScannerThreadsReadTime: 996ms
    - TotalReadThroughput: 329.91 MB/sec

    Thank you !

    Sekine


    2013/3/13 Greg Rahn <grahn@cloudera.com>
    Another option is just to use direct I/O via the dd option "oflag=direct"
    (or iflag=direct for direct path reading)

    On Wed, Mar 13, 2013 at 9:57 AM, Todd Lipcon wrote:

    Depending on the amount of RAM on the machine, you probably should
    actually use a larger file for this test. To see actual disk throughput I
    usually like to use a file at least 2x RAM, preferably 10x RAM if there is
    free disk space on the node.

    eg use dd if=/dev/zero of=/path/to/file bs=1M count=100000
    (creates a 100G file)

The test will take longer but will be more accurate.

    Alternatively, you can drop the buffer cache completely before running
    the second test: echo 1 | sudo tee /proc/sys/vm/drop_caches
    but the downside of this is that your JVM startup time will be especially
    slow as well, given it will have to seek around disk to load shared
    libraries, etc.

    Doing something in the middle (eg 10G and drop_caches) is probably pretty
    accurate.

    -Todd
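As a sketch of that middle-ground sequence (the path and size are
placeholders):

# create a ~10 GB test file
dd if=/dev/zero of=/data/raid/ddtest bs=1M count=10240
# flush the page cache so the next read actually hits disk
sync; echo 1 | sudo tee /proc/sys/vm/drop_caches
# time the cold read
time dd if=/data/raid/ddtest of=/dev/null bs=1M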



    On Wed, Mar 13, 2013 at 9:48 AM, Marcel Kornacker <marcel@cloudera.com>
    wrote:
    Sekine, could you run this
    dd if=/dev/zero of=<file path on your raid drive> bs=1024 count=3145728
    This will tell you what your raw write throughput on that drive is.

    Then copy that file into hdfs:
    hadoop fs -put <file path> /tmp
    Then measure how long it takes to read it:
    time hadoop fs -cat /tmp/<file name> > /dev/null

    How does your hdfs read throughput compare with the raw throughput of
    that drive?
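(For reference, bs=1024 count=3145728 writes a 3 GiB file, so if the
hadoop fs -cat step were to take, say, 30 seconds, the HDFS read
throughput would be roughly 3072 MB / 30 s = ~100 MB/sec.)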

    On Tue, Mar 12, 2013 at 9:35 AM, Sékine Coulibaly <scoulibaly@gmail.com>
    wrote:
Yes, there are multiple disks, but it's a RAID 5 array, so I guess it's seen
by the system as one disk.
Since reducing the volume of data crunched seems the only possible
improvement given the query type, the RC file format could be a good fit: I
expect to load as little as 200MB and so get a performance boost of around
x10. However, my last RC format tests were not very successful...
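As a rough sketch of that conversion (logs_rc is a hypothetical table name;
this assumes Hive CTAS plus the same Snappy output settings used when
loading logs_sf_snappy):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- rewrite the SequenceFile table as a columnar RCFile table
CREATE TABLE logs_rc STORED AS RCFILE AS SELECT * FROM logs_sf_snappy;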


    2013/3/12 Nong Li <nong@cloudera.com>
Based on the runtime profile, Impala is running as fast as the hardware
allows.

The runtime profile indicates that your DN is set up with 1 disk and we are
getting ~85MB/sec from that disk. This sounds reasonable to me. You are
reading about 2GB of data from it, so this ends up taking ~25 seconds. With
this setup, Impala is completely bottlenecked on IO.

Are you expecting more IO throughput? I see you have multiple disks, but the
throughput does not reflect that. I'd start by investigating the IO setup.

    Here are the relevant counters from the runtime profile.

    Fragment 1:
    Instance 46fa998af31342a5:92e5c5ed0f0c9221:(26s138ms 0.00%)
    Hdfs split stats (:<# splits>/): 0:43/2.20B
    <-- One disk
    HDFS_SCAN_NODE (id=0):(25s880ms 99.01%)
    File Formats: SEQUENCE_FILE/SNAPPY_BLOCKED:43
    - BytesRead: 2.05 GB
    - PerDiskReadThroughput: 82.83 MB/sec
    <-- Disk throughput


    On Mon, Mar 11, 2013 at 9:21 AM, Sékine Coulibaly
    <scoulibaly@gmail.com>
    wrote:
About 1.9 GB snappy compressed, raw is around 9.5 GB.
Yes, the pg table has indexes, but not on the columns I used for my impala
queries (in other words, not on asset_name).


  • Venkata Gattala at Dec 1, 2013 at 4:06 pm
I have a situation where I cannot read any Snappy-compressed files in
Impala.

All I am trying to do is convert the existing Snappy-compressed files into
Parquet. Can you please help me figure out how to do that?

    [ausgtmhadoop06:21000] > select * from hivetest3;
    Query: select * from hivetest3
    ERROR: AnalysisException: Failed to load metadata for table:
    parquet_compression.hivetest3
    CAUSED BY: TableLoadingException: Failed to load metadata for table:
    hivetest3
    CAUSED BY: RuntimeException: Compressed text files are not supported:
    hdfs://nameservice1/user/hive/warehouse/parquet_compression.db/hivetest3/000000_0.snappy

    Thanks
    Deepak Gattala
    On Friday, March 22, 2013 10:25:20 AM UTC-5, Marcel Kornacker wrote:
    On Wed, Mar 20, 2013 at 3:07 AM, Sékine Coulibaly wrote:
I set up the HDFS data on the RAID array (disregard what I said previously;
those tests were done on a single-drive HDFS). Here are the results:

A raw 100GB file write shows:
real 11m20
user 46sec
sys 1m12

The HDFS read is 11 minutes long (so around 150MB/s, about the same as the
raw write).
The query execution times are identical (compared to when they were
performed on the non-RAID disk), which makes me think that the system is not
I/O bound (I expected more throughput from the RAID array than from a single
drive).

When looking at the profile, I can see that most of the time is spent in:
AGGREGATION_NODE (id=1):(5s894ms 94.81%)
- BuildTime: 5s547ms

I'm not sure I can improve BuildTime from my end. Is there room for
improvement here?
    Hi Sekine,

    unfortunately there's no way at the moment to make this query run
    faster. The reason is that aggregation is single-threaded within a
    single plan fragment executing on an impalad.

    However, we are planning on adding multi-threaded join and aggregation
    execution sometime after the GA release, although we can't promise any
    specific dates.

    Marcel
  • Lenni Kuff at Dec 2, 2013 at 12:23 am
    As you see from the error, Impala does not support snappy compressed text
    files (Impala only supports LZO compressed text). Depending on the amount
    of data you have, you can either:

    1) Reload the data in a format Impala can read and use Impala to convert
    the data to Parquet.
    2) Use the Hive Parquet SerDe to convert your data to Parquet using Hive.

    Thanks,
    Lenni
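For example, a rough sketch of option 2 (table and column names are
hypothetical; this assumes the parquet-hive SerDe jars shipped with CDH at
the time are on Hive's classpath, and the column list must be replaced with
hivetest3's real schema):

-- hypothetical Parquet-backed target table
CREATE TABLE hivetest3_parquet (col1 STRING, col2 INT)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

-- Hive can read the Snappy-compressed text source, so this rewrites it as Parquet
INSERT OVERWRITE TABLE hivetest3_parquet SELECT * FROM hivetest3;

Once the data is in Parquet, the table should be queryable from Impala after
refreshing its metadata.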

  • Deepak Gattala at Dec 2, 2013 at 2:17 pm
Hi Lenni, thank you so much for your response. Can you please explain how I
can achieve this? I have 20 TB of data, so a concrete procedure for
converting it would be really helpful.

    Thank you
    Deepak Gattala


  • Sékine Coulibaly at Mar 11, 2013 at 4:19 pm
Sure Skye, I didn't know I could get that info.
Here it is. From what I can see, there are incompressible delays:
- 14 secs for decompression
- 1.2 secs for query planning/codegen/compile

Is there room for improvement here?

Thank you !


    Query (id=46fa998af31342a5:92e5c5ed0f0c921f):
        - PlanningTime: 1s128ms
       Query 46fa998af31342a5:92e5c5ed0f0c921f:(387ms 0.00%)
         Aggregate Profile:
         Coordinator Fragment:(26s140ms 0.00%)
            - RowsProduced: 1
           CodeGen:
              - CodegenTime: 31ms
              - CompileTime: 118ms
              - LoadTime: 139ms
              - ModuleFileSize: 44.61 KB
           AGGREGATION_NODE (id=3):(26s171ms 0.32%)
              - BuildBuckets: 1.02K
              - BuildTime: 4K clock cycles
              - GetResultsTime: 9K clock cycles
              - MemoryUsed: 32.01 KB
              - RowsReturned: 1
              - RowsReturnedRate: 0
           EXCHANGE_NODE (id=2):(26s089ms 99.81%)
              - BytesReceived: 16.00 B
              - ConvertRowBatchTime: 3K clock cycles
              - DeserializeRowBatchTimer: 9K clock cycles
              - MemoryUsed: 0.00
              - RowsReturned: 1
              - RowsReturnedRate: 0
         Averaged Fragment 1:(26s138ms 0.00%)
           completion times: min:20s281ms max:20s281ms mean: 20s281ms stddev:0
           execution rates: min:103.38 MB/sec max:103.38 MB/sec
    mean:103.38 MB/sec stddev:0.00 /sec
           split sizes: min: 2.05 GB, max: 2.05 GB, avg: 2.05 GB, stddev: 0.00
            - RowsProduced: 1
           CodeGen:
              - CodegenTime: 1656K clock cycles
              - CompileTime: 77ms
              - LoadTime: 4ms
              - ModuleFileSize: 44.61 KB
           DataStreamSender:
              - BytesSent: 16.00 B
              - DataSinkTime: 666K clock cycles
              - SerializeBatchTime: 41K clock cycles
              - ThriftTransmitTime: 812K clock cycles
           AGGREGATION_NODE (id=1):(26s139ms 0.99%)
              - BuildBuckets: 1.02K
              - BuildTime: 176ms
              - GetResultsTime: 8K clock cycles
              - MemoryUsed: 32.01 KB
              - RowsReturned: 1
              - RowsReturnedRate: 0
           HDFS_SCAN_NODE (id=0):(25s880ms 99.01%)
              - BytesRead: 2.05 GB
              - DecompressionTime: 14s036ms
              - MaterializeTupleTime: 0K clock cycles
              - MemoryUsed: 0.00
              - PerDiskReadThroughput: 82.83 MB/sec
              - RowsReturned: 37.47M
              - RowsReturnedRate: 1.45 M/sec
              - ScanRangesComplete: 43
              - ScannerThreadsReadTime: 25s315ms
              - TotalReadThroughput: 102.99 MB/sec
         Fragment 1:
           Instance 46fa998af31342a5:92e5c5ed0f0c9221:(26s138ms 0.00%)
             Hdfs split stats (:<# splits>/): 0:43/2.20B
              - RowsProduced: 1
             CodeGen:
                - CodegenTime: 1656K clock cycles
                - CompileTime: 77ms
                - LoadTime: 4ms
                - ModuleFileSize: 44.61 KB
             DataStreamSender:
                - BytesSent: 16.00 B
                - DataSinkTime: 666K clock cycles
                - SerializeBatchTime: 41K clock cycles
                - ThriftTransmitTime: 812K clock cycles
             AGGREGATION_NODE (id=1):(26s139ms 0.99%)
                - BuildBuckets: 1.02K
                - BuildTime: 176ms
                - GetResultsTime: 8K clock cycles
                - MemoryUsed: 32.01 KB
                - RowsReturned: 1
                - RowsReturnedRate: 0
             HDFS_SCAN_NODE (id=0):(25s880ms 99.01%)
               File Formats: SEQUENCE_FILE/SNAPPY_BLOCKED:43
                - BytesRead: 2.05 GB
                - DecompressionTime: 14s036ms
                - MaterializeTupleTime: 0K clock cycles
                - MemoryUsed: 0.00
                - PerDiskReadThroughput: 82.83 MB/sec
                - RowsReturned: 37.47M
                - RowsReturnedRate: 1.45 M/sec
                - ScanRangesComplete: 43
                - ScannerThreadsReadTime: 25s315ms
                - TotalReadThroughput: 102.99 MB/sec




  • Marcel Kornacker at Mar 12, 2013 at 1:18 am
    To comment specifically on the elements you called out:
    On Mon, Mar 11, 2013 at 9:18 AM, Sékine Coulibaly wrote:
Sure Skye, I didn't know I could get that info.
Here it is. From what I can see, there are incompressible delays:
- 14 secs for decompression
This is fully overlapped with I/O, and is also multi-threaded.
- 1.2 secs for query planning/codegen/compile
    Is this just for the first time you run this query after impalad
    starts up, or do you get that exact same number every time?

    Marcel
    Is there room for improvement here ?

    Thank you !


    Query (id=46fa998af31342a5:92e5c5ed0f0c921f):
    - PlanningTime: 1s128ms
    Query 46fa998af31342a5:92e5c5ed0f0c921f:(387ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(26s140ms 0.00%)
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 31ms
    - CompileTime: 118ms
    - LoadTime: 139ms
    - ModuleFileSize: 44.61 KB
    AGGREGATION_NODE (id=3):(26s171ms 0.32%)
    - BuildBuckets: 1.02K
    - BuildTime: 4K clock cycles
    - GetResultsTime: 9K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    EXCHANGE_NODE (id=2):(26s089ms 99.81%)
    - BytesReceived: 16.00 B
    - ConvertRowBatchTime: 3K clock cycles
    - DeserializeRowBatchTimer: 9K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 1
    - RowsReturnedRate: 0
    Averaged Fragment 1:(26s138ms 0.00%)
    completion times: min:20s281ms max:20s281ms mean: 20s281ms stddev:0
    execution rates: min:103.38 MB/sec max:103.38 MB/sec mean:103.38
    MB/sec stddev:0.00 /sec
    split sizes: min: 2.05 GB, max: 2.05 GB, avg: 2.05 GB, stddev: 0.00
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 1656K clock cycles
    - CompileTime: 77ms
    - LoadTime: 4ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 16.00 B
    - DataSinkTime: 666K clock cycles
    - SerializeBatchTime: 41K clock cycles
    - ThriftTransmitTime: 812K clock cycles
    AGGREGATION_NODE (id=1):(26s139ms 0.99%)
    - BuildBuckets: 1.02K
    - BuildTime: 176ms
    - GetResultsTime: 8K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    HDFS_SCAN_NODE (id=0):(25s880ms 99.01%)
    - BytesRead: 2.05 GB
    - DecompressionTime: 14s036ms
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 82.83 MB/sec
    - RowsReturned: 37.47M
    - RowsReturnedRate: 1.45 M/sec
    - ScanRangesComplete: 43
    - ScannerThreadsReadTime: 25s315ms
    - TotalReadThroughput: 102.99 MB/sec
    Fragment 1:
    Instance 46fa998af31342a5:92e5c5ed0f0c9221:(26s138ms 0.00%)
    Hdfs split stats (:<# splits>/): 0:43/2.20B
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 1656K clock cycles
    - CompileTime: 77ms
    - LoadTime: 4ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 16.00 B
    - DataSinkTime: 666K clock cycles
    - SerializeBatchTime: 41K clock cycles
    - ThriftTransmitTime: 812K clock cycles
    AGGREGATION_NODE (id=1):(26s139ms 0.99%)
    - BuildBuckets: 1.02K
    - BuildTime: 176ms
    - GetResultsTime: 8K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    HDFS_SCAN_NODE (id=0):(25s880ms 99.01%)
    File Formats: SEQUENCE_FILE/SNAPPY_BLOCKED:43
    - BytesRead: 2.05 GB
    - DecompressionTime: 14s036ms
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 82.83 MB/sec
    - RowsReturned: 37.47M
    - RowsReturnedRate: 1.45 M/sec
    - ScanRangesComplete: 43
    - ScannerThreadsReadTime: 25s315ms
    - TotalReadThroughput: 102.99 MB/sec
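
    A rough cross-check, using only the values shown in the profile above: the
    fragment scanned 2.05 GB at a TotalReadThroughput of about 103 MB/sec,

    2.05 GB * 1024 / 103 MB/sec ≈ 20 sec

    which matches the reported completion time of 20s281ms. DecompressionTime
    (14s036ms) is also smaller than ScannerThreadsReadTime (25s315ms), which is
    consistent with the point above that decompression is hidden behind the disk
    reads: this single-node scan looks read-throughput-bound rather than CPU-bound.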




  • Sékine Coulibaly at Mar 12, 2013 at 4:29 pm
    Hi Marcel,

    The query duration is roughly the same after the 2nd call (the first one is
    slightly longer). I'm quite surprised that I/O is the issue (iotop seemed to
    show very little I/O activity, actually, but I'll try iostat as soon as
    possible to check).

    I'll try the RCFile format to limit disk I/O and see how it goes.
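
    For reference, a minimal sketch of what that conversion could look like,
    following the same Hive pattern used for logs_sf_snappy above (the table name
    logs_rc_snappy and the reuse of the same compression settings are assumptions,
    not something tested in this thread):

    SET hive.exec.compress.output=true;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
    SET mapred.output.compression.type=BLOCK;

    -- hypothetical table name; same columns as logs_sf_snappy, stored as RCFile
    create table logs_rc_snappy (delivery_srv string, media_type string,
    server_name string, error_code string, dte string, count_sessions int,
    cmd string, status_code string, client_ip string, client_type string,
    session_id string, asset_name string, count_sent int, indicator int,
    interface string, server_ip string, customer_id string)
    STORED AS RCFILE;
    insert overwrite table logs_rc_snappy select * from logs_text;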

    Sekine



