I have a cluster with 4 data nodes on which I am running very simple
queries for performance testing. The tables are in Parquet format. The
environment is Impala 1.2.1 / CDH 4.5 / CentOS.


    - I start with a table of 100 million rows and add 100 million rows at
    a time until I reach 1 billion rows. I run a count(*) on the table after
    each addition of 100 million rows. The time taken for this simple
    count(*) is roughly linear, with some outliers:

       1.13s for 101,713,307 rows
       16.77s (!) for 203,426,614 rows
       2.23s for 305,139,921 rows
       4.72s for 406,853,228 rows
       5.30s for 508,566,535 rows
       4.59s for 610,279,842 rows
       5.90s for 711,993,149 rows
       8.94s for 813,706,456 rows
       11.33s for 915,419,763 rows
       7.09s for 1,017,133,070 rows

    - I run a simple count/group rollup query on the table with 100
    million rows and on another table with the same schema but 1 billion
    rows. The time taken for 100 million rows is between 4.49s and 5.58s,
    and the time for 1 billion rows is between 36.96s and 38.54s, again
    showing roughly linear growth in time.
    - There seems to be a wide range in timings when I execute the same
    query repeatedly, and I am a bit surprised to see such variation. Each
    timing is taken with a single query running at a time; no other query is
    running on the cluster to distort the tests.
    - These tables contain a timestamp column, and I am unable to run
    ANALYZE TABLE ... COMPUTE STATISTICS on them from Hive due to a known
    issue of Hive not recognizing the Parquet timestamp column (missing
    Parquet jar file in Hive). Impala 1.2.1 does not support ANALYZE TABLE
    from impala-shell, so I can't analyze the tables from Impala either. I
    realize this could be a major issue, and I would certainly attribute some
    of the time increase to the missing statistics. However, I also want to
    make sure the queries are using all 4 impalad instances in a fairly
    balanced way. How do I check that?
    - The cluster was configured manually, i.e. without Cloudera Manager.
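
To make the linearity question concrete, the count(*) timings listed above can be converted to rows per second and compared between the smallest and largest table. The following is only a minimal sketch in Python: the function names `throughput` and `scaling_ratio` are made up for illustration and have nothing to do with Impala itself. The numbers are copied from the list above.

```python
# Timings copied from the count(*) runs listed above: (rows, seconds).
COUNT_TIMINGS = [
    (101_713_307, 1.13),
    (203_426_614, 16.77),
    (305_139_921, 2.23),
    (406_853_228, 4.72),
    (508_566_535, 5.30),
    (610_279_842, 4.59),
    (711_993_149, 5.90),
    (813_706_456, 8.94),
    (915_419_763, 11.33),
    (1_017_133_070, 7.09),
]


def throughput(timings):
    """Return (rows, seconds, million rows per second) for each run."""
    return [(rows, secs, rows / secs / 1e6) for rows, secs in timings]


def scaling_ratio(small, large):
    """Growth in elapsed time divided by growth in row count.

    A value of 1.0 means time grew exactly in proportion to rows;
    below 1.0 means the larger run was proportionally faster.
    """
    (rows_s, secs_s), (rows_l, secs_l) = small, large
    return (secs_l / secs_s) / (rows_l / rows_s)


if __name__ == "__main__":
    for rows, secs, mrps in throughput(COUNT_TIMINGS):
        print(f"{rows:>13,} rows  {secs:6.2f}s  {mrps:7.1f} M rows/s")
    ratio = scaling_ratio(COUNT_TIMINGS[0], COUNT_TIMINGS[-1])
    print(f"time growth / row growth (first vs last run): {ratio:.2f}")
```

If the time were strictly linear in row count, the rows-per-second column would be flat and the final ratio would be close to 1.0. Large swings between consecutive runs point to run-to-run variation, not to a scaling trend.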

Thanks,

Manoj







Posted to the impala-user group (cloudera.com) by Manoj Samel on Dec 19, '13 at 6:51p.