Hi All,

Recently a few of us following this forum have collaborated and compiled
some interesting Impala benchmarking results.

We've used the TPC-H dataset on our small Hadoop clusters to test query
run-times with the Text and Parquet file formats.

The following spreadsheet (still a work in progress) was prepared by
myself, Lee Jung-Yup, Henrik Behrens and, most recently, Tim Hejit (from
OnMarc).

https://docs.google.com/spreadsheet/ccc?key=0AgQ09vI0R_wIdEVMeTQwZGJSOVQwcFRSRFFFUmcxWWc#gid=6

On row 21 of the summary sheet we have tried to benchmark our environments
using a simple cost-performance factor metric. The cost of each
environment is calculated monthly from hardware and operating costs (e.g.
electricity), assuming the hardware is fully depreciated (straight-line
method) over 3 years.
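
For example (illustrative numbers only, not taken from the spreadsheet):

    monthly cost = (hardware price / 36 months) + operating cost
                 = ($5,400 / 36) + $50 electricity
                 = $150 + $50 = $200 per month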

While the initial focus was on simple queries against a single table, some
attempts have also been made to compare performance using the complex
queries documented in TPC-H (http://www.tpc.org/tpch/).



To re-create these tests (on Text tables) in your own environment, you can
use the following link:

https://github.com/kj-ki/tpc-h-impala/tree/master/tpch_impala

If you also want to try it with Parquet-format tables, the following link
has the files I created/used:

https://docs.google.com/file/d/0Bxydpie8Km_fNUtvREdxYVNJWkE/edit?usp=sharing
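
For anyone starting from the Text tables, the general pattern for building
the Parquet copies is roughly the following (a minimal sketch using
impala-shell; the table name is illustrative, and the exact DDL in the
linked files may differ):

    # Create a Parquet version of the text-format lineitem table, then
    # populate it; Impala writes the Parquet data files into HDFS.
    # (Early Impala releases spell the format STORED AS PARQUETFILE;
    # newer releases also accept STORED AS PARQUET.)
    impala-shell -q "CREATE TABLE lineitem_parquet LIKE lineitem STORED AS PARQUETFILE"
    impala-shell -q "INSERT OVERWRITE TABLE lineitem_parquet SELECT * FROM lineitem"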


Why not try these on your own clusters (especially those of you running
significantly larger clusters)?

We’d be only too happy to include your results.

Kind Regards

Aron


  • Kashif Khan at Jul 22, 2013 at 4:36 pm
    Great comparisons. Correct me if I understood it wrong: Parquet slows
    down on a simple select without any aggregation (B9).

    Thanks

    --
    Cheers,
      Kashif
  • Nong Li at Jul 22, 2013 at 10:10 pm
    Thanks for doing this!

    I'd love to know how many iterations you ran, and whether you issued
    any OS buffer-cache commands in between. Also, the on-disk size of the
    Parquet table would be very useful.

    As for B9, I'm guessing this query is slow for two reasons: the query
    is doing a full table scan, which is the weakest case for a columnar
    format, and we currently don't have codegen enabled for the Parquet
    scanner (so the predicates are evaluated more slowly). The codegen
    limitation is something we're actively working on.
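
    For anyone repeating the runs, clearing the Linux page cache between
    iterations usually looks like the following (a sketch; it needs root,
    should be run on every node, and drop_caches=3 also drops dentries and
    inodes):

        # Flush dirty pages, then drop the OS page cache so the next
        # iteration reads from disk instead of memory.
        sync
        echo 3 | sudo tee /proc/sys/vm/drop_caches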


  • Aron MacDonald at Sep 6, 2013 at 9:31 am
    Hi Nong,

    I just saw your slides from the Impala meetup on the 20th of August:
    http://www.slideshare.net/cloudera/presentations-25757981

    Great stuff.

    Do you have any more details or links, especially on the TPC-H queries
    you ran, that I could perhaps include in this spreadsheet?

    Cheers
    Aron

    On Tuesday, July 23, 2013 8:35:14 AM UTC+1, Aron MacDonald wrote:

    Hi Nong.

    In the detail sheets (for the simple queries) you may be able to see
    how many iterations we attempted and the disk size of the respective
    tables. The first iteration typically represents the first run after
    an OS buffer refresh. In my case I usually didn't record the first
    iteration, only the results for subsequent iterations.

    Note: in the case of the complex TPC-H queries, multiple iterations
    were not always straightforward. I had frequent Impala crashes due to
    the small amount of memory available on my nodes, and a crash would
    then reset the OS buffers. So in my detail sheet it might be easiest
    to treat the complex TPC-H run times as first iterations only.

    I understand that the number of HDFS files and their sizes also have
    an important impact on performance, and that the details are buried in
    the Impala logs. It would be quite handy if there were a command-line
    statement to run, so this information could be consistently tracked
    and monitored.

    Are you aware of any easy way to extract this info?
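
    One low-tech option is to ask HDFS directly, rather than digging
    through the Impala logs (a sketch; the warehouse path is illustrative
    and depends on your Hive/Impala configuration):

        # Directory count, file count, and total bytes for the table's data.
        hdfs dfs -count /user/hive/warehouse/tpch.db/lineitem
        # Human-readable total size of the table's files.
        hdfs dfs -du -s -h /user/hive/warehouse/tpch.db/lineitem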





  • Aron MacDonald at Jul 23, 2013 at 7:06 am
    I think Nong has explained it, and could surely do a far better job
    than me. In simple terms, if I understand things correctly, a Parquet
    (columnar) table generates an internal key for each field value, which
    enables better compression and faster reads when you are looking at an
    aggregated view of the data (a subset of the fields).
    At some point, though, the advantages of storing it this way are lost
    if you are trying to report more granular information.

    Jung-Yup has performed an interesting examination of this:
    https://groups.google.com/a/cloudera.org/forum/#!topic/impala-user/XbpXQuOPd8k
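
    As a rough illustration of the trade-off (a sketch; the column names
    follow the TPC-H lineitem schema):

        # Touches only two columns, so a columnar format reads a small
        # fraction of the table's bytes.
        impala-shell -q "SELECT l_returnflag, SUM(l_extendedprice) FROM lineitem GROUP BY l_returnflag"
        # Selects every column, so whole rows must be reassembled and the
        # columnar advantage largely disappears (the B9 case).
        impala-shell -q "SELECT * FROM lineitem WHERE l_quantity > 49"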


  • Jlwill20 at Jan 17, 2014 at 12:04 pm
    Hello,

    I'm trying to run the TPC-H queries on a small 4-node cluster. I'm
    running Impala 1.2.1 and I've run into two issues.

    1. When I load my Parquet tables with a 10GB TPC-H database, the
    lineitem table has some corrupt data: some of the date fields contain
    unprintable characters rather than dates. At the 1GB scale I don't see
    this.

    2. When I run at the 100GB scale (text format, because of the issue
    above), there appears to be a memory leak in Impala. After I run a few
    queries, all the memory on my systems (48GB) is taken up and not
    released, at which point they are swapping non-stop.

    Has anyone seen these issues?


    Thanks,
    Jim
  • Nong Li at Jan 17, 2014 at 5:53 pm
    Inline.
    On Fri, Jan 17, 2014 at 4:04 AM, wrote:

    Hello,

    I'm trying to run the TPC-H queries on a small 4-node cluster. I'm
    running Impala 1.2.1 and I've run into two issues.

    1. When I load my Parquet tables with a 10GB TPC-H database, the
    lineitem table has some corrupt data: some of the date fields contain
    unprintable characters rather than dates. At the 1GB scale I don't see
    this.

    You're running into https://issues.cloudera.org/browse/IMPALA-692. I
    recommend upgrading to 1.2.3.

    2. When I run at the 100GB scale (text format, because of the issue
    above), there appears to be a memory leak in Impala. After I run a few
    queries, all the memory on my systems (48GB) is taken up and not
    released, at which point they are swapping non-stop.

    Is this a CM-managed cluster? Do you have process memory limits
    enabled? If you can get the cluster into this state, or close to it,
    can you send us the /memz page and /metrics from the debug web pages?
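
    In case it helps with gathering those, the impalad debug pages can be
    captured with curl (a sketch; 25000 is the default impalad debug web
    port):

        # Save the memory breakdown and metrics pages from one impalad.
        curl http://impalad-host:25000/memz > memz.txt
        curl http://impalad-host:25000/metrics > metrics.txt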

  • Jim Williams at Jan 18, 2014 at 1:46 pm
    Hi Nong,

    My cluster is CM-managed. I have not touched any parameters; I'm using
    it straight out of the box. Let me know if I should do something with
    the process memory limit, and where I can set it, or about any other
    parameters that will help performance.

    Here is what I did:

    I brought up the cluster (so it's a fresh start of everything,
    including the OS) and ran TPC-H query 7. Then I captured the memz and
    the metrics values.

    Note that after I did this, I ran query 7 again and it took twice as
    long as the first time. I ran it again and it took three times as long
    as the second time. So the times were 4 min, 8 min, 24 min.

    This is a 4-node cluster. Each node has 48GB of memory. My TPC-H
    database is 100GB.

    Thanks,
    Jim

  • Alan Choi at Jan 18, 2014 at 8:14 pm
    Hi Jim,

    By default, Impala imposes a memory limit of 80% of the system memory,
    so you have a mem-limit of 37GB (as shown in memz.txt). Impala
    shouldn't use a whole lot more than 37GB.
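
    (For reference, that limit corresponds to the impalad process memory
    limit, which can also be set explicitly; a sketch, and in a CM-managed
    cluster it would normally be changed through the Impala service
    configuration rather than on the command line:)

        # Cap the impalad process memory at start-up, here as a percentage
        # of physical memory (illustrative value; the default is 80%).
        impalad --mem_limit=70%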

    Two things I want to make sure of. First, you probably have some other
    processes running on the nodes, such as the CM agent, the monitors,
    and the Impala catalog service. How much memory is consumed by these
    processes?

    Second, just to make sure: you don't have HBase running, right?

    Thanks,
    Alan

  • Jim Williams at Jan 19, 2014 at 3:08 pm
    HBase is running but is not in use. The only things running on the
    cluster are the TPC-H queries, one at a time. I tried to stop HBase
    anyway for my next run, but I got a message saying to stop Impala and
    Hue first, so I didn't do that.

    I have Impala daemons running on all 4 nodes in the cluster. The
    catalog service is on my third node. It doesn't seem to be using much
    memory.

    I ran a query about 30 minutes ago. I have a tool that shows free
    memory: it is 4% on one of the impalad nodes and 20% on the node with
    the catalog server and impalad.

    I am running the top command for the impalad processes. It shows %MEM
    at roughly 60 on all systems.
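
    (For the record, that's just something like the following on each
    node; a sketch:)

        # One-shot top in batch mode, filtered to the Impala daemons.
        top -b -n 1 | grep -E 'impalad|catalogd'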

    Here are screen shots from the impalad on one of my nodes. Don't the
    Host Memory Usage and Resident Memory charts show that the daemons are
    not releasing memory? This is 30 minutes after I ran the query.

    [screenshots not preserved in the archive]

    I ran the same query again, and interestingly both of those charts
    briefly spiked down about halfway and then spiked back up to where
    they are in the screen shots. The top command now shows %MEM of 85.

    I've run the query a third time. It has been running for about an hour
    and is about 78% done. The first time I ran it, it took 5 minutes.
    Here are screen shots of the top command.

    impalad on host1:
    [screenshot not preserved]

    Here is the catalogd on host3:
    [screenshot not preserved]

    This is from host1 impalad display:
    [screenshot not preserved]

  • Jim Williams at Jan 20, 2014 at 5:40 pm
    Last couple of emails failed. Here is another attempt.

    HBase is running but is not in use. The only things running on the
    cluster are the TPC-H queries, one at a time. I tried to stop HBase
    anyway for my next run, but I got a message saying to stop Impala and
    Hue first, so I didn't do that.

    I have Impala daemons running on all 4 nodes in the cluster. The
    catalog service is on my third node.

    Here are the results of running TPC-H query 17 three times. I waited a
    few minutes between queries to gather statistics. I'm getting the
    statistics from the impalad screens and the top command, taking them
    from my #1 machine, where I run the queries, and my #3 machine, where
    the catalog is. There are impalad daemons on all machines in the
    cluster.

    There is nothing else running on the cluster. Each node has 48GB of
    memory.

    Query times were:
    1. 6 min, 46 sec
    2. 24 min, 21 sec
    3. 51 min, 42 sec

    Values after reboot of cluster
    ****************************************************************************

    Hadoop1 impalad
    ************************
    Host Memory Usage - 2.5GB Physical, 3.3GB Cached
    Resident Memory - 227 MB
    Top Command - RES = 226m, %MEM = .5

    Hadoop3 impalad
    ************************
    Host Memory Usage - 4.6GB Physical, 3.7GB Cached
    Resident Memory - 229 MB
    Top Command - RES = 229m, %MEM = .5

    Hadoop3 catalogd
    ************************
    Top Command - RES = 326m, %MEM = .7

    *******************************************************************************

    Values after running query 17 first time
    ************************************
    Query took: 6 min 46 sec
    ************************************

    Hadoop1 impalad
    ************************
    Host Memory Usage - 38.4GB Physical, 4.6GB Cached
    Resident Memory - 39.7GB
    Top Command - RES = 39GB, %MEM = 84.5

    Hadoop3 impalad
    ************************
    Host Memory Usage - 34.5GB Physical, 4.6GB Cached
    Resident Memory - 29.8 GB
    Top Command - RES = 29GB, %MEM = 63.3

    Hadoop3 catalogd
    ************************
    Top Command - RES = 309m, %MEM = .6

    *******************************************************************************

    Values after running query 17 second time
    ************************************
    Query took: 24 min 21 sec
    ************************************
    Hadoop1 impalad
    ************************
    Host Memory Usage - 46.7GB Physical, 0GB Cached, Swap used 8.4GB
    Resident Memory - 41.8GB
    Top Command - RES = 41GB, %MEM = 88.9
    Physical Memory display shows 46.5 of 47.1 GB used

    Hadoop3 impalad
    ************************
    Host Memory Usage - 45GB Physical, 1.8GB Cached
    Resident Memory - 40GB
    Top Command - RES = 39GB, %MEM = 84.9
    Physical Memory display shows 45 of 47.1 GB used

    Hadoop3 catalogd
    ************************
    Top Command - RES = 314m, %MEM = .7

    *******************************************************************************

    Values after running query 17 third time
    ************************************
    Query took: 51 min 42 sec
    ************************************
    Hadoop1 impalad
    ************************
    Host Memory Usage - 46.6GB Physical, 0GB Cached, Swap used 15.4GB
    Resident Memory - 42GB
    Top Command - RES = 42GB, %MEM = 89.2
    Physical Memory display shows 46.5 of 47.1 GB used

    Hadoop3 impalad
    ************************
    Host Memory Usage - 46.8GB Physical, 0GB Cached, Swap used 27GB
    Resident Memory - 40.7GB
    Top Command - RES = 40GB, %MEM = 86.4
    Physical Memory display shows 46.8 of 47.1 GB used

    Hadoop3 catalogd
    ************************
    Top Command - RES = 272m, %MEM = .5


  • Jim Williams at Jan 20, 2014 at 7:35 pm
    I meant to mention that when I bring down Impala, a lot of memory is
    released. I can then run query 17 again and it takes the same amount
    of time as the first time after a reboot of the cluster. But then I'm
    back in the same situation, where any subsequent queries take a lot
    longer to run.

    Also, I looked at all the CM monitors. They are using very little memory.


    Thanks,
    Jim

  • Bps cdh at Feb 28, 2014 at 3:31 pm
    Awesome post, very helpful!

    I have a basic question about running TPC-H against compressed data
    (I've run it against uncompressed Text and Parquet). Below are the
    tables dbgen created for me. If I want to test compression, is it as
    simple as using snappy/bzip2/gzip etc. command-line tools on each of
    the tables below and then putting them into HDFS? Or is it more
    complicated than that?

    Please let me know - thanks.

    customer.tbl
    lineitem.tbl
    nation.tbl
    orders.tbl
    partsupp.tbl
    part.tbl
    region.tbl
    supplier.tbl
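
    Mechanically, what is described would look roughly like the sketch
    below (paths illustrative). One caveat: the codec has to be one your
    Impala version can read for text tables, and gzip files are not
    splittable, so each .gz file is scanned by a single node; splitting
    each table into several files helps parallelism.

        # Compress a generated table and load it into the table's HDFS
        # directory, then tell Impala to pick up the new file.
        gzip lineitem.tbl
        hdfs dfs -put lineitem.tbl.gz /user/hive/warehouse/tpch.db/lineitem/
        impala-shell -q "REFRESH lineitem"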

