Hi Jim,

By default, Impala imposes a memory limit of 80% of the system memory,
so you have a mem-limit of 37GB (as shown in memz.txt). Impala shouldn't
use much more than 37GB.
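
As a rough illustration of the arithmetic behind the 80% default, the sketch
below computes the limit for a given amount of system memory and, in the other
direction, the system memory implied by a reported limit. The function names
are made up for this example, and it assumes the limit is a plain fraction of
the memory the OS reports. On that assumption, a nominal 48GB node would give
about 38.4GB, and a 37GB limit would correspond to roughly 46GB of reported
memory.

```python
# Sketch only: relates a fractional process memory limit to system memory.
# The 80% figure comes from the thread; the helper names are hypothetical.
DEFAULT_LIMIT_FRACTION = 0.80


def default_mem_limit_gb(system_memory_gb, fraction=DEFAULT_LIMIT_FRACTION):
    """Memory limit (GB) if the limit is `fraction` of system memory."""
    return system_memory_gb * fraction


def implied_system_memory_gb(mem_limit_gb, fraction=DEFAULT_LIMIT_FRACTION):
    """System memory (GB) that would produce the given limit."""
    return mem_limit_gb / fraction


if __name__ == "__main__":
    # Nominal 48GB node from the thread.
    print(round(default_mem_limit_gb(48), 1))
    # Memory implied by the 37GB limit reported in memz.txt.
    print(round(implied_system_memory_gb(37), 2))
```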

Two things I want to check. First, you probably have some other processes
running on the node, such as the CM agent, monitor, and Impala catalog
service. How much memory is consumed by these processes?

Second, just to make sure, you don't have HBase running, right?

Thanks,
Alan

On Sat, Jan 18, 2014 at 5:46 AM, Jim Williams wrote:

Hi Nong,

My cluster is CM managed. I have not touched any parameters; I'm using it
straight out of the box. Let me know if I should do something with the
process mem limit, and where I can set it, or whether any other parameters
will help performance.

Here is what I did:

I brought up the cluster (so it's a fresh start of everything, including
the OS) and ran TPC-H query 7. Then I captured the memz and the metrics
values.

Note that after I did this, I ran query 7 again and it took twice as long
to run as the first time. I ran it again and it took 3 times as long as
the second time. So the times were (4 min, 8 min, 24 min).

This is a 4 node cluster. Each has 48GB mem. My TPC-H database is 100GB.

Thanks,
Jim

On Fri, Jan 17, 2014 at 12:52 PM, Nong Li wrote:

Inline.
On Fri, Jan 17, 2014 at 4:04 AM, wrote:

Hello,

I'm trying to do the TPC-H queries with a small 4 node cluster. I'm
running Impala 1.2.1 and I've run into 2 issues.

1. When I load my Parquet tables with a 10GB TPC-H database, the lineitem
table has some corrupt data: some of the date fields contain unprintable
characters instead of dates. At the 1GB level I don't get this.

You're running into https://issues.cloudera.org/browse/IMPALA-692. I
recommend upgrading to 1.2.3.

2. When I run at the 100GB level (text format because of the issue above),
there appears to be a memory leak in Impala. After I run a few queries, all
the memory on my systems (48GB) is taken up and not released, so at that
point it is non-stop swapping.

Is this a CM-managed cluster? Do you have process mem limits enabled? If
you can get the cluster into this state or close to it, can you
send us the /memz page and /metrics from the debug web page?
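
One way to capture the /memz and /metrics pages from every node is a small
script like the sketch below. It assumes the impalad debug web server listens
on port 25000 (believed to be the default, but check your configuration), and
the host names in the usage note are placeholders. The helper names are
invented for this example.

```python
# Sketch only: save the /memz and /metrics debug pages from each node.
import os
import urllib.request

DEBUG_PORT = 25000  # assumption: default impalad debug web port
PAGES = ("memz", "metrics")  # the two pages requested in the thread


def debug_page_url(host, page, port=DEBUG_PORT):
    """Build the URL of a debug page on one node."""
    return f"http://{host}:{port}/{page}"


def save_debug_pages(hosts, out_dir=".", port=DEBUG_PORT, timeout=10):
    """Fetch each page from each host; return the list of files written."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for host in hosts:
        for page in PAGES:
            url = debug_page_url(host, page, port)
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read()
            path = os.path.join(out_dir, f"{host}_{page}.html")
            with open(path, "wb") as f:
                f.write(body)
            saved.append(path)
    return saved
```

For a 4 node cluster this could be called as
`save_debug_pages(["node1", "node2", "node3", "node4"], out_dir="impala_debug")`,
ideally while the nodes are in, or close to, the swapping state.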

Has anyone seen these issues?


Thanks,
Jim

On Monday, July 22, 2013 12:14:00 PM UTC-4, Aron MacDonald wrote:

Hi All,

Recently a few of us following this forum have collaborated and
compiled some interesting Impala benchmarking results.

We've used the TPC-H data set on our small Hadoop clusters to test
query run times with Text and Parquet file formats.

The following spreadsheet (which is still a work in progress) was
prepared by myself, Lee Jung-Yup, Henrik Behrens and most recently Tim
Hejit (from OnMarc).

https://docs.google.com/spreadsheet/ccc?key=0AgQ09vI0R_wIdEVMeTQwZGJSOVQwcFRSRFFFUmcxWWc#gid=6

On row 21 of the summary sheet we have tried to benchmark our
environments using a simple cost-performance factor. The monthly cost of
each environment is based on hardware and operating costs (e.g.
electricity), assuming the hardware is fully depreciated (straight-line
method) over 3 years.
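
The monthly cost described above can be sketched as follows. Straight-line
depreciation over 3 years spreads the hardware price evenly across 36 months,
and operating costs are added on top. The exact formula used in the
spreadsheet is not reproduced in this post, so the `cost_performance`
function below (monthly cost multiplied by query run time, lower is better)
is only a hypothetical stand-in, and all names and figures are illustrative.

```python
# Sketch only: monthly environment cost with straight-line depreciation.
DEPRECIATION_MONTHS = 36  # 3 years, as stated in the post


def monthly_cost(hardware_cost, monthly_operating_cost,
                 months=DEPRECIATION_MONTHS):
    """Hardware cost spread evenly over `months`, plus operating costs."""
    return hardware_cost / months + monthly_operating_cost


def cost_performance(hardware_cost, monthly_operating_cost, runtime_seconds):
    """Hypothetical factor (not the spreadsheet's formula): cost x run time."""
    return monthly_cost(hardware_cost, monthly_operating_cost) * runtime_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 3600 of hardware, 50 per month to operate.
    print(monthly_cost(3600, 50))
    print(cost_performance(3600, 50, runtime_seconds=240))
```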

While the initial focus was on simple queries on a single table, some
attempts have also been made to compare performance using the complex
queries documented in TPC-H (http://www.tpc.org/tpch/).



To re-create these tests (on Text tables) in your own environment, you
can use the following link:

https://github.com/kj-ki/tpc-h-impala/tree/master/tpch_impala

If you also want to try it with Parquet format tables, the
following link has the files I created/used:

https://docs.google.com/file/d/0Bxydpie8Km_fNUtvREdxYVNJWkE/edit?usp=sharing


Why not try these on your own clusters (especially those of you running
significantly larger clusters)?

We’d be only too happy to include your results.

Kind Regards

Aron
To unsubscribe from this group and stop receiving emails from it, send
an email to impala-user+unsubscribe@cloudera.org.
