Hi,

We are running Elasticsearch 0.90.2 on Debian 7.0 with OpenJDK 7u3 (2-node
cluster).
From time to time, Elasticsearch stops responding, and the issue looks
related to the garbage collector.

Here is the information we have collected when the problem occurs:
- The search thread pool hits both the concurrent active items limit and the
queue limit (default values, i.e. 36 threads and 1000 slots in the queue).
- We have a high rate of slow queries (> 8 seconds).
- The garbage collector logs long passes (around 6 seconds).
- Clients get Rejected exceptions.
All of this lasts for several minutes (> 10) each time it happens.

Then everything gets back to normal.
Logs are attached.
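
As a rough illustration of how a pause of that length can exhaust the search pool, here is a back-of-the-envelope sketch. It is a simplified model, not a measurement: it assumes no search completes during the pause and that requests keep arriving at a steady rate. The function name is made up for this example.

```python
def saturation_rate(threads: int, queue_size: int, pause_seconds: float) -> float:
    """Arrival rate (requests/second) above which a stall of `pause_seconds`
    fills every active slot and every queue slot.

    Simplified model (assumption): nothing completes during the stall and
    requests arrive at a constant rate.
    """
    if pause_seconds <= 0:
        raise ValueError("pause_seconds must be positive")
    return (threads + queue_size) / pause_seconds


# Figures from the post: 36 search threads, 1000 queue slots, ~6 s GC pass.
rate = saturation_rate(threads=36, queue_size=1000, pause_seconds=6.0)
print(f"rejections could start above roughly {rate:.0f} searches/s per node")
```

Under these assumptions the pool has room for 1036 requests, so a single 6-second stall would be enough to trigger rejections at around 173 searches per second on a node. Several long passes in a row would lower that threshold further.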

Our settings:
System total memory: 6 GB
ES_HEAP_SIZE=3g

We are almost sure this issue comes from long GC runs.
We are planning to switch the GC to G1 (after upgrading to Java 7u25,
because this GC requires at least Java 7u4), but I've seen one thread in
this group saying it crashes :/
What can we do to prevent this behavior and run ES smoothly?



--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • Radu Gheorghe at Nov 14, 2013 at 5:42 pm
    Hello Vincent,

    If the search threadpool hits the limit maybe you have too many concurrent
    searches. If that's the case, you'll probably have to just add nodes and/or
    increase the number of replicas. Or, you can look at making your queries
    faster, if that's possible.

    G1 may help, I would try it and see how it goes.

    Last but not least, I would look at what is consuming memory. Is it field
    cache? Is it filter cache? I think nodes stats can tell you that, and you
    could turn a few knobs there and limit memory usage:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
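
A minimal sketch of pulling the cache sizes out of a nodes stats response could look like this. The field names used here (`indices.fielddata.memory_size_in_bytes` and `indices.filter_cache.memory_size_in_bytes`) are assumptions about the response layout, so check them against what your version actually returns. The sample data is hand-written and hypothetical, not taken from a real cluster.

```python
def cache_memory_by_node(nodes_stats: dict) -> dict:
    """Return {node_name: {"fielddata": bytes, "filter_cache": bytes}}.

    Missing sections are reported as 0 rather than raising, because the
    exact layout of the response is an assumption here.
    """
    summary = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        indices = node.get("indices", {})
        summary[node.get("name", node_id)] = {
            "fielddata": indices.get("fielddata", {}).get("memory_size_in_bytes", 0),
            "filter_cache": indices.get("filter_cache", {}).get("memory_size_in_bytes", 0),
        }
    return summary


# Hypothetical, hand-written sample; a real response carries many more fields.
sample = {
    "nodes": {
        "id-1": {
            "name": "node-1",
            "indices": {
                "fielddata": {"memory_size_in_bytes": 1500000000},
                "filter_cache": {"memory_size_in_bytes": 300000000},
            },
        },
        "id-2": {
            "name": "node-2",
            "indices": {
                "fielddata": {"memory_size_in_bytes": 1200000000},
            },
        },
    }
}

for name, sizes in sorted(cache_memory_by_node(sample).items()):
    for cache, size in sorted(sizes.items()):
        print(f"{name} {cache}: {size / 1024 ** 2:.0f} MB")
```

Comparing these numbers with the 3 GB heap should show whether one of the caches is the main consumer.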

    You may also want to try out our SPM for Elasticsearch. It will show you
    all sorts of metrics, from Garbage Collector and pool sizes to cache sizes.
    I assume it would be very helpful in this particular case:
    http://sematext.com/spm/elasticsearch-performance-monitoring/

    Best regards,
    Radu
    --
    Performance Monitoring * Log Analytics * Search Analytics
    Solr & Elasticsearch Support * http://sematext.com/
  • Mark Walkom at Nov 14, 2013 at 10:05 pm
    You probably also want to switch to Oracle Java; OpenJDK is not
    recommended.

    Regards,
    Mark Walkom

    Infrastructure Engineer
    Campaign Monitor
    email: markw@campaignmonitor.com
    web: www.campaignmonitor.com

  • Joergprante at Nov 15, 2013 at 8:20 am
    If you update to ES 0.90.7 or later, you should be safe using the G1
    collector, because the GNU Trove collections have been replaced by HPPC.

    I agree, moving from OpenJDK to the latest Oracle Java (7u25 is known to be
    stable) can help in erratic memory situations.

    But first, you have to find out why so much data is allocated on the heap
    that it cannot be garbage collected. Maybe your requirements are simply
    too high for just 2 nodes and you need to add more nodes.

    Jörg

  • Vincent miszczak at Nov 15, 2013 at 1:07 pm
    Hi guys,

    Thank you for your help.
    Following your advice, we are going to test ES 0.90.7+, Oracle JDK and G1,
    and look at what is consuming memory.
    It will take a few days to set up the environment; I'll come back with
    results when I have them.

    About the load: our CPUs are idle most of the time. Would having more
    memory help?

    About Oracle JDK vs. OpenJDK: is there a real difference in how the two
    behave with ES? OpenJDK ships with Debian, while Oracle JDK does not. We
    like the idea of simply running apt-get upgrade to get the latest patches
    (and Java is patched very often).

    Vincent


  • Joergprante at Nov 15, 2013 at 2:21 pm
    It depends on the OpenJDK release version. You have mentioned OpenJDK 7u3,
    and this is an older version, maybe with open bugs affecting Lucene/ES.

    OpenJDK forms the base for Oracle JDK
    http://openjdk.java.net/projects/jdk7u/qanda.html

    Vendors and distributors may patch OpenJDK for their purposes, and they
    also recommend Java versions. It is up to you to get informed about the
    best solution for you.

    I always recommend updating to the latest Java 7 version, because it
    gives you the best chance of having most known bugs fixed.

    Please note that every once in a while, new Java releases bring new
    challenges to running Lucene/ES smoothly. Unfortunately, there is no
    "official" certification process for finding a reliable JVM for Lucene/ES
    to mitigate risks; only best-practice advice is available. In general, all
    JVMs since version 6 should be able to run ES "somehow" (i.e. without
    crashing).

    Jörg

  • Vincent miszczak at Dec 9, 2013 at 1:39 pm
    Hi,

    Some feedback for the community.

    We upgraded to ES 0.90.7+: same GC problems.

    We upgraded OpenJDK 7 from update 3 to update 25:
    1. We have a 2-node cluster, and running u3 alongside u25 gave
    serialization errors, so we had to upgrade both hosts at the same time.
    2. We got strange results with u25: CPU usage was high, and ES regularly
    stopped responding because of it.

    We upgraded to Oracle JDK 7 update 45.
    No more CPU issues and no more GC issues so far, without any further
    tuning. We are still watching whether the GC behaves correctly, but the
    behaviour looks much better.
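
For anyone who wants to keep an eye on GC in the same way, here is a minimal sketch that compares two samples of per-collector counters and reports the average time per collection in between. It assumes counters named `collection_count` and `collection_time_in_millis` per collector, as exposed in the JVM section of nodes stats; the collector names and all numbers below are hypothetical sample data, not our real figures.

```python
def gc_delta(before: dict, after: dict) -> dict:
    """Collections and average collection time (ms) between two samples.

    Both arguments map a collector name to a dict holding the cumulative
    counters `collection_count` and `collection_time_in_millis`
    (assumed field names).
    """
    result = {}
    for name, later in after.items():
        earlier = before.get(name, {})
        count = later["collection_count"] - earlier.get("collection_count", 0)
        millis = (later["collection_time_in_millis"]
                  - earlier.get("collection_time_in_millis", 0))
        result[name] = {
            "collections": count,
            "avg_time_ms": millis / count if count > 0 else 0.0,
        }
    return result


# Hypothetical counters taken a few minutes apart (illustrative values only).
before = {
    "ParNew": {"collection_count": 100, "collection_time_in_millis": 2000},
    "ConcurrentMarkSweep": {"collection_count": 3, "collection_time_in_millis": 900},
}
after = {
    "ParNew": {"collection_count": 160, "collection_time_in_millis": 3500},
    "ConcurrentMarkSweep": {"collection_count": 5, "collection_time_in_millis": 12900},
}

for name, stats in sorted(gc_delta(before, after).items()):
    print(f"{name}: {stats['collections']} collections, "
          f"{stats['avg_time_ms']:.0f} ms on average")
```

An average of several seconds per collection on the old-generation collector, as in the made-up sample, would be the kind of pattern we saw before the JDK change.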

    Vincent

    To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e255ab6b-385f-47ee-a8f6-c5b95f12069d%40googlegroups.com.

Discussion Overview
group: elasticsearch
categories: elasticsearch
posted: Nov 14, '13 at 2:07p
active: Dec 9, '13 at 1:39p
posts: 7
users: 4
website: elasticsearch.org
irc: #elasticsearch
