Hi,
after running tests on both MemoryIndex- and RAMDirectory-based indexes
in Lucene 3.1, it seems that MultiPhraseQueries slow down over time
(each iteration of executing the same MultiPhraseQueries on the same
doc seems to require more and more execution time). Are there any
known issues related to MultiPhraseQuery in Lucene 3.1
that could lead to this performance drop?

Tomislav

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Uwe Schindler at May 2, 2011 at 4:15 pm
    Can you check out the latest 3.1 branch at
    https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1

    and test whether it solves your issue? There was a problem in PhraseQuery's
    internal sorting and quicksort. It does not slow down over time, but rather
    with the type of query (how many terms the phrase contains). Maybe you sort
    your queries by number of terms, and so you see a slowdown.

    See issue: https://issues.apache.org/jira/browse/LUCENE-3054

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: Tomislav Poljak
    Sent: Monday, May 02, 2011 6:01 PM
    To: java-user@lucene.apache.org
    Subject: MultiPhraseQuery slowing down over time in Lucene 3.1

  • Michael McCandless at May 2, 2011 at 4:16 pm
    By "slowing down over time", do you mean that you use the same index (no
    new docs added), yet when running the same MPQ over and over you see it
    take longer and longer to execute?

    Mike

    http://blog.mikemccandless.com
  • Otis Gospodnetic at May 2, 2011 at 5:19 pm
    Hi,

    I think this describes what's going on:

    10 load N stored queries
    20 parse N stored queries, keep them in some List forever
    30 for each incoming document create a new MemoryIndex instance "mi"
    40 for query 1 to N do mi.search(query)

    Over time, step 40 takes longer and longer -- if some of the
    queries are MultiPhraseQueries. This happens even with mergeSort being used in
    MultiPhraseQuery.
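
    A minimal Java sketch of the loop in steps 10-40 might look like the
    following. It is not the actual system: Predicate<String> is a hypothetical
    stand-in for a parsed Lucene query, and the document string stands in for
    the per-document MemoryIndex, so the example runs without Lucene on the
    classpath. The class and method names are made up for illustration.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Predicate;

    class PercolateLoopSketch {
        // Steps 10/20: "parse" the stored queries once and keep them for the
        // lifetime of the process. Predicate<String> is a hypothetical stand-in
        // for a parsed query object.
        static List<Predicate<String>> parseQueries(List<String> rawTerms) {
            List<Predicate<String>> parsed = new ArrayList<>();
            for (String term : rawTerms) {
                parsed.add(doc -> doc.contains(term));
            }
            return parsed;
        }

        // Steps 30/40: for one incoming document, run every stored query against
        // it. Returns { hit count, elapsed nanoseconds for the whole pass }.
        static long[] runAllQueries(String doc, List<Predicate<String>> queries) {
            long start = System.nanoTime();
            long hits = 0;
            for (Predicate<String> query : queries) {
                if (query.test(doc)) {
                    hits++;
                }
            }
            return new long[] { hits, System.nanoTime() - start };
        }

        public static void main(String[] args) {
            List<Predicate<String>> queries =
                    parseQueries(Arrays.asList("lucene", "phrase", "solr"));
            String[] docs = { "lucene phrase query", "solr and lucene", "nothing relevant" };
            for (String doc : docs) {
                long[] result = runAllQueries(doc, queries);
                System.out.println("hits=" + result[0] + " nanos=" + result[1]);
            }
        }
    }
    ```

    In this structure the parsed queries are the only objects that outlive a
    document, so any state they accumulate would carry over from one pass to
    the next.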

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


  • Michael McCandless at May 3, 2011 at 9:49 am
    I feel like we are back to Basic ;)

    If you keep running line 40 over and over on the same memory index, do
    you see a slowdown?

    Mike

    http://blog.mikemccandless.com

  • Tomislav Poljak at May 3, 2011 at 11:44 am
    Hi,

    2011/5/3 Michael McCandless <lucene@mikemccandless.com>:
    > I feel like we are back to Basic ;)
    >
    > If you keep running line 40 over and over on the same memory index, do
    > you see a slowdown?

    Yes. I've tested running the same query list (~3.5k queries) on the same
    MemoryIndex instance, and after a while the iterations get slower and
    slower. The same thing happens when running the queries on the same
    instance of a RAMDirectory-based index holding only one doc. But if I
    remove the MultiPhraseQuery queries from the query list, the speed of
    execution stays the same: execution time for the other queries is
    constant and doesn't grow over time (as would be expected).

    I've tried running the tests with the latest 3.1 branch, as Uwe suggested
    (checked out and built today), and the slowdown is still present when
    MultiPhraseQuery queries are included (not removed from the
    query list).
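
    One way to run this kind of comparison without editing the query list is
    to bucket elapsed time by query type on every pass. The sketch below does
    that instead of removing a type from the list. TimedQuery, its type label,
    and the Runnable standing in for a search call are hypothetical names for
    illustration, not Lucene API.

    ```java
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    class PerTypeTiming {
        // Hypothetical stand-in for a parsed query: a type label plus the
        // search call to time.
        static final class TimedQuery {
            final String type;
            final Runnable search;

            TimedQuery(String type, Runnable search) {
                this.type = type;
                this.search = search;
            }
        }

        // Runs every query once and sums the elapsed nanoseconds per query type.
        static Map<String, Long> timeByType(List<TimedQuery> queries) {
            Map<String, Long> totals = new LinkedHashMap<>();
            for (TimedQuery query : queries) {
                long start = System.nanoTime();
                query.search.run();
                totals.merge(query.type, System.nanoTime() - start, Long::sum);
            }
            return totals;
        }

        public static void main(String[] args) {
            List<TimedQuery> queries = new ArrayList<>();
            queries.add(new TimedQuery("TermQuery", () -> "a b c".contains("b")));
            queries.add(new TimedQuery("MultiPhraseQuery", () -> "a b c".contains("b c")));
            queries.add(new TimedQuery("TermQuery", () -> "a b c".contains("z")));

            // One line per pass; a type whose total keeps growing is the suspect.
            for (int pass = 0; pass < 5; pass++) {
                System.out.println("pass " + pass + ": " + timeByType(queries));
            }
        }
    }
    ```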

    Tomislav
  • Uwe Schindler at May 3, 2011 at 11:47 am

    Thanks! Then this is a different issue. Could there be some kind of memory
    leak in this query? That's really strange. Can you open an issue?


  • Michael McCandless at May 3, 2011 at 3:26 pm

    Spooky!

    Can you boil this down into a self-contained test case?

    Mike

    http://blog.mikemccandless.com

  • Tomislav Poljak at May 4, 2011 at 10:52 am
    Hi,
    it seems there is a custom implementation of MultiPhraseQuery used in the
    system, which uses (and maybe misuses) Lucene's MultiPhraseQuery; that
    could be the reason for the slowdown. I've tried running a sample Lucene
    MultiPhraseQuery in an infinite while loop, printing out times for
    every 1000 executions, and couldn't reproduce the slowdown.
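
    A bounded version of that experiment might look like the sketch below. It
    runs a task in batches of 1000 and reports the elapsed time per batch, so a
    steady upward trend across batches would show the slowdown. The Runnable is
    a hypothetical stand-in for the real search call, and BatchTimer and
    timeBatches are illustrative names; the loop is bounded rather than
    infinite so that it terminates.

    ```java
    class BatchTimer {
        // Runs the task batchSize times per batch and returns the elapsed
        // nanoseconds for each batch, in order.
        static long[] timeBatches(Runnable task, int batches, int batchSize) {
            long[] elapsed = new long[batches];
            for (int b = 0; b < batches; b++) {
                long start = System.nanoTime();
                for (int i = 0; i < batchSize; i++) {
                    task.run();
                }
                elapsed[b] = System.nanoTime() - start;
            }
            return elapsed;
        }

        public static void main(String[] args) {
            // Stand-in workload; replace with the query execution under test.
            int[] sink = new int[1];
            Runnable task = () -> sink[0] += "quick brown fox".indexOf("brown");

            long[] elapsed = timeBatches(task, 10, 1000);
            for (int b = 0; b < elapsed.length; b++) {
                System.out.println("batch " + b + ": " + (elapsed[b] / 1000) + " us");
            }
        }
    }
    ```

    The first batches usually include JIT warm-up, so it is probably safer to
    compare later batches with each other than with the first one.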

    Thanks for the assistance,

    Tomislav


  • Michael McCandless at May 4, 2011 at 12:22 pm
    OK, phew :) Thanks for bringing closure...

    Mike

    http://blog.mikemccandless.com

Discussion Overview
group: java-user
categories: lucene
posted: May 2, '11 at 4:01p
active: May 4, '11 at 12:22p
posts: 10
users: 4
website: lucene.apache.org
