FAQ
Hello all,

I have indexed more than 4 million documents. My query fetches 300,000
hits. If I sort on any field, Tomcat reports an out-of-memory error.
Even when a query returns only around 1,000 results, sorting on any field
can take 30-50 seconds or more.

I don't know what's going wrong.

My IndexSearcher is a static object that gets refreshed every minute. JSP
pages call the IndexSearcher object directly to perform searches.

Regards
Ganesh


Send instant messages to your online friends http://in.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Fuad Efendi at Sep 17, 2008 at 1:58 pm
    Increase memory.

    Lucene uses the FieldCache for sorting on a non-tokenized field, and it
    loads that field's values for all of your 4 million documents, even if
    you only need to sort 4,000 of them.
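    To put rough numbers on this, here is a minimal sketch that estimates the
    cache for one sort field. It assumes the Lucene 2.x-era layout (one
    primitive per document for numeric sorts; an int ordinal per document
    plus the unique values for string sorts), and the per-String overhead is
    a guessed constant, so treat the output as an order of magnitude and
    confirm with a profiler. The cost repeats for every field you sort on,
    and because the cache is keyed by IndexReader, an old and a freshly
    reopened searcher that are alive at the same time may each hold a copy.

```java
import java.util.Locale;

/**
 * Rough FieldCache sizing for ONE sort field. Assumptions, not measurements:
 *  - numeric sort: one primitive per document (int = 4 bytes, long = 8 bytes);
 *  - string sort: one int ordinal per document plus one String per unique value;
 *  - ASSUMED_STRING_OVERHEAD_BYTES is a guess at per-String JVM overhead.
 * Every size uses maxDoc (all documents in the index), not the number of hits.
 */
final class FieldCacheEstimate {
    static final long ASSUMED_STRING_OVERHEAD_BYTES = 40; // hypothetical; varies by JVM

    private FieldCacheEstimate() {}

    static long intFieldBytes(long maxDoc) {
        return 4L * maxDoc;
    }

    static long longFieldBytes(long maxDoc) {
        return 8L * maxDoc;
    }

    static long stringFieldBytes(long maxDoc, long uniqueValues, long avgChars) {
        long ordinals = 4L * maxDoc; // one int per document
        // one String per unique value: assumed overhead + 2 bytes per UTF-16 char
        long values = uniqueValues * (ASSUMED_STRING_OVERHEAD_BYTES + 2L * avgChars);
        return ordinals + values;
    }

    static String mib(long bytes) {
        return String.format(Locale.ROOT, "%.1f MiB", bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        long maxDoc = 4_000_000L; // whole index, even if the query matches 4,000 docs
        System.out.println("int sort field:    " + mib(intFieldBytes(maxDoc)));
        System.out.println("long sort field:   " + mib(longFieldBytes(maxDoc)));
        // Hypothetical worst case: every document has a distinct 20-character value.
        System.out.println("string sort field: " + mib(stringFieldBytes(maxDoc, maxDoc, 20)));
    }
}
```

    Since the estimate scales with maxDoc, an index that grows by 1 million
    documents per day adds about 4 MB per day per int sort field under these
    assumptions, and considerably more per string sort field.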
    ==============
    http://www.tokenizer.org/bot.html


  • Ganesh - yahoo at Sep 18, 2008 at 4:54 am
    My index is growing by 1 million records per day. How much should I
    increase the memory?

    What kind of sorting algorithm does Lucene use? Is it efficient enough
    to handle millions of records?

    Could we do the sorting with our own algorithm?

    Regards
    Ganesh

    ----- Original Message -----
    From: "Fuad Efendi" <fuad@efendi.ca>
    To: <java-user@lucene.apache.org>
    Sent: Wednesday, September 17, 2008 7:28 PM
    Subject: Re: Exception while doing sorting


  • Erick Erickson at Sep 17, 2008 at 2:21 pm
    <<<My index searcher is static object and it is getting refreshed every
    minute>>>

    Does this mean you close/open your searcher every minute? If so, this could
    be the root of why your sorting is taking so long. It's not the cause of the
    OOM problem, though; for that, see Fuad's email.

    The first few searches on a newly opened searcher do a lot of setup, which
    is very expensive. A 50-second search is usually considered unacceptable,
    so you might want to revisit how often you open/close your searcher if,
    indeed, you are doing it every minute.
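    A minimal sketch of the reuse pattern described above, assuming a
    hypothetical SharedSearcher wrapper (not a Lucene class): request threads
    always reuse the current searcher, and a background task opens and warms
    a replacement, so the expensive first sorts happen off the request path.
    The type parameter S stands in for IndexSearcher so the sketch runs
    without Lucene on the classpath; the opener and warmer are supplied by
    the application.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical holder (not a Lucene class) for one shared searcher.
 * S stands in for org.apache.lucene.search.IndexSearcher.
 */
final class SharedSearcher<S> {
    private final Supplier<S> opener; // opens a new searcher on the current index
    private final Consumer<S> warmer; // e.g. runs one sorted query per sort field
    private volatile S current;

    SharedSearcher(Supplier<S> opener, Consumer<S> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        this.current = openAndWarm();
    }

    private S openAndWarm() {
        S fresh = opener.get();
        warmer.accept(fresh); // expensive first sorts happen here, before requests see it
        return fresh;
    }

    /** Request threads (e.g. JSPs) call this: it never opens anything. */
    S get() {
        return current;
    }

    /**
     * Call from a background task, not from a request. Returns the replaced
     * searcher; closing it once in-flight requests finish is left to the caller.
     */
    synchronized S refresh() {
        S old = current;
        current = openAndWarm();
        return old;
    }
}
```

    In a web application, refresh() could be driven by a
    ScheduledExecutorService (scheduleWithFixedDelay) started from a
    ServletContextListener; an interval longer than one minute means the
    sort caches are rebuilt less often.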

    Best
    Erick
    On Wed, Sep 17, 2008 at 8:46 AM, Ganesh - yahoo wrote:

  • Otis Gospodnetic at Sep 18, 2008 at 6:48 am
    If your index is growing this fast, you should start thinking about sharding your index (breaking it into multiple smaller indices that each fit on a single server) and searching across them (aka distributed search).

    Yes, Lucene can handle millions of records if run on adequate hardware and if used correctly.
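    A minimal sketch of the merge step in a distributed sorted search, under
    the assumption that each shard returns its own top n hits already sorted
    by the sort key. Each shard must return a full n, because the global top
    n may all come from one shard; the merge then pops at most n entries from
    a heap holding one cursor per shard. ShardHit and ShardMerge are
    hypothetical names. Lucene 2.x's MultiSearcher and ParallelMultiSearcher
    search several indexes at once and may be worth a look before writing
    your own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical shard hit: originating shard, doc id within that shard, sort value. */
final class ShardHit {
    final int shard;
    final int docId;
    final String sortKey;

    ShardHit(int shard, int docId, String sortKey) {
        this.shard = shard;
        this.docId = docId;
        this.sortKey = sortKey;
    }

    @Override
    public String toString() {
        return shard + ":" + docId + ":" + sortKey;
    }
}

final class ShardMerge {
    private ShardMerge() {}

    /** K-way merge of per-shard lists (each sorted ascending by sortKey) into a global top n. */
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int n) {
        // Heap entries are cursors: {shard index, position in that shard's list}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] c) -> perShard.get(c[0]).get(c[1]).sortKey));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<ShardHit> merged = new ArrayList<>(Math.max(n, 0));
        while (merged.size() < n && !heap.isEmpty()) {
            int[] c = heap.poll();
            List<ShardHit> list = perShard.get(c[0]);
            merged.add(list.get(c[1]));
            if (c[1] + 1 < list.size()) {
                heap.add(new int[] {c[0], c[1] + 1}); // advance this shard's cursor
            }
        }
        return merged;
    }
}
```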

    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


  • Ganesh - yahoo at Sep 19, 2008 at 5:27 am
    OK. If I distribute the indexes, will sorting be faster?

    On the Lucene user mailing list, most emails suggest using a single
    index. Won't searching across several indexes be slower?

    > Lucene uses FieldCache for sorting on non-tokenized field and tries to
    > maintain fields from all your 4 millions documents, even if you need
    > to sort only 4000 docs.

    I don't understand why Lucene keeps all the terms in the FieldCache for
    sorting. Shouldn't it sort only the hits? Please clarify.
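    One way to see why the cache covers every document: it is built once per
    IndexReader (by walking all terms of the field) and then reused by every
    later query, whatever subset of documents that query matches, so it is
    indexed by document id over the whole index. Each query then needs only
    an array lookup per hit and a priority queue bounded at n. The sketch
    illustrates that idea in plain Java; SortCacheDemo is a hypothetical
    name, not Lucene's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustration only: a per-document value cache reused to sort any query's hits. */
final class SortCacheDemo {
    private final long[] valueByDoc; // the "cache": one slot per document id in the index

    SortCacheDemo(long[] valuesForAllDocs) {
        this.valueByDoc = valuesForAllDocs.clone(); // built once, reused by every query
    }

    /** Returns the n hit doc ids with the smallest values, in ascending order. */
    List<Integer> topN(int[] hitDocIds, int n) {
        Comparator<Integer> byValue = Comparator.comparingLong((Integer doc) -> valueByDoc[doc]);
        // Max-heap by value, so the worst of the current top n is evicted first.
        PriorityQueue<Integer> heap = new PriorityQueue<>(byValue.reversed());
        for (int doc : hitDocIds) {
            heap.add(doc); // comparisons read valueByDoc[doc]; no per-hit disk access
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(byValue);
        return result;
    }
}
```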

    Regards
    Ganesh

  • Ganesh - yahoo at Sep 22, 2008 at 11:46 am
    My index has grown past 5 GB, with 5 million documents indexed.
    My query, which includes searching and sorting, returns 40,000 hits.

    If I run the search from a standalone application, the results are
    returned in 12 seconds. If I run the same search from a web application
    inside Tomcat, an out-of-memory error occurs.

    Could anyone explain this?

    Regards
    Ganesh

  • Erick Erickson at Sep 22, 2008 at 12:59 pm
    Sure. Your Tomcat instance assigns some amount of memory to the JVM
    that your searcher is running in. Of course, now you're going to ask me
    how to increase that number... I have no idea, but I've seen this
    question multiple times in the mail archive, so a search there or in
    the Tomcat docs should tell you.
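    A quick way to compare the two environments is to print the heap limit
    from both, e.g. from the standalone program and from a JSP or servlet in
    Tomcat. Runtime.maxMemory() reports the maximum the JVM will attempt to
    use; if Tomcat's figure is lower, its -Xmx setting (commonly passed via
    the CATALINA_OPTS or JAVA_OPTS environment variable) is the usual knob,
    but check the docs for your Tomcat version. HeapCheck is a hypothetical
    helper name.

```java
/** Hypothetical helper: reports the heap limits of the JVM it runs in. */
final class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    private HeapCheck() {}

    /** Maximum heap the JVM will attempt to use (huge if the JVM reports no limit). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use: allocated minus free. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```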

    But 12 seconds is still a long time to wait for a search to complete.
    Can you tell us more about your search?

    For instance, are you opening a searcher for each request? That's bad.
    Are you sorting? That can take a long time, but again, the first one
    will have a performance penalty as things are cached.

    There are a number of tips here:
    http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

    Best
    Erick
  • Ganesh - yahoo at Sep 23, 2008 at 4:52 am
    System specification:
    Processor speed: 2 GHz
    RAM: 3 GB

    IndexDB size: 5 GB
    Total documents indexed: 5.8 million

    To collect hits, I have replaced the Hits object with TopFieldDocs, which
    has improved search performance. Sorting is faster on date and long
    fields, but it is very slow on string fields. In a standalone application
    it took 10-20 seconds to display the results sorted on a string field.
    (I am not opening the IndexSearcher every time.)

    Regards
    Ganesh



  • Erick Erickson at Sep 23, 2008 at 12:40 pm
    That still seems excessive. Are you measuring your first sort? Lucene
    builds up the caches it uses for sorting during the first few *sorts*
    that happen, so that's a possibility.

    But if that isn't the case, I think you need to slap a profiler on the
    problem and see where you're spending your time. I'd also be careful
    about what you measure when you measure your query. For instance,
    I've been fooled by measuring the total time to get an assembled response
    and it turned out that the time was spent fetching the documents, NOT
    searching/sorting.

    Try measuring various operations. In particular, comment out anything
    having to do with assembling the response. Perhaps just substitute in
    making a list of the doc IDs and time *that*. Slowly build back up to
    your current app, and I suspect that one of the steps will cause your
    time to increase dramatically.
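    A minimal sketch of that approach: time each phase separately, and run
    the whole sequence twice so the first, cache-building run can be compared
    with a warm one. PhaseTimer, fakeSearch and fakeLoad are hypothetical
    stand-ins so the sketch runs without Lucene; in a Lucene 2.x application
    they would wrap something like searcher.search(query, filter, n, sort)
    and the per-hit searcher.doc(id) calls used to build the response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Records how long each named phase takes, so phases can be compared. */
final class PhaseTimer {
    private final List<String> report = new ArrayList<>();

    <T> T time(String phase, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        report.add(phase + ": " + elapsedMs + " ms");
        return result;
    }

    List<String> report() {
        return report;
    }

    // Hypothetical stand-in for the search+sort call: returns doc ids only.
    static int[] fakeSearch() {
        int[] ids = new int[40_000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        return ids;
    }

    // Hypothetical stand-in for assembling the response (loading each hit's document).
    static List<String> fakeLoad(int[] ids) {
        List<String> docs = new ArrayList<>(ids.length);
        for (int id : ids) {
            docs.add("doc-" + id);
        }
        return docs;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int run = 1; run <= 2; run++) { // run 1 includes any cache warm-up
            int[] ids = timer.time("search+sort, run " + run, PhaseTimer::fakeSearch);
            List<String> docs = timer.time("assemble response, run " + run, () -> fakeLoad(ids));
            System.out.println("assembled " + docs.size() + " docs");
        }
        timer.report().forEach(System.out::println);
    }
}
```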

    How many documents are you assembling to respond? If you're assembling
    40,000 hits, then 10-20 seconds may not be unreasonable.

    Best
    Erick
    On Tue, Sep 23, 2008 at 12:51 AM, Ganesh - yahoo wrote:

    System Specification:
    Processor speed: 2Ghz
    Ram: 3 GB

    IndexDB size 5 GB.
    Total documents indexed: 5.8 million.

    To collect hits, i have replaced Hits object with TopFieldDocs. This has
    improved the search performance better. Sorting is faster on date / long
    field, but it is very slow on string field. In a standalone application it
    took 10 - 20 secs to dispaly the results sorted on string field. [I am not
    opening indexsearcher every time].

    Regards
    Ganesh
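On why string sorts can be slower than date/long sorts: comparing two strings is more work than comparing two primitives, and a string sort cache must also hold the unique terms themselves. A common technique (and, as far as I understand, roughly what Lucene 2.x's `FieldCache.StringIndex` does with its `order` and `lookup` arrays) is to replace each document's string with its rank among the sorted unique values, so the sort itself compares ints; building that structure is the expensive, memory-hungry part. A plain-Java sketch with hypothetical names, assuming every document has exactly one value for the field:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

class OrdinalStringSort {
    // ord[doc] = rank of the document's value among the sorted unique values.
    // Assumes every document has exactly one non-null value.
    static int[] buildOrdinals(String[] valueByDoc) {
        String[] unique = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        int[] ord = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            ord[doc] = Arrays.binarySearch(unique, valueByDoc[doc]);
        }
        return ord;
    }

    // Sorts hit doc IDs by ordinal, so each comparison is an int comparison.
    static int[] sortHits(int[] hits, int[] ord) {
        return Arrays.stream(hits).boxed()
                .sorted(Comparator.comparingInt((Integer doc) -> ord[doc]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        String[] titles = {"pear", "apple", "fig", "apple"};
        int[] ord = buildOrdinals(titles);
        System.out.println(Arrays.toString(sortHits(new int[]{0, 2, 3}, ord))); // prints [3, 2, 0]
    }
}
```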



    ----- Original Message ----- From: "Erick Erickson" <erickerickson@gmail.com>
    To: <java-user@lucene.apache.org>
    Sent: Monday, September 22, 2008 6:29 PM
    Subject: Re: Exception while doing sorting


    Sure, your Tomcat instance assigns some amount of memory
    to the JVM that your searcher is running in. Of course, now you're
    going to ask me how to increase that number... I have no idea, but
    I've seen this question multiple times in the mail archive,
    so a search there or in the Tomcat docs should tell you.

    But 12 seconds is still a long time to wait for a search to complete.
    Can you tell us more about your search?

    For instance, are you opening a searcher for each request? That's bad.
    Are you sorting? That can take a long time, but again the first one
    will have a performance penalty as things are cached.
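Ganesh's setup (one static searcher shared by all JSP pages, refreshed every minute) already avoids opening a searcher per request. One extra consideration, to the best of my understanding of Lucene at this time: the sort cache belongs to the underlying reader, so each refreshed searcher starts with an empty cache, the first sorted query after a refresh pays the full cache-building cost, and the old and new caches may briefly both be in memory. A common mitigation is to warm the new searcher with a sorted query before publishing it. Below is a generic Java sketch of that pattern; the class is hypothetical, `opener` stands in for opening a new searcher, and `warmer` for running one sorted query per sort field.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical holder: every request shares one searcher-like object S.
class SharedSearcherHolder<S> {
    private final Supplier<S> opener;  // stand-in for opening a new searcher
    private final Consumer<S> warmer;  // stand-in for running warm-up sorted queries
    private final AtomicReference<S> current;

    SharedSearcherHolder(Supplier<S> opener, Consumer<S> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        S first = opener.get();
        warmer.accept(first);
        this.current = new AtomicReference<>(first);
    }

    // Called by each request instead of opening its own searcher.
    S get() {
        return current.get();
    }

    // Called periodically (e.g. once a minute): open, warm, then publish.
    // Returns the previous instance; the caller should close it only after
    // in-flight requests finish, which this sketch does not track.
    S refresh() {
        S fresh = opener.get();
        warmer.accept(fresh);
        return current.getAndSet(fresh);
    }

    public static void main(String[] args) {
        int[] version = {0};
        SharedSearcherHolder<String> holder = new SharedSearcherHolder<>(
                () -> "searcher-" + (++version[0]),
                s -> System.out.println("warming " + s));
        System.out.println("serving " + holder.get());
        String old = holder.refresh();
        System.out.println("serving " + holder.get() + ", retired " + old);
    }
}
```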

    There are a number of tips here:
    http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

    Best
    Erick

    On Mon, Sep 22, 2008 at 7:45 AM, Ganesh - yahoo <emailgane@yahoo.co.in> wrote:
    My index has crossed 5 GB, with 5 million documents indexed.
    My query, which includes searching and sorting, returns 40,000 hits.

    If I search from a standalone application, the results are returned in
    12 seconds. If I perform the same search from a web application running
    inside Tomcat, an out-of-memory exception occurs.

    Could anyone clarify this?

    Regards
    Ganesh

    ----- Original Message ----- From: "Ganesh - yahoo" <emailgane@yahoo.co.in>
    To: <java-user@lucene.apache.org>
    Sent: Friday, September 19, 2008 10:56 AM
    Subject: Re: Exception while doing sorting


    OK. If I distribute the indexes, would sorting be faster?
    On the Lucene user mailing list, most emails suggest using a single
    index. Wouldn't searching across multiple indexes be slower?

    Lucene uses FieldCache for sorting on non-tokenized fields and tries to
    maintain fields from all your 4 million documents, even if you need
    to sort only 4000 docs.

    I don't know why Lucene keeps all terms in the FieldCache for sorting.
    Isn't it supposed to sort only the hits? Please clarify.
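On why the cache covers every document rather than just the hits: the index is inverted (term to documents), so finding the value of a given document means walking the field's terms anyway. Doing that walk once per reader and storing the result in an array indexed by document number lets every later query sort its hits with plain array lookups. A simplified Java illustration of that trade-off (hypothetical names, not Lucene's implementation), assuming each document has exactly one term in the field:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class UninvertSketch {
    // postings: term -> IDs of the documents containing it (one field of an inverted index).
    // The loop visits every term and posting, regardless of how many hits a query has.
    static String[] uninvert(Map<String, int[]> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                valueByDoc[doc] = entry.getKey();
            }
        }
        return valueByDoc;
    }

    // Once the array exists, sorting any query's hits is cheap lookups plus comparisons.
    static int[] sortHits(int[] hits, String[] valueByDoc) {
        return Arrays.stream(hits).boxed()
                .sorted((a, b) -> valueByDoc[a].compareTo(valueByDoc[b]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("blue", new int[]{1, 3});
        postings.put("green", new int[]{0});
        postings.put("red", new int[]{2});
        String[] cache = uninvert(postings, 4); // built once per reader
        System.out.println(Arrays.toString(sortHits(new int[]{2, 0, 1}, cache))); // prints [1, 0, 2]
    }
}
```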

    Regards
    Ganesh

    ----- Original Message ----- From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
    To: <java-user@lucene.apache.org>
    Sent: Thursday, September 18, 2008 12:17 PM
    Subject: Re: Exception while doing sorting


    If your index is increasing in size so fast, you should start thinking
    about sharding your index (breaking it into multiple smaller indices that
    each fit on their own server) and searching across them (aka distributed
    search).
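Regarding Ganesh's question about sorting across distributed indexes: with sharding, each shard sorts only its own (smaller) index, and a coordinator merges the per-shard top hits. Because each shard's list is already sorted, the merge is a k-way merge that takes at most n entries. A minimal Java sketch with hypothetical types (this is not Lucene's multi-index search code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ShardMerge {
    static final class Hit {
        final int shard;
        final int doc;
        final long sortKey; // value of the sort field; ascending order wanted

        Hit(int shard, int doc, long sortKey) {
            this.shard = shard;
            this.doc = doc;
            this.sortKey = sortKey;
        }
    }

    // perShard: each shard's top hits, already sorted ascending by sortKey.
    // Returns the global top n via a k-way merge.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        // Queue entries are {shardIndex, positionInThatShardsList}.
        PriorityQueue<int[]> queue = new PriorityQueue<>((x, y) -> Long.compare(
                perShard.get(x[0]).get(x[1]).sortKey,
                perShard.get(y[0]).get(y[1]).sortKey));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) queue.add(new int[]{s, 0});
        }
        List<Hit> merged = new ArrayList<>();
        while (!queue.isEmpty() && merged.size() < n) {
            int[] head = queue.poll();
            List<Hit> list = perShard.get(head[0]);
            merged.add(list.get(head[1]));
            if (head[1] + 1 < list.size()) queue.add(new int[]{head[0], head[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shard0 = Arrays.asList(new Hit(0, 10, 1), new Hit(0, 11, 5));
        List<Hit> shard1 = Arrays.asList(new Hit(1, 20, 2), new Hit(1, 21, 3));
        for (Hit h : mergeTopN(Arrays.asList(shard0, shard1), 3)) {
            System.out.println("shard " + h.shard + " doc " + h.doc + " key " + h.sortKey);
        }
    }
}
```

Each shard also keeps its own, smaller sort cache in its own JVM, which is the memory side of Otis's suggestion.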

    Yes, Lucene can handle millions of records if run on adequate hardware
    and if used correctly.

    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



    ----- Original Message ----

    From: Ganesh - yahoo <emailgane@yahoo.co.in>
    To: java-user@lucene.apache.org
    Sent: Thursday, September 18, 2008 12:53:19 AM
    Subject: Re: Exception while doing sorting

    My index is growing by 1 million records per day. How much should I
    increase the memory?

    What kind of sorting algorithm is used in Lucene? Is it efficient
    enough to handle millions of records?

    Could we do the sorting using our own algorithm?
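On the sorting algorithm: as far as I know, Lucene's sorted search does not fully sort every hit. As hits are collected, it keeps only the best n in a bounded priority queue (in Lucene 2.x, if I recall correctly, `FieldSortedHitQueue`), so the cost is roughly the number of hits times log n, plus building the sort cache. Custom orderings are normally plugged in through a custom comparator on `SortField` rather than by replacing this algorithm; check the `SortField` documentation for your Lucene version. A plain-Java sketch of the bounded-queue technique, with hypothetical names and ascending order assumed:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class TopNCollector {
    // Keeps the n docs with the smallest sort keys among all hits seen.
    // The heap is ordered worst-first, so the current worst hit is evicted in O(log n).
    static int[] topN(int[] hits, long[] keyByDoc, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                (a, b) -> Long.compare(keyByDoc[b], keyByDoc[a])); // max-heap on key
        for (int doc : hits) {
            if (heap.size() < n) {
                heap.add(doc);
            } else if (n > 0 && keyByDoc[doc] < keyByDoc[heap.peek()]) {
                heap.poll();
                heap.add(doc);
            }
        }
        // Drain worst-first into the array from the back, giving ascending order.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        long[] keys = {50, 10, 40, 20, 30};
        System.out.println(Arrays.toString(topN(new int[]{0, 1, 2, 3, 4}, keys, 3))); // prints [1, 3, 4]
    }
}
```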

    Regards
    Ganesh

    ----- Original Message ----- From: "Fuad Efendi"
    To:
    Sent: Wednesday, September 17, 2008 7:28 PM
    Subject: Re: Exception while doing sorting


    Increase memory.

    Lucene uses FieldCache for sorting on non-tokenized fields and tries to
    maintain fields from all your 4 million documents, even if you need
    to sort only 4000 docs.
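To put rough numbers on Fuad's point, here is a back-of-the-envelope estimator in Java. It assumes one primitive per document for a numeric sort field, and for a string field a 4-byte ordinal per document plus the unique terms stored as Java strings (2 bytes per char plus a per-String overhead you supply; about 40 bytes is an assumed figure for a 32-bit JVM, not a measurement). The real footprint depends on the Lucene and JVM versions; the class name is hypothetical.

```java
class FieldCacheEstimate {
    // Numeric sort field: one primitive per document in the index (e.g. 8 bytes for long).
    static long numericBytes(long maxDoc, int bytesPerValue) {
        return maxDoc * bytesPerValue;
    }

    // String sort field: a 4-byte ordinal per document plus each unique term
    // stored once (2 bytes per char plus an assumed per-String overhead).
    static long stringBytes(long maxDoc, long uniqueTerms, int avgChars, int perStringOverhead) {
        return maxDoc * 4 + uniqueTerms * ((long) avgChars * 2 + perStringOverhead);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long docs = 5_800_000L;
        System.out.println("long field:   " + numericBytes(docs, 8) / mb + " MB");
        // Worst case for strings: every document has a distinct 20-char value.
        System.out.println("string field: " + stringBytes(docs, docs, 20, 40) / mb + " MB");
    }
}
```

Each additional sort field adds its own entry, and the totals grow linearly with the 1 million documents added per day.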
  • Dipen at Sep 22, 2008 at 1:12 pm
    @Ganesh:
    To increase memory in Tomcat, set CATALINA_OPTS in the catalina.sh file
    and add -Xmx1500m, which means the JVM should not use more than 1500 MB of
    heap, and/or -Xms500m, which gives it an initial heap of 500 MB.
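To confirm that the setting actually reached the JVM running Tomcat (the standalone run and the Tomcat run may be using very different heap limits), one option is to log the JVM's own view of its heap, for example from a JSP or a startup listener. `Runtime.maxMemory()` roughly reflects `-Xmx` (it can be slightly lower on some JVMs). The class name is hypothetical:

```java
// Reports the heap limits the current JVM is actually running with.
class HeapReport {
    static String describe(Runtime rt) {
        long mb = 1024L * 1024L;
        return String.format("max=%d MB, committed=%d MB, free=%d MB",
                rt.maxMemory() / mb,   // upper bound, roughly -Xmx
                rt.totalMemory() / mb, // heap currently reserved by the JVM
                rt.freeMemory() / mb); // unused part of the committed heap
    }

    public static void main(String[] args) {
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```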




Discussion Overview
group: java-user
categories: lucene
posted: Sep 17, '08 at 12:46p
active: Sep 23, '08 at 12:40p
posts: 11
users: 5
website: lucene.apache.org
