Hi,

I am facing a performance problem: the marked line in this loop is very slow.

for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId); // problem
}

How can I improve this? Please give me an example of the improved code.

Thanks,

Suman

P.S. In one post, Erick said:

this line is really suspicious:

Document document = this.indexReader.document(doc)

From the Javadoc for HitCollector.collect:
Note: This is called in an inner search loop. For good search performance,
implementations of this method should not call Searcher.doc(int) or
IndexReader.document(int) on every document number encountered. Doing so
can slow searches by an order of magnitude or more.


  • Anshum at Mar 10, 2011 at 9:41 am
    Hi Suman,
    Do you need to load/use all the fields that you have stored in the index? If
    not, I'd suggest using the

        public Document doc(int i, FieldSelector fieldSelector)

    method of IndexSearcher:
    http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/IndexSearcher.html#doc(int, org.apache.lucene.document.FieldSelector)
    This should help. Alternatively, if you only need a few very selective
    fields that can be served from a FieldCache, that would be a nice thing to do.

    Hope that helps.
    --
    Anshum Gupta
    http://ai-cafe.blogspot.com
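Anshum's suggestion is to ask the searcher for only the stored fields you need. With Lucene 3.0.x that would be something like `searcher.doc(docId, new MapFieldSelector(new String[] {"A", "D"}))`. To keep the example self-contained, the sketch below does not use Lucene: it is a simplified, JDK-only model in which the hypothetical `StoredFields` class stands in for the index, and `doc(docId, wanted)` decodes only the requested fields while the rest are skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * JDK-only model of selective stored-field loading; this is NOT Lucene code.
 * StoredFields is a hypothetical stand-in for the stored-field data of an index.
 */
class SelectiveLoadingSketch {

    /** Convenience helper for building a set of field names. */
    static Set<String> fields(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Hypothetical store: docId -> (field name -> encoded value). */
    static class StoredFields {
        private final List<Map<String, byte[]>> docs = new ArrayList<>();
        int fieldsDecoded = 0; // how many field values have been decoded so far

        /** Adds a document and returns its docId. */
        int add(Map<String, String> values) {
            Map<String, byte[]> encoded = new HashMap<>();
            for (Map.Entry<String, String> e : values.entrySet()) {
                encoded.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
            }
            docs.add(encoded);
            return docs.size() - 1;
        }

        /** Decodes only the requested fields, analogous to doc(int, FieldSelector). */
        Map<String, String> doc(int docId, Set<String> wanted) {
            Map<String, String> result = new HashMap<>();
            for (Map.Entry<String, byte[]> e : docs.get(docId).entrySet()) {
                if (wanted.contains(e.getKey())) {
                    result.put(e.getKey(), new String(e.getValue(), StandardCharsets.UTF_8));
                    fieldsDecoded++;
                }
            }
            return result;
        }
    }
}
```

With five stored fields per document, asking for `fields("A", "D")` decodes two values per hit instead of five.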


  • Suman.holani at Mar 10, 2011 at 9:48 am
    Hi Anshum,

    Thanks for the prompt reply.

    I only store the fields in the index that I want to fetch after the
    search.

    What I am not sure about is whether calling the searcher/reader to build
    the Document object is heavy. Can we use something else there that does
    not need to load the whole document again?

    Regards,
    Suman


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]
  • Anshum at Mar 10, 2011 at 10:22 am
    It depends on your data. I know that's a vague answer, but that's the point.
    What you could do is use the FieldCache, if your memory and data allow it.
    Would they?

    --
    Anshum Gupta
    http://ai-cafe.blogspot.com
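The FieldCache route trades memory for speed. In Lucene 3.0.x, `FieldCache.DEFAULT.getStrings(reader, "A")` returns a `String[]` with one entry per document in the reader, built once and then reused; it is meant for indexed, untokenized fields with a single value per document, and its memory cost scales with the number of documents in the index rather than the number of hits. The sketch below models that idea with plain JDK collections (it is not Lucene code); the loader function is a hypothetical stand-in for whatever reads one field for every document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * JDK-only model of a FieldCache-style lookup (NOT Lucene code): each field's
 * values are loaded once into an array indexed by docId, so per-hit lookups
 * become array reads instead of stored-document loads.
 */
class FieldArrayCache {
    private final Map<String, String[]> cache = new HashMap<>();
    // Hypothetical loader: returns the value of one field for every document.
    private final Function<String, String[]> loader;
    int loads = 0; // number of times a whole field had to be loaded

    FieldArrayCache(Function<String, String[]> loader) {
        this.loader = loader;
    }

    /** Returns the per-document values for a field, loading them on first use. */
    String[] values(String field) {
        return cache.computeIfAbsent(field, f -> {
            loads++;
            return loader.apply(f);
        });
    }

    /** Per-hit lookup: a plain array read once the field is cached. */
    String value(String field, int docId) {
        return values(field)[docId];
    }
}
```

In the validation loop, `cache.value("A", docId)` would then replace a stored-field load for each of the hits, at the cost of keeping field A for every document in memory.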

  • Erick Erickson at Mar 10, 2011 at 2:16 pm
    If you're loading 100,000 documents, you can expect it to be slow. If
    you're loading 10 documents, it should be quite fast. So how big is
    hits.length?

    And what version of Lucene are you using? The Hits object has been
    deprecated for quite some time, I believe.

    The problem here is that you're loading the entire result set. This is
    rarely the right thing to do, which is why paging is normally used.

    Why do you need to load the entire result set? That seems to be the
    crux of the issue.

    Best
    Erick
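Erick's point is that usually only the documents on the page being shown need to be loaded. In Lucene 3.0.x a common pattern is to call `searcher.search(query, (page + 1) * pageSize)` and then load only the `scoreDocs` entries that fall on the requested page. The helper below is a JDK-only sketch of that page arithmetic over a ranked array of doc IDs; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical paging helper (JDK-only sketch, not Lucene code). */
class Paging {
    /**
     * Returns the doc IDs for one page of a ranked hit list; only these would then
     * be passed to searcher.doc(...). Assumes page >= 0 and pageSize > 0.
     */
    static int[] page(int[] rankedDocIds, int page, int pageSize) {
        int from = Math.min(page * pageSize, rankedDocIds.length);
        int to = Math.min(from + pageSize, rankedDocIds.length);
        return Arrays.copyOfRange(rankedDocIds, from, to);
    }
}
```

A page past the end simply yields an empty array, so the caller loads nothing.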
  • Suman.holani at Mar 11, 2011 at 6:10 am
    Hello Erick,

    hits.length is 1800.

    The version is Lucene 3.0.3.

    I need the entire result set, because I fetch the records that satisfy the
    search conditions, validate them against the current counts, and schedule
    from the successful result set, selecting one of them by random scheduling.

    I cannot take the results page by page, as that would lead to starvation of
    the documents at the end.

    I cannot put the current counts used for validation into the index, as they
    change very frequently; it is not feasible to update the entire index every
    time.

    Please let me know of a solution.


    Let's say there are 5 indexed fields: A, B, C, D, E.

    When I search, 1000 records are fetched.
    For now I only want to use A and D, to validate the records against the counts.
    Note: fields B, C, E are not required at this point, but I am fetching them
    and storing them in a list.

    A and D from the list are given to another process for validation.
    After validation, 700 records remain in the list.
    Of these, one record is displayed, after scheduling, with all fields A,
    B, C, D, E.

    Regards,
    Suman
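The workflow described above (validate every hit on A and D, then show one randomly chosen survivor with all five fields) can be split into a cheap phase and an expensive phase, so the full document is loaded only once. The sketch below is JDK-only and every name in it is hypothetical: in a Lucene 3.0.x application the phase-1 loader might be a FieldSelector-based `searcher.doc(docId, selector)` call and the phase-2 loader a plain `searcher.doc(docId)`. Picking uniformly at random among all valid hits also sidesteps the starvation concern about paging, since every valid hit has the same chance of being shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.IntFunction;
import java.util.function.Predicate;

/**
 * JDK-only sketch (NOT Lucene code) of two-phase loading: read only the fields
 * needed for validation for every hit, then load the full document for the
 * single hit that is finally chosen. Both loaders are hypothetical stand-ins.
 */
class TwoPhaseSelection {

    static <T> T pick(int[] hitDocIds,
                      IntFunction<Map<String, String>> loadValidationFields,
                      Predicate<Map<String, String>> isValid,
                      IntFunction<T> loadFullDocument,
                      Random random) {
        List<Integer> valid = new ArrayList<>();
        for (int docId : hitDocIds) {
            // Phase 1: cheap, partial load (e.g. fields A and D only).
            if (isValid.test(loadValidationFields.apply(docId))) {
                valid.add(docId);
            }
        }
        if (valid.isEmpty()) {
            return null; // nothing passed validation
        }
        // Uniform random choice, so hits late in the ranking are not starved.
        int chosen = valid.get(random.nextInt(valid.size()));
        // Phase 2: full load (A, B, C, D, E) for the one document that is shown.
        return loadFullDocument.apply(chosen);
    }
}
```

For 1000 hits this performs 1000 partial loads and exactly one full load, instead of 1000 full loads.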





  • Suman.holani at Mar 11, 2011 at 10:15 am
    Hi,

    In Java I am using a RAM-based index.


    For a small case, with the document load commented out:
        for (int i = 0; i < hits.length; ++i) {
            // Document d = searcher.doc(hits[i].doc);
        }

    Found 37 hits.
    0 total milliseconds


    ==================================


    When I uncomment the line:
        for (int i = 0; i < hits.length; ++i) {
            Document d = searcher.doc(hits[i].doc);
        }

    Found 37 hits.
    17 total milliseconds


    How can I improve this? Am I doing something wrong here?

    The same index search in CLucene takes less than 1 ms, and that is with a
    file-based index.

    Regards,
    Suman
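Before comparing against CLucene, it may be worth checking the measurement itself. A single 17 ms reading for 37 hits could include one-time JVM costs such as class loading and JIT compilation; that is an assumption, not something the thread establishes. A minimal sketch that runs the work a few times untimed before timing it (a dedicated harness such as JMH would be more rigorous); the class name `WarmTimer` is hypothetical:

```java
/**
 * Minimal timing sketch: runs the task a few times untimed so the JVM can
 * warm up (class loading, JIT), then returns the average over the timed runs.
 * Not a rigorous benchmark harness. Assumes runs > 0.
 */
class WarmTimer {
    static double averageMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // untimed warm-up
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000.0 / runs;
    }
}
```

The `Runnable` would wrap the document-loading loop; because `searcher.doc(...)` declares `IOException` in Lucene 3.0.x, the lambda would need to catch it.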



  • Ian Lea at Mar 11, 2011 at 10:17 am
    If I've read this right, you are saying that you need to look at fields
    A and D for 1000 docs, but at B, C and E for just one. If that is right,
    then lazy loading/FieldSelector will help.

    But even loading just A and D for 1000 hits will inevitably take time.
    As already suggested, you could look at caching to speed that up.


    --
    Ian.
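Ian's lazy-loading suggestion corresponds, in Lucene 3.0.x, to something like `new SetBasedFieldSelector(eagerFields, lazyFields)` with A and D in the eager set and B, C, E in the lazy set, so the validation pass does not have to read B, C, E unless they are actually accessed. The sketch below is a JDK-only model of a lazily loaded field (not Lucene code): the hypothetical `reader` supplier stands in for reading the stored value, which happens at most once, on first access.

```java
import java.util.function.Supplier;

/**
 * JDK-only model of a lazily loaded field (NOT Lucene code): the value is read
 * the first time it is requested and cached afterwards. Not thread-safe.
 */
class LazyField<T> {
    private final Supplier<T> reader; // hypothetical stand-in for reading the stored value
    private T value;
    private boolean loaded = false;
    int reads = 0; // how many times the underlying reader was called

    LazyField(Supplier<T> reader) {
        this.reader = reader;
    }

    T get() {
        if (!loaded) {
            value = reader.get();
            loaded = true;
            reads++;
        }
        return value;
    }
}
```

For the 999 hits that are never displayed, B, C and E stay unread; only the chosen hit pays for them.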



  • Ian Lea at Mar 11, 2011 at 10:46 am
    You've been told several times that the searcher.doc() call can be
    expensive, and you have been given suggestions on how to improve it. You
    have provided no evidence that you have tried any of these suggestions.

    I know nothing about CLucene, and you have not provided any evidence as
    to whether your comparison is fair. Maybe you should just use
    CLucene.


    --
    Ian.

    On Fri, Mar 11, 2011 at 10:09 AM, suman.holani wrote:
    Hi,

    In Java I am using RAM based index


    For a small case
    for (int i = 0; i < hits.length; ++i) {
    //    Document D = searcher.doc(hits[i].doc);
    }

    Found 37 hits.
    0 total milliseconds


    ==================================


    In case I uncomment the lines
    for (int i = 0; i < hits.length; ++i) {
    Document D = searcher.doc(hits[i].doc);
    }
    Found 37 hits.
    17 total milliseconds


    How to improve this. If I am doing something wrong here.

    The same index search in clucene takes jst less than 1 ms that too , when it
    is File based indexes.

    Regards,
    Suman



    -----Original Message-----
    From: suman.holani
    Sent: Friday, March 11, 2011 11:35 AM
    To: '[email protected]'
    Subject: RE: document object

    Hello Erick,

    Hits .length is 1800

    Version is lucene 3.0.3

    I need the entire result set . As I ll be fetching records which satisfy the
    search conditions. And will be validating them wrt to current counts ,
    scheduling the successful resultset.Selecting one of them on basis of random
    scheduling.

    I cannot take page wise result. As that will lead to starvation of documents
    which are at end.

    I cannot add validating current counts onto index as it is changing v
    frequently. So not possible to change entire index everytime for that.

    Let me know of some soln .


    Let say there are 5 fields in indexing . A, B C ,D ,E

    when I search 1000 records are fetched
    I wanna use A, D for the time being for validating the records wrt counts.
    Note:fields B,C,E is nt required now, bt I am fetching it and storing in a
    list

    A,D in list are given to another process for validation
    After validation 700 records are in list
    Of wchich one of the record displayed after scheduling with entire fields A,
    B,C,D,E

    Regards,
    Suman





    -----Original Message-----
    From: Erick Erickson
    Sent: Thursday, March 10, 2011 7:46 PM
    To: [email protected]
    Subject: Re: document object

    If you're loading 100,000 documents, you can expect it to be slow. If
    you're loading 10 documents, it should be quite fast... So how big is
    hits.length?

    And what version of Lucene are you using? The Hits object has been
    deprecated for quite some time, I believe.

    The problem here is that you're loading the entire result set. This is
    rarely the right thing to do, which is why paging is used normally.

    Why do you need to load the entire result set? That seems to be the
    crux of the issue.

    Best
    Erick
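For reference, the paging Erick mentions usually works like this: ask Lucene for only as many top hits as the requested page needs, then load stored documents for that page alone. Below is a minimal sketch, assuming Lucene 3.0.x and zero-based page numbers. `Paging.page` is a hypothetical helper.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class Paging {
    /** Loads stored documents for one page of results only (page numbers start at 0). */
    static List<Document> page(IndexSearcher searcher, Query query, int page, int pageSize)
            throws IOException {
        if (page < 0 || pageSize < 1) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize >= 1");
        }
        // Collect just enough top hits to cover the requested page.
        TopDocs top = searcher.search(query, (page + 1) * pageSize);
        ScoreDoc[] hits = top.scoreDocs; // at most (page + 1) * pageSize entries
        List<Document> docs = new ArrayList<Document>();
        for (int i = page * pageSize; i < hits.length; i++) {
            docs.add(searcher.doc(hits[i].doc)); // only this page's hits are loaded
        }
        return docs;
    }
}
```

`TopDocs.totalHits` still reports the full match count, so a UI can show how many pages exist without loading them.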
    On Thu, Mar 10, 2011 at 5:22 AM, Anshum wrote:
    Depends on your data. I know that's a vague answer but that's the point.
    What you could do is use FieldCache if memory and data let you do so. Would
    it?

    --
    Anshum Gupta
    http://ai-cafe.blogspot.com
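Below is a minimal sketch of the FieldCache route for Lucene 3.0.x. It assumes field `A` is indexed un-tokenized (`Field.Index.NOT_ANALYZED`) with a single value per document. `FieldCache.DEFAULT.getStrings` builds an array with one entry per document in the index the first time it is called for a given reader, then reuses it. That array is the memory cost Anshum is alluding to. `FieldCacheLookup` is a hypothetical name.

```java
import java.io.IOException;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;

class FieldCacheLookup {
    /** Returns the value of field A for each hit without reading stored fields. */
    static String[] valuesOfA(IndexSearcher searcher, ScoreDoc[] hits) throws IOException {
        // One entry per document, indexed by doc ID; built on first use for this reader, then cached.
        String[] allA = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), "A");
        String[] result = new String[hits.length];
        for (int i = 0; i < hits.length; i++) {
            result[i] = allA[hits[i].doc];
        }
        return result;
    }
}
```

Because the values come from indexed terms, this works even if `A` is not stored. It pays off when the same reader serves many searches.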


    On Thu, Mar 10, 2011 at 3:12 PM, suman.holani wrote:
    Hi Anshum,

    Thanks for the prompt reply.

    I am only storing the fields in the index that I want to fetch after the
    search.

    What I am not sure about is whether initializing the Document object through
    the searcher/reader class is heavy.
    Can we use something else in that place that does not need to load the
    whole document again?

    Regards,
    Suman


    -----Original Message-----
    From: Anshum
    Sent: Thursday, March 10, 2011 3:11 PM
    To: [email protected]
    Subject: Re: document object

    Hi Suman,
    Do you need to load/use all the fields you have stored in the index? If
    not, I'd suggest using the

    public Document doc(int i, FieldSelector fieldSelector)

    function of IndexSearcher:
    http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/IndexSearcher.html#doc(int, org.apache.lucene.document.FieldSelector)

    This should help you. Otherwise, if you only need a very selective field,
    reading it through a FieldCache would be a good option.

    Hope that helps.
    --
    Anshum Gupta
    http://ai-cafe.blogspot.com
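Besides loading a fixed subset of fields, a `FieldSelector` can mark fields as lazy. With `SetBasedFieldSelector`, the fields in the first set are loaded immediately. Fields in the second set are read only when their value is first requested. This speaks to the earlier question about not loading the whole document again. Below is a minimal sketch, assuming Lucene 3.0.x and the A to E field names used in this thread. Lazy values can only be read while the underlying reader is still open.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.SetBasedFieldSelector;
import org.apache.lucene.search.IndexSearcher;

class LazyFields {
    // A and D are read immediately; B, C and E are read only when their value is requested.
    static final Set<String> EAGER = new HashSet<String>(Arrays.asList("A", "D"));
    static final Set<String> LAZY = new HashSet<String>(Arrays.asList("B", "C", "E"));
    static final FieldSelector SELECTOR = new SetBasedFieldSelector(EAGER, LAZY);

    /** Loads a document whose B, C and E values are fetched on first access. */
    static Document load(IndexSearcher searcher, int docId) throws IOException {
        return searcher.doc(docId, SELECTOR);
    }
}
```

Calling `get("B")` on the one record finally chosen triggers the read of B. The other records never pay for B, C or E.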


    On Thu, Mar 10, 2011 at 3:01 PM, suman.holani wrote:

    Hi,

    I am facing a problem.

    The line in the loop is very slow and is giving me a performance hit:

    for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);  // problem
    }

    How can I improve this? Please give me an example of the improved code.

    Thanks,
    Suman

    PS: In one of his posts, Erick said:

    this line is really suspicious:

    Document document = this.indexReader.document(doc)

    From the Javadoc for HitCollector.collect:

    Note: This is called in an inner search loop. For good search performance,
    implementations of this method should not call Searcher.doc(int) or
    IndexReader.document(int) on every document number encountered. Doing so
    can slow searches by an order of magnitude or more.
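The Javadoc note above is about the inner collect loop. In Lucene 2.9 and later, `HitCollector` is superseded by `Collector`. Below is a minimal sketch of a `Collector` that only records the IDs of matching documents, so any stored-field loading happens after the search and only for the documents actually needed. It assumes Lucene 3.0.x. `DocIdCollector` is a hypothetical name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

/** Records matching top-level doc IDs without loading any stored fields. */
class DocIdCollector extends Collector {
    private final List<Integer> docIds = new ArrayList<Integer>();
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) {
        // Scores are not needed for this collector.
    }

    @Override
    public void collect(int doc) {
        docIds.add(docBase + doc); // doc is segment-relative; add the segment's base
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        this.docBase = docBase;
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true; // collection order does not matter when only gathering IDs
    }

    List<Integer> getDocIds() {
        return docIds;
    }
}
```

Run it with `searcher.search(query, collector)`, then call `searcher.doc(...)` (ideally with a `FieldSelector`) only for the IDs you keep. This does not make loading 1,800 full documents cheaper by itself. It only keeps that work out of the scoring loop.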

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]

Discussion Overview
group: java-user @
categories: lucene
posted: Mar 10, '11 at 9:36a
active: Mar 11, '11 at 10:46a
posts: 9
users: 4
website: lucene.apache.org
