Subject: About the search efficiency based on document's length
Hi everyone,

I have a question about document length and search efficiency.

Consider this situation:

There are two ways to index some HTML pages (ignoring some information):
one is to both store and index the HTML content in the Lucene index, the
other is to only index the content. Does the first method have an
efficiency problem compared to the second, apart from the larger index
folder?

Thanks,
Jarvis



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Karl Wettin at Sep 21, 2007 at 6:52 am

    On 21 Sep 2007, at 08:23, Jarvis wrote:

    > There is a question about the document's length and search efficiency.
    > Two ways to index some html pages (ignore some information): one is
    > both store and index the html content in lucene dictionary, the other
    > is just index the content. For the first method is there a efficiency
    > problem compare to the second besides the folder size increase?

    Not sure I understand your question, but I'll give it a go.

    As far as I know, storing data in a document will not affect search
    speed. However, loading large amounts of data into a Document will of
    course consume resources. Therefore it is possible to pass a
    FieldSelector to the IndexReader when you retrieve a Document,
    allowing you to define which fields to ignore, load, lazy load, etc.

    I hope this helps.

    --
    karl
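
As a rough illustration of the point above, the following sketch models the idea in plain Java without the Lucene library: searching consults only the inverted index, while stored field values are read only when a document is retrieved, and a selector decides per field whether to load it eagerly, lazily, or not at all. The names `ToyIndex`, `Choice`, and `storedReads` are hypothetical and exist only for this sketch; Lucene's real `FieldSelector` and `IndexReader` are more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model only: NOT the Lucene API. All names here are hypothetical.
final class ToyIndex {
    /** What to do with a stored field when a document is retrieved. */
    enum Choice { LOAD, LAZY_LOAD, NO_LOAD }

    // term -> ids of the documents containing it (the searchable part)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // doc id -> (field name -> stored value); never consulted by search()
    private final List<Map<String, String>> stored = new ArrayList<>();
    // counts how many stored values were actually read
    int storedReads = 0;

    /** Indexes the text and keeps the given stored fields (possibly none). */
    int add(String text, Map<String, String> storedFields) {
        int id = stored.size();
        stored.add(new HashMap<>(storedFields));
        for (String term : text.toLowerCase().split("\\s+")) {
            List<Integer> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
            if (ids.isEmpty() || ids.get(ids.size() - 1) != id) {
                ids.add(id); // record each document once per term
            }
        }
        return id;
    }

    /** Search reads only the postings, so the size of stored fields plays no role here. */
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new ArrayList<>());
    }

    /** Retrieval is where stored data costs something; the selector limits that cost. */
    Map<String, Supplier<String>> document(int id, Function<String, Choice> selector) {
        Map<String, Supplier<String>> doc = new HashMap<>();
        for (String name : stored.get(id).keySet()) {
            Choice choice = selector.apply(name);
            if (choice == Choice.LOAD) {
                String value = read(id, name);       // read immediately
                doc.put(name, () -> value);
            } else if (choice == Choice.LAZY_LOAD) {
                doc.put(name, () -> read(id, name)); // read only when get() is called
            }                                        // NO_LOAD: the field is skipped
        }
        return doc;
    }

    private String read(int id, String name) {
        storedReads++;
        return stored.get(id).get(name);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int id = index.add("lucene stores html pages",
                Map.of("title", "Example", "html", "<html>...large page...</html>"));
        System.out.println("hits: " + index.search("html"));
        System.out.println("stored reads after search: " + index.storedReads);
        Map<String, Supplier<String>> doc = index.document(id,
                name -> name.equals("title") ? Choice.LOAD : Choice.LAZY_LOAD);
        System.out.println("stored reads after retrieval: " + index.storedReads);
        System.out.println("html length: " + doc.get("html").get().length());
    }
}
```

In this model the large `html` value is touched only when the caller asks for it, which is the kind of saving a field selector is meant to give.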
  • Jarvis at Sep 21, 2007 at 7:10 am
    > Storing data in a document will not affect search speed.

    This is helpful.

    And another question :)

    When I run a search that returns 500,000 results, it is very
    inefficient to fetch the documents from No. 450000 to No. 450010,
    or any other documents far back in the result list. Why is that?
    Is there a solution?

    Thanks,
    Jarvis


  • Karl Wettin at Sep 21, 2007 at 7:42 am

    On 21 Sep 2007, at 09:09, Jarvis wrote:

    > Storing data in a document will not affect search speed.
    >
    > This is helpful.

    Someone should probably confirm that though.

    > And another question :)
    >
    > When I make a search which will return 500000 results, it will be
    > very inefficient when I want to get the document between the
    > No.450000 to No.450010 or some back document. Why was it? Or some
    > solution?

    I suppose you are referring to the class Hits? It should only be an
    extra cost if you iterate over a lot of documents prior to index 450000,
    as that will force it to re-run the query now and then.

    It is a pretty simple piece of code. Go right ahead and take a look
    at it:

    <http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/Hits.java>


    --
    karl
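
To make the cost described above more concrete, here is a small model in plain Java (not the real `Hits` class) of a result cache that starts with a fixed number of hits and re-runs the query with a doubled request whenever a hit beyond the cache is accessed. The initial size of 100, the doubling rule, and the name `HitsModel` are assumptions of this sketch; check them against the Hits.java source for your Lucene version.

```java
// Simplified model of a growing hit cache. NOT the Lucene Hits class;
// the initial size and the doubling rule are assumptions of this sketch.
final class HitsModel {
    private static final int INITIAL = 100; // assumed size of the first fetch

    private final int totalHits;
    private int cached;   // number of hits currently held in memory
    int searches = 1;     // the initial search has already run

    HitsModel(int totalHits) {
        this.totalHits = totalHits;
        this.cached = Math.min(INITIAL, totalHits);
    }

    /** Simulates reading hit number n, re-running the query if n is not cached yet. */
    void access(int n) {
        if (n < 0 || n >= totalHits) {
            throw new IndexOutOfBoundsException("Not a valid hit number: " + n);
        }
        if (n >= cached) {
            // ask for twice as many hits as needed, capped at the total
            long requested = 2L * Math.max(cached, n);
            cached = (int) Math.min(requested, totalHits);
            searches++;
        }
    }

    public static void main(String[] args) {
        // Read every hit in order up to No. 450010.
        HitsModel sequential = new HitsModel(500_000);
        for (int i = 0; i <= 450_010; i++) {
            sequential.access(i);
        }
        // Jump straight to hits No. 450000 to No. 450010.
        HitsModel direct = new HitsModel(500_000);
        for (int i = 450_000; i <= 450_010; i++) {
            direct.access(i);
        }
        System.out.println("searches when iterating: " + sequential.searches);
        System.out.println("searches when jumping:   " + direct.searches);
    }
}
```

Under these assumptions, reading hits 0 through 450010 in order runs the query 14 times, while jumping straight to hit 450000 runs it twice. Either way the last run has to collect all 500,000 hits, which is why reaching far into a result list is expensive whatever the access pattern.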



  • Mark Harwood at Sep 21, 2007 at 9:53 am
    This may be of interest:

    http://mail-archives.apache.org/mod_mbox/lucene-java-user/200707.mbox/%3c62924.70068.qm@web26010.mail.ukl.yahoo.com%3e


    Cheers
    Mark



Discussion Overview
group: java-user
category: lucene
posted: Sep 21, '07 at 6:24a
active: Sep 21, '07 at 9:53a
posts: 5
users: 3
website: lucene.apache.org
