I have a collection of indices with a total of about 7,000,000
documents between them all. When I attempt to run a search over these
indices, the searching process's memory usage increases to ~1.7GB if I
allow Java to use that much memory. If I don't (my normal memory cap
is 512MB), I get the following exception:

Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.(TermBuffer.java:104)
at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:159)
at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:119)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:157)
at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:419)
at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:87)
at org.apache.lucene.search.Searcher.docFreqs(Searcher.java:178)
at org.apache.lucene.search.MultiSearcher.createWeight(MultiSearcher.java:311)
at org.apache.lucene.search.Searcher.search(Searcher.java:118)
at org.apache.lucene.search.Searcher.search(Searcher.java:97)
at SearchThread.run(SearchThread.java:54)

So it looks like simply reading the .tii files from the indices is
taking a huge amount of RAM. This only happens on one machine; other
machines with similar data run fine with 256-512MB heap limits, so
I'm trying to figure out what could cause the .tii files to become so
bloated. Is there anything I can do to fix these indices? Searching
is also very slow on this machine: many of our other machines, with
tens of millions of documents, return search results in under a
second, whereas this machine takes many seconds before its
HitCollector's collect() method is called for the first time.

Any suggestions about how to slim down the .tii files on this machine
(or any workarounds) would be much appreciated.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Alex at Jun 17, 2008 at 6:19 pm
    You can call IndexReader.setTermInfosIndexDivisor(int) before any search to control what fraction of the .tii file is held in memory.
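On Lucene 2.3, that call would go on each IndexReader right after it is opened and before it is wrapped in a searcher (an IndexSearcher, or the MultiSearcher seen in the stack trace), so that the term index has not been loaded yet. The Java below does not use Lucene at all: it is a toy model with hypothetical class and method names, resting on the assumption that the .tii holds every termIndexInterval-th term of the sorted term dictionary (128 by default) and that a divisor d keeps every d-th of those entries. It illustrates why heap use for the term index drops roughly in proportion to d.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Lucene source) of read-time term index subsampling.
 * Assumption: the .tii holds every termIndexInterval-th term of the sorted
 * term dictionary, and a reader with divisor d keeps every d-th .tii entry.
 */
class TermIndexDivisorModel {

    /** Terms a writer would place in the .tii: every interval-th term. */
    static List<String> tiiEntries(List<String> sortedTerms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < sortedTerms.size(); i += interval) {
            entries.add(sortedTerms.get(i));
        }
        return entries;
    }

    /** Entries a reader keeps in memory when configured with a divisor. */
    static List<String> loadedEntries(List<String> tii, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tii.size(); i += divisor) {
            kept.add(tii.get(i));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary of 1,000 sorted terms: t0000 .. t0999.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            terms.add(String.format("t%04d", i));
        }
        List<String> tii = tiiEntries(terms, 128); // 128 = assumed default interval
        for (int divisor : new int[] {1, 2, 4}) {
            System.out.println("divisor " + divisor + ": "
                    + loadedEntries(tii, divisor).size() + " entries in memory");
        }
    }
}
```

The trade-off, under the same model, is that a lookup starts from a coarser index entry and may have to scan up to about termIndexInterval × d terms instead of termIndexInterval, so individual term lookups should get somewhat slower as d grows.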
  • Tsuraan at Jun 17, 2008 at 6:31 pm
    That's really nice. Thanks!

    I'm guessing the answer is no, but is there an equivalent to that for
    Lucene 2.2.0? Upgrading shouldn't be much of a problem anyway (we've
    been upgrading since 1.9), but I'm curious...
  • Steven A Rowe at Jun 17, 2008 at 7:38 pm
    Hi tsuraan,
    On 06/17/2008 at 2:31 PM, tsuraan wrote:
    > I'm guessing the answer is no, but is there an equivalent to that for
    > lucene-2.2.0?
    Not exactly equivalent, but: from the apidoc for the 2.3.2 version of setTermInfosIndexDivisor(int)
    <http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/index/IndexReader.html#setTermInfosIndexDivisor(int)>:

    For IndexReader implementations that use TermInfosReader to read terms,
    this sets the indexDivisor to subsample the number of indexed terms
    loaded into memory. This has the same effect as
    IndexWriter.setTermIndexInterval(int) except that setting must be done
    at indexing time while this setting can be set per reader. [....]

    The apidoc for the 2.2.0 version of IndexWriter.setTermIndexInterval(int):

    <http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html#setTermIndexInterval(int)>
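Put differently, on 2.2.0 the same memory saving has to be made when the index is written, because the interval is an IndexWriter setting. Presumably that means re-indexing, or rewriting the existing segments with a writer whose setTermIndexInterval(int) has been raised; treat that as an assumption to check against the 2.2.0 apidoc linked above. The toy model below (plain Java, not Lucene code; class and method names are hypothetical) checks the equivalence the 2.3.2 apidoc describes: reading an index written with interval I using divisor d keeps exactly the same term ranks in memory as reading, with no divisor, an index written with interval I × d.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Lucene source): compares read-time subsampling
 * (interval I, divisor d) with index-time subsampling (interval I * d).
 * A "rank" is a term's position in the sorted term dictionary.
 */
class IntervalDivisorEquivalence {

    /** Ranks of the terms written to the term index with the given interval. */
    static List<Long> writtenRanks(long uniqueTerms, long interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        List<Long> ranks = new ArrayList<>();
        for (long r = 0; r < uniqueTerms; r += interval) {
            ranks.add(r);
        }
        return ranks;
    }

    /** Ranks kept in memory when a reader keeps every divisor-th entry. */
    static List<Long> keptRanks(List<Long> written, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < written.size(); i += divisor) {
            kept.add(written.get(i));
        }
        return kept;
    }

    public static void main(String[] args) {
        long uniqueTerms = 1_000_000L; // hypothetical term count
        long interval = 128;           // assumed default interval
        for (int divisor : new int[] {1, 2, 4, 8}) {
            List<Long> readTime = keptRanks(writtenRanks(uniqueTerms, interval), divisor);
            List<Long> indexTime = writtenRanks(uniqueTerms, interval * divisor);
            System.out.println("divisor " + divisor + " vs interval " + interval * divisor
                    + ": same entries = " + readTime.equals(indexTime)
                    + ", count = " + indexTime.size());
        }
    }
}
```

In this model the index-time route also makes the .tii file itself smaller, whereas the read-time divisor leaves the file unchanged and only discards entries after reading them.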

    Steve


Discussion Overview
group: java-user @
categories: lucene
posted: Jun 17, '08 at 6:09p
active: Jun 17, '08 at 7:38p
posts: 4
users: 3
website: lucene.apache.org

3 users in discussion: Tsuraan (2 posts), Steven A Rowe (1 post), Alex (1 post)