Limit of Lucene
What is the limit of Lucene: # of docs per index?



In RangeFilter.bits(), for example, a bitset is initialized to the size of
maxDoc from the IndexReader. I wonder what happens if the number of docs is
huge, say MaxInt (4G in 32bit or 2^63 in 64 bit)?
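A note on the numbers in the question: Java's int is 32 bits on every JVM, so Integer.MAX_VALUE is 2^31 - 1 = 2,147,483,647 whether the JVM is 32- or 64-bit. java.util.BitSet also takes an int size and int bit indexes, so a single BitSet cannot address more bits than that. The plain-Java sketch below (no Lucene dependency; the helper name bitsetBytes is hypothetical) estimates the memory that a one-bit-per-document bitset sized to maxDoc would need at that limit.

```java
public class BitSetSizing {
    // Bytes needed by a java.util.BitSet holding one bit per document.
    // BitSet stores bits in 64-bit long words, so round up to whole words.
    // The cast to long avoids int overflow when maxDoc is near Integer.MAX_VALUE.
    static long bitsetBytes(int maxDoc) {
        long words = ((long) maxDoc + 63) / 64;
        return words * 8;
    }

    public static void main(String[] args) {
        // Same value on every JVM: 2^31 - 1.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE); // 2147483647
        // About 256 MB for a single bitset at the maximum index size.
        System.out.println(bitsetBytes(Integer.MAX_VALUE) + " bytes"); // 268435456 bytes
    }
}
```

So even at the theoretical maximum, each such bitset costs roughly 256 MB, allocated per filter per reader.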


  • Karl Wettin at May 8, 2008 at 6:00 pm

    Michael Siu wrote:
    > What is the limit of Lucene: # of docs per index?
    Integer.MAX_VALUE

    Multiple indices joined in a single MultiWhatNot (MultiReader,
    MultiSearcher, etc.) are still limited to that number.
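Why joining indices does not raise the limit: a multi-index reader or searcher gives every document a single int number across all of its sub-indices by adding each sub-index's starting offset to the document's local number (roughly what Lucene 2.x's MultiReader and MultiSearcher do with an internal array of starting offsets). The sum of the sub-indices' maxDoc values therefore has to fit in an int. A minimal plain-Java sketch of that mapping follows; docBases and globalDoc are hypothetical names, and the explicit overflow check via Math.addExact (Java 8+) is added for illustration rather than taken from Lucene.

```java
import java.util.Arrays;

public class DocBases {
    // Hypothetical helper: given each sub-index's maxDoc, return the global
    // number at which each sub-index's documents start. Math.addExact throws
    // ArithmeticException if the running total no longer fits in an int.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = next;
            next = Math.addExact(next, maxDocs[i]);
        }
        return bases;
    }

    // Global document number of local document localDoc in sub-index i.
    static int globalDoc(int[] bases, int i, int localDoc) {
        return bases[i] + localDoc;
    }

    public static void main(String[] args) {
        // Two indices of one billion documents each: the total still fits.
        int[] bases = docBases(new int[] {1_000_000_000, 1_000_000_000});
        System.out.println(Arrays.toString(bases)); // [0, 1000000000]

        // Two indices of 1.5 billion each: 3 billion exceeds Integer.MAX_VALUE.
        try {
            docBases(new int[] {1_500_000_000, 1_500_000_000});
        } catch (ArithmeticException e) {
            System.out.println("combined maxDoc overflows int");
        }
    }
}
```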

    > In RangeFilter.bits(), for example, a bitset is initialized to the size
    > of maxDoc from the IndexReader. I wonder what happens if the number of
    > docs is huge, say MaxInt (4G in 32bit or 2^63 in 64 bit)?
    ArrayIndexOutOfBoundsException?

    Upgrading the ints to longs should not be that difficult, but it would
    be a rather large job.

    How many documents do you have? You might want to consider alternative
    ways to represent your corpus in the index so that it takes fewer documents.


    karl

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael Siu at May 8, 2008 at 6:13 pm
    The number of documents we are going to index could potentially be more
    than 2G. So I guess I have to split the index into multiple indexes, each
    containing up to 2G documents. Any other suggestions?

    Thanks.
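A minimal sketch of the split described above, assuming each document has a unique string key and is routed to a shard by hashing that key. shardCount and shardFor are hypothetical helpers, not Lucene API, and the per-shard cap is an assumption: choose it well below Integer.MAX_VALUE, because maxDoc also counts deleted documents that have not yet been merged away, and hashing spreads documents only approximately evenly. Per Karl's reply, joining these shards back into one MultiSearcher would hit the same int limit, so each shard has to be searched on its own.

```java
public class ShardRouter {
    // Number of shards needed so that no shard holds more than maxDocsPerShard
    // documents (ceiling division in long arithmetic, so totals above 2G are fine).
    static int shardCount(long totalDocs, long maxDocsPerShard) {
        return (int) ((totalDocs + maxDocsPerShard - 1) / maxDocsPerShard);
    }

    // Route a document to a shard by its unique key. Math.floorMod keeps the
    // result in [0, numShards) even when hashCode() is negative.
    static int shardFor(String docKey, int numShards) {
        return Math.floorMod(docKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        // Hypothetical sizing: 5 billion documents, at most 1 billion per shard.
        int shards = shardCount(5_000_000_000L, 1_000_000_000L);
        System.out.println(shards + " shards"); // 5 shards
        System.out.println("doc-42 -> shard " + shardFor("doc-42", shards));
    }
}
```

Hashing the key (rather than filling shards in order) keeps the routing stateless, so a document can always be found or updated in the same shard later.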

  • Grant Ingersoll at May 8, 2008 at 6:29 pm
    In practice, you will more than likely have to distribute your index
    across multiple nodes once you get somewhere in the range of tens of
    millions of documents, but it all depends on your hardware, documents,
    throughput needs, etc.
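Once the index is distributed, each node can search its own shard and return its best hits, and a front end merges them. Because a hit is identified by a (shard, local document number) pair, no single int has to cover the whole corpus. A minimal sketch of that merge step, assuming each shard returns scored hits; Hit and mergeTopK are hypothetical names, not Lucene API, and List.of needs Java 9 or later. One caveat: raw scores from different shards are only roughly comparable, since term statistics such as document frequency differ per shard.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    // Hypothetical hit type: a document is identified by (shard, localDoc),
    // so no single int id has to span every shard.
    static final class Hit {
        final int shard;
        final int localDoc;
        final float score;

        Hit(int shard, int localDoc, float score) {
            this.shard = shard;
            this.localDoc = localDoc;
            this.score = score;
        }
    }

    static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble((Hit h) -> h.score);

    // Merge the per-shard hit lists into the global top k by score.
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        // Min-heap on score: the weakest of the current best k sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE);
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // evict the lowest-scoring hit
                }
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(BY_SCORE.reversed()); // best hit first
        return result;
    }

    public static void main(String[] args) {
        List<List<Hit>> perShard = new ArrayList<>();
        perShard.add(List.of(new Hit(0, 7, 3.0f), new Hit(0, 2, 1.0f)));
        perShard.add(List.of(new Hit(1, 7, 2.5f), new Hit(1, 9, 0.5f)));
        for (Hit h : mergeTopK(perShard, 2)) {
            System.out.println("shard " + h.shard + ", doc " + h.localDoc + ", score " + h.score);
        }
    }
}
```

The bounded heap keeps memory at O(k) no matter how many shards report, which matters when every shard returns its own top k.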

    --------------------------
    Grant Ingersoll

    Lucene Helpful Hints:
    http://wiki.apache.org/lucene-java/BasicsOfPerformance
    http://wiki.apache.org/lucene-java/LuceneFAQ

Discussion Overview
group: java-user
categories: lucene
posted: May 8, '08 at 5:23p
active: May 8, '08 at 6:29p
posts: 4
users: 3
website: lucene.apache.org
