Hi,
Thanks very much for your help!
Your point is well taken and it may cover most use cases, but it seems
to me that in principle the limit is not just for one segment: suppose
within one index we have 3 segments and each has docs close to 2^31-1,
then if I need to loop through most docs in all three segments we would
still have problems?
The use case is a rare one: if a user searches for a word that is in most
docs and we use pagination, and the user for some reason wants only the last
few pages (lowest rank), then we have to call search with a large nDocs
(which may go beyond Integer.MAX_VALUE).
Best regards, Lisheng
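As a rough illustration of the pagination case above, the following Java sketch computes how many top hits would have to be collected to show a given page. It is plain arithmetic, not Lucene code: the class DeepPaging and its methods are hypothetical names, and the hit count (three segments each near 2^31-1 matching docs) is the assumed scenario from the message, not a measured figure.

```java
final class DeepPaging {
    /** Number of top hits that must be collected to show the given 1-based page. */
    static long hitsNeeded(long page, long pageSize) {
        // multiplyExact throws ArithmeticException instead of silently wrapping.
        return Math.multiplyExact(page, pageSize);
    }

    /** True if the count still fits in an int nDocs argument. */
    static boolean fitsInInt(long hits) {
        return hits <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long pageSize = 10;
        // Assumed scenario: three segments, each with close to 2^31-1 matching docs.
        long totalHits = 3L * Integer.MAX_VALUE;
        long lastPage = (totalHits + pageSize - 1) / pageSize; // ceiling division
        long needed = Math.min(hitsNeeded(lastPage, pageSize), totalHits);
        System.out.println("hits needed for last page: " + needed);
        System.out.println("fits in int nDocs: " + fitsInInt(needed));
    }
}
```

Doing the multiplication in long makes the overflow visible before the value is narrowed to an int and passed to a search call.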
-----Original Message-----
From: Lance Norskog
Sent: Tuesday, November 02, 2010 7:00 PM
To: java-user@lucene.apache.org; simon.willnauer@gmail.com
Subject: Re: How to handle more than Integer.MAX_VALUE documents?
You would have to control your MergePolicy so it doesn't collapse
everything back to one segment.
On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer
wrote:
On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog wrote:
2 billion is a hard limit. Usually people split indexes into multiple
indexes long before this, and use the parallel multi reader (I think) to
read from all of the sub-indexes.
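One common way to split an index as described above is to route each document to a sub-index by a stable hash of its key. The sketch below is only an illustration under that assumption: ShardRouter, shardFor, and the string document key are hypothetical and are not part of Lucene.

```java
final class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.numShards = numShards;
    }

    /** Picks the sub-index for a document key; floorMod keeps the result in [0, numShards). */
    int shardFor(String docKey) {
        return Math.floorMod(docKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        // The same key always maps to the same sub-index.
        System.out.println(router.shardFor("doc-42"));
    }
}
```

With documents spread this way, each sub-index stays well below the per-index limit, and a search fans out to all sub-indexes and merges their results.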
On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
wrote:
Hi,
Now Lucene uses an integer as the document id, so it means we cannot have more
than 2^31-1 documents within one collection? Even if we use MultiSearcher
the document id is still integer so it seems this is still a problem?
This is really the limit of a segment. I think you can write your own
collector and collect documents with higher (absolute) doc ids than
INT_MAX. Yet, I think if you reach the limit of INT_MAX documents you
should really rethink the way your search works and apply some
sharding techniques. I really haven't been up to that many docs in a
single index, but I think it should work to have multiple segments with
INT_MAX documents in them since we search on segment level, provided
your collector supports it.
simon
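The collector idea above can be sketched as follows: keep the per-segment offset as a long and add the segment-local doc id to it. This is a standalone sketch and not the Lucene Collector API; LongIdCollector, setNextSegment, and collect are hypothetical names, and the three full-size segments are an assumed scenario.

```java
import java.util.ArrayList;
import java.util.List;

final class LongIdCollector {
    private long docBase;                          // offset of the current segment, kept as a long
    private final List<Long> hits = new ArrayList<>();

    /** Called before the docs of each segment are collected. */
    void setNextSegment(long docBase) {
        this.docBase = docBase;
    }

    /** Records a hit; long + int is computed in long, so the sum cannot wrap at 2^31-1. */
    void collect(int segmentLocalDoc) {
        hits.add(docBase + segmentLocalDoc);
    }

    List<Long> hits() {
        return hits;
    }

    public static void main(String[] args) {
        LongIdCollector collector = new LongIdCollector();
        long base = 0;
        // Assumed scenario: three segments, each holding Integer.MAX_VALUE docs.
        int[] segmentSizes = {Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE};
        for (int size : segmentSizes) {
            collector.setNextSegment(base);
            collector.collect(size - 1);           // last doc of this segment
            base += size;
        }
        System.out.println(collector.hits());
    }
}
```

The segment-local ids stay within int range; only the absolute ids need the wider type.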
We have been using Lucene for some time and our document count is growing
rather rapidly. Maybe this is a much-discussed issue already, but I did not
find a lead; any pointer would be really appreciated.
Thanks very much for your help, Lisheng
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
--
Lance Norskog
goksron@gmail.com