FAQ
Jochen,

Someone else recently made a similar, reasonable complaint. I agree
that this should be fixed. The fastest way to get it fixed would be to
submit a patch to lucene-dev, with a test case, etc.

Doug

Jochen Frey wrote:
Hi!

I hope this is the right forum for this post.

I was wondering if other people would consider this a bug (it might be a
feature and I am missing the point of it):

.The default IndexWriter.maxFieldLength is 10,000.
.The point of maxFieldLength is to limit memory usage.
.The current position (which is compared against maxFieldLength) is
essentially determined by the sum of the PositionIncrements of all Tokens
added to the index.

Why does this matter? If you have setPositionIncrement(1000) for sentence
ending tokens, only the first 10 sentences of your document will be indexed,
the rest will not be searchable (since position will be greater than
10,000).

Why I think this is a bug: If you skip 1000 positions, no memory is required
by the DocumentWriter for the empty 999 positions, thus not using
maxFieldLength to limit memory but simply available positions.

I suggest that there be a counter in DocumentWriter, that counts the actual
number of tokens in the postingTable (probably in
DocumentWriter.addPosition), so that maxFieldLength is compared against the
number actual entries, not the number of actual entries and the number
skipped entries.

Best,
Jochen

PS: Please let me know if this is the wrong forum for this so I'll post to
the right one next time.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 8 of 8 | next ›
Discussion Overview
groupjava-user @
categorieslucene
postedDec 12, '03 at 9:12p
activeDec 20, '03 at 12:54a
posts8
users5
websitelucene.apache.org

People

Translate

site design / logo © 2021 Grokbase