Julien Nioche wrote:
> Profiling my application indicates that a lot of time is spent on the
> creation of temporary Term objects.
It does indeed look like term lookup is using a lot of your time. I
don't see the Term constructor showing up as significant in your
profile, so it looks to me like it could just be the cost of parsing
the data, not the allocation/GC stuff. I've found that allocating
temporary objects doesn't really cost much with modern garbage
collectors; often the biggest cost of allocating an object is just its
constructor.

What sort of queries are you making, and against what sort of index? It
looks like you're probably making large queries with lots of
low-frequency terms, for term lookup to be such a large factor.
You might try sorting the terms in the query: if subsequent lookups
are near each other in the TermInfo file, then it won't have to scan as
much. Could that help? Also, is your index optimized? An optimized index
has a single segment, so each term is looked up once rather than once
per segment, which drastically reduces term lookup costs.
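To make the effect of sorting concrete, here is a toy model of a sorted term dictionary with a sparse in-memory index. It assumes, as a simplification, that a lookup keeps scanning forward from where the previous lookup stopped when the new term lies at or after that point and before the next indexed term, and otherwise seeks via the in-memory index. `TermScanModel`, `lookup` and `totalRead` are hypothetical names; this is a sketch of the idea, not Lucene's `TermInfosReader` code.

```java
import java.util.Arrays;

/**
 * Toy model of a sorted term dictionary with a sparse in-memory index.
 * Hypothetical class: illustrates the idea only, it is not Lucene code.
 */
public class TermScanModel {
    private final String[] terms;  // full sorted dictionary (stands in for the on-disk file)
    private final int interval;    // every interval-th term is kept in the in-memory index
    private int pos = -1;          // position reached by the previous lookup (-1 = none yet)

    public TermScanModel(String[] sortedTerms, int interval) {
        this.terms = sortedTerms;
        this.interval = interval;
    }

    /** Binary search of the in-memory index: start of the block whose indexed term is the greatest <= target. */
    private int blockStart(String target) {
        int lo = 0, hi = (terms.length - 1) / interval;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (terms[mid * interval].compareTo(target) <= 0) lo = mid; else hi = mid - 1;
        }
        return lo * interval;
    }

    /** Looks up target and returns how many dictionary entries had to be read. */
    public int lookup(String target) {
        int nextIndexed = pos < 0 ? -1 : (pos / interval + 1) * interval;
        boolean continueScan = pos >= 0
            && terms[pos].compareTo(target) <= 0          // target is at or after the last position
            && (nextIndexed >= terms.length                // ...and before the next indexed term
                || target.compareTo(terms[nextIndexed]) < 0);
        // Either keep reading from the previous position, or seek via the in-memory index.
        int i = continueScan ? pos : blockStart(target);
        int read = 0;
        while (i < terms.length) {
            read++;
            if (terms[i].compareTo(target) >= 0) break;    // found the term (or where it would be)
            i++;
        }
        pos = Math.min(i, terms.length - 1);
        return read;
    }

    /** Total entries read for a batch of queries, optionally sorting the query terms first. */
    public static int totalRead(String[] dict, int interval, String[] queries, boolean sort) {
        String[] q = queries.clone();
        if (sort) Arrays.sort(q);
        TermScanModel model = new TermScanModel(dict, interval);
        int total = 0;
        for (String t : q) total += model.lookup(t);
        return total;
    }

    public static void main(String[] args) {
        String[] dict = new String[4096];
        for (int i = 0; i < dict.length; i++) dict[i] = String.format("t%04d", i);
        String[] queries = {"t0120", "t0100", "t0115", "t0105", "t0110"};
        System.out.println("unsorted: " + totalRead(dict, 128, queries, false));
        System.out.println("sorted:   " + totalRead(dict, 128, queries, true));
    }
}
```

With five nearby query terms and an interval of 128, sorting lets four of the five lookups continue from the previous position instead of rescanning from the start of the block, so far fewer entries are read in total.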

If all else fails, try reducing TermInfosWriter.INDEX_INTERVAL. You'll
have to re-create your indexes each time you change this constant. You
might try a value like 16. This would keep the number of terms held in
memory from being too large (1 in 16 terms), but would reduce the
average number of terms scanned per lookup from 64 to 8 (half the
interval, down from the default of 128), which would be substantial.
Tell me how this works. If it makes a big difference, then perhaps we
should make this parameter easier to change.
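The arithmetic behind those numbers can be checked with a small back-of-the-envelope model. It assumes, for illustration only, that a random lookup seeks to the nearest indexed term at or before the target and then reads entries one by one until it reaches the target; the dictionary size of 2^20 terms is an arbitrary choice so that both intervals divide it evenly. `IndexIntervalTradeoff` and its methods are hypothetical names, not Lucene code.

```java
/** Back-of-the-envelope model of the index-interval trade-off (hypothetical helper, not Lucene code). */
public class IndexIntervalTradeoff {
    /** Terms kept in memory: one index entry per `interval` terms, rounded up. */
    public static long indexEntries(long numTerms, int interval) {
        return (numTerms + interval - 1) / interval;
    }

    /** Average entries read per lookup, averaged over every term in the dictionary. */
    public static double averageScanned(long numTerms, int interval) {
        long total = 0;
        for (long p = 0; p < numTerms; p++) {
            total += p % interval + 1;  // entries read from the block start up to and including the target
        }
        return (double) total / numTerms;
    }

    public static void main(String[] args) {
        long numTerms = 1L << 20;  // assumed dictionary size, for illustration only
        for (int interval : new int[] {128, 16}) {
            System.out.printf("interval=%d  in-memory entries=%d  avg scanned=%.1f%n",
                interval, indexEntries(numTerms, interval), averageScanned(numTerms, interval));
        }
    }
}
```

Under these assumptions an interval of 128 averages 64.5 entries read per lookup and an interval of 16 averages 8.5 (the extra half comes from counting the matched entry itself), at the cost of eight times as many index entries held in memory.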


Posted to the dev @ group on Dec 4, '03 at 2:39p; thread last active Dec 5, '03 at 11:34p.