Thank you for your answer, Doug.

Profiling my application indicates that a lot of time is spent creating
temporary Term objects.

This is at least true for PhraseQuery weighting, as shown in the profiling
figures below:

.41.2% - 473240 ms - 2802 inv. - org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer
..40.4% - 464202 ms - 7440 inv. - org.apache.lucene.index.IndexReader.termPositions
...40.1% - 460378 ms - 7440 inv. - org.apache.lucene.index.SegmentTermDocs.seek
....40.0% - 459297 ms - 7440 inv. - org.apache.lucene.index.TermInfosReader.get
.....39.1% - 448370 ms - 7440 inv. - org.apache.lucene.index.TermInfosReader.scanEnum
.......34.4% - 394578 ms - 484790 inv. - org.apache.lucene.index.SegmentTermEnum.next
.........25.8% - 296435 ms - 484790 inv. - org.apache.lucene.index.SegmentTermEnum.readTerm
.........3.5% - 40565 ms - 969580 inv. - org.apache.lucene.store.InputStream.readVLong
.........1.8% - 21147 ms - 484790 inv. - org.apache.lucene.store.InputStream.readVInt

This is method time only; it does not account for the time required to
garbage-collect all those temporary objects.
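The readVInt and readVLong frames at the bottom of the trace decode Lucene's variable-byte integers: seven payload bits per byte, with the high bit set on every byte except the last. A minimal standalone sketch of that format (the class and the cursor-passing style here are mine, not Lucene's org.apache.lucene.store.InputStream API):

```java
import java.io.ByteArrayOutputStream;

// Sketch of Lucene's variable-byte ("VInt") integer format:
// seven low-order bits per byte, high bit set while more bytes follow.
public class VIntDemo {
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {        // more than 7 bits remain
            out.write((i & 0x7F) | 0x80); // low 7 bits plus continuation flag
            i >>>= 7;
        }
        out.write(i);                      // final byte, high bit clear
    }

    static int readVInt(byte[] in, int[] pos) {
        byte b = in[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in[pos[0]++];
            i |= (b & 0x7F) << shift;      // accumulate the next 7 bits
        }
        return i;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] values = {0, 1, 127, 128, 16384, Integer.MAX_VALUE};
        for (int v : values) writeVInt(out, v);
        byte[] bytes = out.toByteArray();
        int[] pos = {0};
        for (int v : values)
            System.out.println(v + " round-trips to " + readVInt(bytes, pos));
    }
}
```

Small values cost one byte, which is why the term dictionary stores deltas this way; the per-call decode cost only shows up, as above, when the enum scans hundreds of thousands of entries.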

I'll test the other applications I have written to confirm this.

Scott,
I tried NIODirectory and posted some benchmarks for it on the list using my
apps. It improves overall performance a little, but it would be interesting
if we could choose which files to map into memory.

----- Original Message -----
From: "Doug Cutting" <cutting@lucene.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Thursday, December 04, 2003 7:28 PM
Subject: Re: suggestion for a CustomDirectory

Julien Nioche wrote:
However, in most cases the
application would be faster because:
- tree access to the Terms (this is only the case for the Terms in the
.tii)
- no need to create up to 127 temporary Term objects (with creation of
Strings and so on)
- less garbage collection
The .tii is already read into memory when the index is opened. So the
only savings would be the creation of (on average) 64 temporary Term
objects per query. Do you have any evidence that this is a substantial
part of the computation? I'd be surprised if it was. To find out, you
could write a program which compares the time it takes to call docFreq()
on a set of terms (allocating the 64 temporary Terms) to what it takes
to perform queries (doing the rest of the work). I'll bet that the
first is substantially faster: most of the work of executing a query is
processing the .frq and .prx files. These are bigger than the RAM on
your machine, and so cannot be cached. Thus you'll always be doing some
disk I/O, which will likely dominate real performance.

Doug
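Doug's docFreq()-versus-query comparison needs a real index, but the other half of the question, what the 484,790 temporary Term allocations in the profile cost by themselves, can be approximated standalone. Everything below (the stand-in Term class, the field name, the synthetic term texts) is illustrative, not Lucene code:

```java
// Compare scanning a term list while allocating a fresh Term per entry
// (roughly what SegmentTermEnum.readTerm does) against a scan that reuses
// the strings in place. Results are summed so both paths do equal work.
public class TermAllocBench {
    // Stand-in for org.apache.lucene.index.Term, not the real class.
    static final class Term {
        final String field, text;
        Term(String field, String text) { this.field = field; this.text = text; }
    }

    // One short-lived Term object per dictionary entry.
    static long scanAllocating(String[] texts) {
        long sum = 0;
        for (int i = 0; i < texts.length; i++)
            sum += new Term("contents", texts[i]).text.length();
        return sum;
    }

    // Same scan with no per-entry allocation.
    static long scanReusing(String[] texts) {
        long sum = 0;
        for (int i = 0; i < texts.length; i++)
            sum += texts[i].length();
        return sum;
    }

    public static void main(String[] args) {
        int n = 484790; // SegmentTermEnum.next invocation count from the profile
        String[] texts = new String[n];
        for (int i = 0; i < n; i++) texts[i] = "term" + (i % 1000);

        long t0 = System.nanoTime();
        long a = scanAllocating(texts);
        long allocNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        long b = scanReusing(texts);
        long reuseNs = System.nanoTime() - t1;

        System.out.println("allocating: " + allocNs / 1000 + " us, reusing: "
                + reuseNs / 1000 + " us, results equal: " + (a == b));
    }
}
```

This only measures allocation overhead in isolation, so it cannot settle Doug's point that .frq/.prx processing and disk I/O dominate; it just bounds how much the temporary objects could matter.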


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Discussion Overview
group: java-dev @ lucene.apache.org
category: lucene
posted: Dec 4, '03 at 2:39p
active: Dec 5, '03 at 11:34p
posts: 10
users: 5

People

Translate

site design / logo © 2021 Grokbase