Doug,

Thanks for the fix. For a single query at a time, this does indeed
solve the problem. The query performance comes back up to the level it
was before (a change from 17,000ms to 800ms with the test query we are
using). As you mentioned, however, performance does indeed suffer when
more than one query is being performed at a time. With several
simultaneous queries, the timings degrade back to around 17s.

Thanks,

Scott

-----Original Message-----
From: Doug Cutting
Sent: Monday, December 03, 2001 8:49 AM
To: 'Lucene Developers List'
Subject: RE: Query performance with DateFilter


I have a guess about what the problem is. Lucene used to do a better job of
re-using TermFreq input streams. I've attached new versions of a few files
which should restore the earlier behavior. Try running with these.
This isn't actually a very good fix, since it uses a single element cache
(as was done before). For example, performance will suffer again if more
than one thread uses a DateFilter at the same time. A scalable fix would
not be much harder to implement. So if this fixes your problem, I will
check in the more scalable version.
Doug
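
[Editor's illustration] The trade-off Doug describes between a single-element cache and a more scalable one can be sketched as below. This is not Lucene's code and not the attached fix: the Stream class, both cache classes, and the simulate() helper are hypothetical names invented for the illustration, and the pooled cache is just one assumed way to make reuse scale. The sketch counts how many "streams" have to be opened when several callers hold one at the same time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StreamCacheSketch {
    /** Hypothetical stand-in for an input stream that is expensive to open. */
    static final class Stream {}

    interface StreamCache {
        Stream acquire();
        void release(Stream s);
        int opened();
    }

    /** Single-element cache: remembers at most one idle stream. */
    static final class SingleSlotCache implements StreamCache {
        private Stream slot;
        private int opened;

        public synchronized Stream acquire() {
            Stream s = slot;
            slot = null;                 // the slot is now empty for other callers
            if (s == null) {             // nothing cached: pay the cost of opening
                s = new Stream();
                opened++;
            }
            return s;
        }

        public synchronized void release(Stream s) {
            slot = s;                    // overwrites (and so discards) any cached stream
        }

        public synchronized int opened() { return opened; }
    }

    /** Pooled cache (an assumed alternative): keeps every idle stream for reuse. */
    static final class PooledCache implements StreamCache {
        private final Deque<Stream> idle = new ArrayDeque<>();
        private int opened;

        public synchronized Stream acquire() {
            Stream s = idle.poll();      // null when no idle stream is available
            if (s == null) {
                s = new Stream();
                opened++;
            }
            return s;
        }

        public synchronized void release(Stream s) { idle.push(s); }

        public synchronized int opened() { return opened; }
    }

    /** Runs `rounds` rounds in which `users` callers all hold a stream at once. */
    static int simulate(StreamCache cache, int users, int rounds) {
        Stream[] held = new Stream[users];
        for (int r = 0; r < rounds; r++) {
            for (int u = 0; u < users; u++) held[u] = cache.acquire();
            for (int u = 0; u < users; u++) cache.release(held[u]);
        }
        return cache.opened();
    }

    public static void main(String[] args) {
        System.out.println("single slot, 1 user:  " + simulate(new SingleSlotCache(), 1, 100));
        System.out.println("single slot, 3 users: " + simulate(new SingleSlotCache(), 3, 100));
        System.out.println("pooled, 3 users:      " + simulate(new PooledCache(), 3, 100));
    }
}
```

With one caller the single slot is always refilled before the next acquire, so only one stream is ever opened. With overlapping callers only one of them finds the slot occupied and the rest must open a new stream every round, which is the kind of degradation the message warns about. The pooled variant opens one stream per concurrent caller and then reuses them.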
-----Original Message-----
From: Scott Stanley
Sent: Friday, November 30, 2001 2:58 PM
To: lucene-dev
Subject: Query performance with DateFilter


I have found that searching with date filtering is much slower since
shifting from Lucene 1.1b to Lucene 1.2 rc2 (basically from com.lucene
to org.apache.lucene).

With 1.1b, search time was : 700ms
With 1.2rc2 : 11,000 ms!
(15 times slower)
(with 50,000 files indexed)

However, searching with no filtering seems to be a bit faster with
1.2rc2.

To be sure that the DateFilter was responsible for the performance
hit, I tested this:

DateFilter df = new DateFilter("DOC_DATE", 1000087883595L,
                               1009087883595L);
BitSet bs = df.bits(IndexReader.open("/index"));

With Lucene 1.1b : 668 ms
With Lucene 1.2 rc2 : 9000 ms

Running this under JProbe, I noticed that the performance difference
was coming from the call to SegmentTermDocs.next(). This method call
seems to be much slower because InputStream.readByte() is slower...

I noticed that InputStream.refill() and InputStream.readInternal() take
much more time. I finally narrowed it down to
RandomAccessFile.read(byte[], int, int), which is called around 50 times
more often in 1.2 RC2 than in the earlier version.
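
[Editor's illustration] The general relationship between buffering and the number of reads that reach the underlying file can be sketched with plain java.io classes. This does not reproduce Lucene's InputStream or FSDirectory; CountingStream and countReads() are hypothetical names, and a byte array stands in for the index file. The sketch reads the same data one byte at a time through buffers of different sizes and counts the calls that reach the source.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCountSketch {
    /** Wraps a source and counts every read call that reaches it. */
    static final class CountingStream extends InputStream {
        private final InputStream in;
        int reads;

        CountingStream(InputStream in) { this.in = in; }

        @Override
        public int read() throws IOException {
            reads++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return in.read(b, off, len);
        }
    }

    /**
     * Reads totalBytes one byte at a time through a buffer of bufferSize
     * and returns how many read calls reached the underlying source.
     */
    static int countReads(int totalBytes, int bufferSize) {
        CountingStream source =
            new CountingStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (InputStream in = new BufferedInputStream(source, bufferSize)) {
            while (in.read() != -1) {
                // consume one byte per call; the buffer absorbs most of them
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.reads;
    }

    public static void main(String[] args) {
        System.out.println("small buffer: " + countReads(51200, 1024));
        System.out.println("large buffer: " + countReads(51200, 51200));
    }
}
```

A smaller buffer, or a buffer that is thrown away and refilled instead of being reused, means many more calls reach the source for the same amount of data. That is the sort of effect a profiler would report as extra RandomAccessFile.read calls, though the sketch makes no claim about the actual cause in 1.2 RC2.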

Is there an issue with the way FSDirectory handles buffering of the
bytes read from the index files? Is all of this related to the thread
safety fix? I guess the bottom line is, is there anything we can do
to bring the performance back up with the DateFilter?

Scott



  • Doug Cutting at Dec 7, 2001 at 6:35 pm

    From: Scott Stanley

    > Thanks for the fix. For a single query at a time, this does indeed
    > solve the problem. The query performance comes back up to the level it
    > was before (a change from 17,000ms to 800ms with the test query we are
    > using).

    Great!

    > As you mentioned, however, performance does indeed suffer when
    > more than one query is being performed at a time. With several
    > simultaneous queries, the timings degrade back to around 17s.

    Please find attached a fix that should also work for simultaneous queries.
    If it works for you I will check it in. Note that, in addition to changes
    to a bunch of lucene.index files, this has a new version of
    lucene.search.DateFilter.

    Tell me how it goes...

    Doug

Discussion Overview
group: java-dev @
category: lucene
posted: Dec 7, '01 at 5:50p
active: Dec 7, '01 at 6:35p
posts: 2
users: 2
website: lucene.apache.org