FAQ
Hi

Nearing completion on a new version of a lucene search component for the
http://www.musicbrainz.org music database and having a problem with
performance. There are a number of indexes each built from data in a
database, there is one index for albums, another for artists, and
another for tracks (individual songs on albums). Testing search
performance on every index they all perform well except for the tracks
index which is considerably slower than before.

The code is similar, though there are changes sin all areas, and I have
checked with a profiler that it is the lucene search that is taking the
time. It is spending most of its time in BooleanScorer2, you can see a
breakdown at http://imagebin.org/151368, is this normal ? I profiled a
well performing index and this method didn't even show up.

One thought I had is that some of the test queries are a little silly
IMO and contain alot of OR queries both on terms within field and over
multiple fields. We only ever return 25 results, but I guess Lucene has
to sort all the results and in some cases there are over a million
matches, could this be the reason ? However these queries still perform
better with the old code base and Im using TopScoreDocsCollecter so I
can't see how to improve it, and all queries not just the silly ones
appear slower.

Code Extract:
public Results searchLucene(String query, int offset, int limit)
throws IOException, ParseException {
IndexSearcher searcher = getIndexSearcher();
QueryParser parser = getParser();
TopScoreDocCollector collector =
TopScoreDocCollector.create(offset + limit, true);
searcher.search(parser.parse(query), collector);
searchCount.incrementAndGet();
return processResults(searcher, collector, offset);
}
private Results processResults(IndexSearcher searcher,
TopScoreDocCollector collector, int offset) throws IOException {
Results results = new Results();
TopDocs topDocs = collector.topDocs();
results.offset = offset;
results.totalHits = topDocs.totalHits;
ScoreDoc docs[] = topDocs.scoreDocs;
float maxScore = topDocs.getMaxScore();
for (int i = offset; i < docs.length; i++) {
Result result = new Result();
result.score = docs[i].score / maxScore;
result.doc = new MbDocument(searcher.doc(docs[i].doc));
results.results.add(result);
}
return results;
}

I'm using Lucene 3.0.3, old code base is using 2.9.2, any help
appreciated on what the problem could be, on on how I should proceed.

thanks Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Paul Taylor at May 3, 2011 at 8:06 am

    On 02/05/2011 23:36, Paul Taylor wrote:
    Hi

    Nearing completion on a new version of a lucene search component for
    the http://www.musicbrainz.org music database and having a problem
    with performance. There are a number of indexes each built from data
    in a database, there is one index for albums, another for artists, and
    another for tracks (individual songs on albums). Testing search
    performance on every index they all perform well except for the tracks
    index which is considerably slower than before.

    The code is similar, though there are changes sin all areas, and I
    have checked with a profiler that it is the lucene search that is
    taking the time. It is spending most of its time in BooleanScorer2,
    you can see a breakdown at http://imagebin.org/151368, is this normal
    ? I profiled a well performing index and this method didn't even show up.

    One thought I had is that some of the test queries are a little silly
    IMO and contain alot of OR queries both on terms within field and over
    multiple fields. We only ever return 25 results, but I guess Lucene
    has to sort all the results and in some cases there are over a million
    matches, could this be the reason ? However these queries still
    perform better with the old code base and Im using
    TopScoreDocsCollecter so I can't see how to improve it, and all
    queries not just the silly ones appear slower.

    Code Extract:
    public Results searchLucene(String query, int offset, int limit)
    throws IOException, ParseException {
    IndexSearcher searcher = getIndexSearcher();
    QueryParser parser = getParser();
    TopScoreDocCollector collector =
    TopScoreDocCollector.create(offset + limit, true);
    searcher.search(parser.parse(query), collector);
    searchCount.incrementAndGet();
    return processResults(searcher, collector, offset);
    }
    private Results processResults(IndexSearcher searcher,
    TopScoreDocCollector collector, int offset) throws IOException {
    Results results = new Results();
    TopDocs topDocs = collector.topDocs();
    results.offset = offset;
    results.totalHits = topDocs.totalHits;
    ScoreDoc docs[] = topDocs.scoreDocs;
    float maxScore = topDocs.getMaxScore();
    for (int i = offset; i < docs.length; i++) {
    Result result = new Result();
    result.score = docs[i].score / maxScore;
    result.doc = new MbDocument(searcher.doc(docs[i].doc));
    results.results.add(result);
    }
    return results;
    }

    I'm using Lucene 3.0.3, old code base is using 2.9.2, any help
    appreciated on what the problem could be, on on how I should proceed.

    thanks Paul
    Rereading the javadocs I realised I was calling thw wrong search method,
    but having made this change it only gived me a 10% improvement.

    public Results searchLucene(String query, int offset, int limit) throws
    IOException, ParseException {
    IndexSearcher searcher = getIndexSearcher();
    QueryParser parser = getParser();
    TopDocs topdocs = searcher.search(parser.parse(query), offset +
    limit);
    searchCount.incrementAndGet();
    return processResults(searcher, topdocs, offset);
    }

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedMay 2, '11 at 10:37p
activeMay 3, '11 at 8:06a
posts2
users1
websitelucene.apache.org

1 user in discussion

Paul Taylor: 2 posts

People

Translate

site design / logo © 2022 Grokbase