On 05/05/2011 11:13, Ahmet Arslan wrote:
Yes correct, but I have looked and the list of
optimizations before. What was clear from profiling was that
it wasnt the searching part that was slow (a query run on
the same index with only a few matching docs ran super fast)
the slowness only occurs when there are loads of matching
docs, and spends most of its time in scorer that is why I
was trying to remove the poor matches.
Okey all clear. Can you give us some example query strings where there are loads of matching?
Do you use stop word filter? Could it be case described as
"As you approach the upper limits of a single machine,
extremely frequent terms (called stop words) can become very
expensive in the wrong query. If part of a top level BooleanQuery, a
SHOULD clause that appears in every document will cause a match and
score for every document in your index."http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
We used to use the default stop word list but have no stop words because
Lucene is used to match very short fields relating to Musicdata such as
artist or album name, therefore the default stop words really need to be
included to get good matches, for example how would you match the artist
'The The' otherwise, so use of a stop word word list is not an option.
If people construct good queries thee is no problem, but the trouble is
that many users just OR everything they are looking for because they
don't want a good match rejected because just one term fails, but the
problem is there are a number of very popular terms, for example the
tnum:(6) qdur:(189) artist:(tama) track:(ibata) tracks:(10) release:(the
global rhythm september 2002)
will match any song that is on an album with 10 tracks, any song which
is trackno 6 on an album, and any release containg the word 'the' , when
really what they are looking for is the song 'ibata' by artist 'tama',
This matches over a million documents (songs) , but doesn't match any
well, because the song 'ibata' by 'tama' isnt actually in the index !
So I dont think the query is very good but I cannot force users to
submit better queries, but I want to protect the server by reducing the
time these kind of query take (upto 1 second as opposed to the more
usual 100 milliseconds) and I hope that forcing x number of terms to
match would do that.
To unsubscribe, e-mail: email@example.com
For additional commands, e-mail: firstname.lastname@example.org