We are using the 'edismax' query parser for its many benefits over the
standard Lucene parser. For queries with more than 5 or 6 keywords (which
is a lot for our typical user), the recall can be very high (sometimes
matching 75% or more of the documents). This high recall, when coupled with
some custom PostFilter scoring, is hurting the query performance. I tried
varying the 'mm' (minimum match) parameter, but at values less than 100%,
the response time didn't improve much, and at 100%, there were often no
results, which is unacceptable.
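
To make the effect of 'mm' concrete, here is a minimal plain-Java sketch of how a percentage value translates into a required number of matching keywords. It is not Solr code: the class and method names are hypothetical, and the rounding rule (truncating toward zero for a positive percentage) is an assumption about how the percentage is resolved. It shows why a 6-keyword query at mm=100% can return nothing while lower values still match most documents.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of minimum-should-match; not part of Solr.
class MinShouldMatchSketch {

    // Number of optional clauses that must match for a positive percentage.
    // Assumption: the fractional part is truncated (6 clauses at 75% -> 4).
    static int requiredClauses(int clauseCount, int mmPercent) {
        return (clauseCount * mmPercent) / 100;
    }

    // A document matches if it contains at least the required number of
    // query terms. At least one term must always match.
    static boolean matches(Set<String> docTerms, List<String> queryTerms, int mmPercent) {
        int required = Math.max(1, requiredClauses(queryTerms.size(), mmPercent));
        long hits = queryTerms.stream().filter(docTerms::contains).count();
        return hits >= required;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("a", "b", "c", "d", "e", "f");
        Set<String> doc = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        for (int mm : new int[] {50, 75, 100}) {
            System.out.println("mm=" + mm + "% required=" + requiredClauses(query.size(), mm)
                    + " matches=" + matches(doc, query, mm));
        }
    }
}
```

In this sketch a document containing 4 of the 6 keywords matches at 50% and 75% but not at 100%, which mirrors the trade-off described above: anything short of 100% still admits a large share of the index.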

So, I wrote a custom QueryComponent which rewrites the DisMax query.
Initially, the MinShouldMatch value is set to 100%. If the search returns 0
results, MinShouldMatch is set to 1 and the search is retried. This
improved the QPS throughput by about 2.5X. However, this only worked with
an unsharded index. With a sharded index, each shard returned only the
results from the first search (mm=100%). In the debugger, I could see two
'response/ResultContext' name/value pairs in the SolrQueryResponse object, so I
added code to remove the first pair when two were present, which
fixed the problem. My question: is removing the extra ResultContext a
reasonable solution? It seems a little brittle to me.
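
For readers following along, the retry logic described above can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not the actual QueryComponent: `Entry`, `setResponse`, and `searchWithFallback` are hypothetical names, the search itself is passed in as a function, and the response is modeled as a list of name/value entries that, like Solr's NamedList, permits duplicate names. The sketch replaces any existing 'response' entry at the moment the retry writes its result, so the response never holds two entries, instead of removing the extra one afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of the mm=100% -> mm=1 fallback; not Solr code.
class MmFallbackSketch {

    // Simple name/value pair; the enclosing list may hold duplicate names.
    static final class Entry {
        final String name;
        final Object value;
        Entry(String name, Object value) { this.name = name; this.value = value; }
    }

    // Drop any earlier "response" entries, then add the new one,
    // so exactly one "response" entry exists after every write.
    static void setResponse(List<Entry> values, Object result) {
        values.removeIf(e -> e.name.equals("response"));
        values.add(new Entry("response", result));
    }

    // Search with mm=100% first; if nothing matches, retry with mm=1.
    // 'search' stands in for running the rewritten DisMax query.
    static List<Integer> searchWithFallback(Function<String, List<Integer>> search,
                                            List<Entry> values) {
        List<Integer> hits = search.apply("100%");
        setResponse(values, hits);
        if (hits.isEmpty()) {
            hits = search.apply("1");
            setResponse(values, hits);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Fake index: no document matches every keyword, two match at least one.
        Function<String, List<Integer>> search = mm ->
                mm.equals("100%") ? Collections.<Integer>emptyList() : Arrays.asList(3, 7);
        List<Entry> values = new ArrayList<>();
        List<Integer> hits = searchWithFallback(search, values);
        System.out.println("hits=" + hits + " responseEntries=" + values.size());
    }
}
```

Whether the same replace-on-write approach is available inside a real QueryComponent depends on how the Solr version in use populates SolrQueryResponse, so treat this only as an illustration of the idea.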


Group: solr-user
Posted: Jun 10, '14 at 6:10p


Author: Peter Keegan


