We are using the 'edismax' query parser for its many benefits over the
standard Lucene parser. For queries with more than 5 or 6 keywords (which
is a lot for our typical user), the recall can be very high (sometimes
matching 75% or more of the documents). This high recall, when coupled with
some custom PostFilter scoring, is hurting the query performance. I tried
varying the 'mm' (minimum match) parameter, but at values less than 100%,
the response time didn't improve much, and at 100%, there were often no
results, which is unacceptable.

So, I wrote a custom QueryComponent which rewrites the DisMax query.
Initially, the MinShouldMatch value is set to 100%. If the search returns 0
results, MinShouldMatch is set to 1 and the search is retried. This
improved throughput (QPS) by about 2.5x. However, this only worked with
an unsharded index. With a sharded index, each shard returned only the
results from the first search (mm=100%). In the debugger, I could see two
'response'/ResultContext name/value pairs in the SolrQueryResponse object,
so I added code to remove the first pair when two were present, which
fixed the problem. My question: is removing the extra ResultContext a
reasonable solution to this problem? It just seems a little brittle to me.
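For readers unfamiliar with the approach, here is a minimal, self-contained sketch of the mm fallback logic: require every term first (mm=100%), and only if that yields zero hits, retry requiring a single term (mm=1). It deliberately avoids Solr classes. The toy in-memory "index" and the names `search` and `searchWithFallback` are hypothetical stand-ins. A real QueryComponent would instead rewrite the parsed query (for example, the minimum-should-match setting on a Lucene `BooleanQuery`) and re-run the search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MinShouldMatchFallback {

    // Hypothetical stand-in for a search: returns the ids of documents that
    // contain at least `minShouldMatch` of the query terms.
    static List<Integer> search(List<Set<String>> docs, List<String> terms, int minShouldMatch) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            int matched = 0;
            for (String t : terms) {
                if (docs.get(id).contains(t)) {
                    matched++;
                }
            }
            if (matched >= minShouldMatch) {
                hits.add(id);
            }
        }
        return hits;
    }

    // First pass requires every term (mm=100%); if nothing matches,
    // retry requiring only one term (mm=1).
    static List<Integer> searchWithFallback(List<Set<String>> docs, List<String> terms) {
        List<Integer> strict = search(docs, terms, terms.size());
        if (!strict.isEmpty()) {
            return strict;
        }
        return search(docs, terms, 1);
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("solr", "edismax", "query"),
            Set.of("solr", "shard"),
            Set.of("lucene", "index"));
        System.out.println(searchWithFallback(docs, List.of("solr", "query")));   // [0]
        System.out.println(searchWithFallback(docs, List.of("solr", "missing"))); // [0, 1]
    }
}
```

The strict pass is cheap when it succeeds because it matches few documents, which is what keeps the expensive PostFilter scoring off most of the index. The loose pass only runs when the strict one comes back empty.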
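The duplicate-removal workaround can be illustrated with a similar toy model. In Solr the response values live in a `NamedList` (obtainable via `SolrQueryResponse.getValues()`), which allows repeated names. The sketch below uses a plain `List` of name/value entries as a hypothetical stand-in and keeps only the last entry with a given name. It assumes that the retried search's result is always the last `response` entry, which is exactly the ordering dependency that makes the workaround feel brittle.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DropStaleResponse {

    // Removes every entry named `name` except the last one. Assumes (as the
    // workaround does) that the last entry comes from the retried search.
    static void keepLastNamed(List<Map.Entry<String, Object>> values, String name) {
        int last = -1;
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i).getKey().equals(name)) {
                last = i;
            }
        }
        // Walk backwards so removals do not shift indices we still need to visit.
        for (int i = last - 1; i >= 0; i--) {
            if (values.get(i).getKey().equals(name)) {
                values.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> values = new ArrayList<>();
        values.add(new SimpleEntry<>("responseHeader", "header"));
        values.add(new SimpleEntry<>("response", "first search (mm=100%)"));
        values.add(new SimpleEntry<>("response", "retried search (mm=1)"));
        keepLastNamed(values, "response");
        // prints "[responseHeader=header, response=retried search (mm=1)]"
        System.out.println(values);
    }
}
```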