Why do this at all? I have a hard time understanding what benefit this
is to the _user_.
And even returning 5% is risky. I mean, what happens for a query of
*:*? For a corpus of 100M docs that's still 5M documents.
Sure, you say, well I'll cap it at XXX docs. The principle still holds though.
Users usually don't want to deal with very many docs at a time.
If you must do this for some kind of reporting or something, just fire
two queries. The first has rows=0 and the second has rows set to 5%
of the numFound that the first one returned.
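
Here's a rough sketch of that two-query approach in Python, just to show the shape of it. The base URL, the function names and the default cap of 1000 are all made up for illustration; only q, rows, wt=json and the numFound/docs fields of the JSON response are standard Solr. I haven't run it against a DSE install, so treat it as a starting point.

```python
import json
import math
import urllib.parse
import urllib.request


def rows_for_percent(num_found, percent, cap=1000):
    """Turn numFound into a rows value: percent of the hits, capped (cap is an assumed default)."""
    if num_found <= 0:
        return 0
    wanted = math.ceil(num_found * percent / 100.0)
    return min(max(wanted, 1), cap)


def select(base_url, q, rows):
    """Run one /select request and return the "response" part of the JSON reply."""
    params = urllib.parse.urlencode({"q": q, "rows": rows, "wt": "json"})
    with urllib.request.urlopen(f"{base_url}/select?{params}") as resp:
        return json.load(resp)["response"]


def top_percent(base_url, q, percent, cap=1000, select_fn=select):
    """Fetch the top `percent` of hits for q using two requests."""
    # Query 1: rows=0 returns numFound without fetching any documents.
    num_found = select_fn(base_url, q, 0)["numFound"]
    rows = rows_for_percent(num_found, percent, cap)
    if rows == 0:
        return []
    # Query 2: ask for exactly the computed number of rows.
    return select_fn(base_url, q, rows)["docs"]
```

You'd call it as something like top_percent("http://localhost:8983/solr/collection1", "foo", 5), where that URL is only a placeholder for wherever your core lives. Note that the index can change between the two requests, so the 5% is approximate on a live index.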
Under the covers, you really can't do this without writing some sort
of custom collector. Solr (well, Lucene) uses the
rows parameter as the size of the list where the most relevant
docs are stored, and replaced as "better" docs come along. You can't
know how many docs are going to be found before you score them all.
So how would you know what 5% was when you start? You'd have to
write something that would keep 20X whatever your max was set
to and then grow it as necessary, but by that time you _might_ have
already thrown away docs that should be in the expanded list. Or
you'd have to keep _all_ the results, which would usually be very expensive.
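
To make the problem concrete, here's a toy version of a fixed-size top-N collector in Python. It's a simplified illustration of the idea and not Lucene's actual code; the function name and the (doc_id, score) input are invented for the example. The point is that n has to be chosen before the first doc is scored, while the total hit count is only known after the last one.

```python
import heapq


def collect_top_n(scored_docs, n):
    """Keep the n best (score, doc_id) pairs seen so far in a fixed-size min-heap."""
    heap = []        # the lowest retained score sits at heap[0]
    total_hits = 0   # only final once every matching doc has been scored
    for doc_id, score in scored_docs:
        total_hits += 1
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif n > 0 and score > heap[0][0]:
            # A "better" doc arrived: the evicted one is gone for good.
            heapq.heapreplace(heap, (score, doc_id))
    return total_hits, sorted(heap, reverse=True)
```

Anything evicted along the way can't be brought back later if 5% of the final hit count turns out to be bigger than n, which is why you'd end up either oversizing the list or keeping everything.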
All in all, I think a 2-query solution is much simpler than hacking into
your own collector, not to mention far more efficient in the general case.
On Wed, Jun 8, 2016 at 10:26 PM, Binoy Dalal wrote:
I don't think you can do such a thing ootb with solr but this is pretty
easy to achieve using a custom search component.
Just write some custom code which will limit your resultset and plug it
into your request handler as the last component.
On Thu, 9 Jun 2016, 08:53 Prasanna Josium, wrote:
I use a DSE stack which has Solr 4.10.
I want to control the number of rows from result set as a percent of the
max hit 'numFound' or 'maxScore' for a query.
1) For a query 'foo', if I get 100 hits and I want the top 5%
(say rows=5%), then I get only 5 rows.
For a query 'bar', if I get 1000 hits and I want the top 5%
(rows=5%), then I get the top 50 rows.
2) For a query 'foo', if the maxScore is 4.5, I want to get all records
within 10% of maxScore, i.e. all records whose score is between
4.5 and 4.05 (this could be any number of records).
In other words, the returned set is a percent of hits, instead of a
static row count.
Is there a way to do this readily or via some custom implementation?