Considering you have that number of documents for each class, you may think
of splitting the index (as I believe that the total number would be high).
What exactly would you mean by 'get the index' from the result? Do you mean
that you would want to fetch the class as well (without actually fetching
If thats the case you might just collect documents from different indexes
into different holders so you would know what belongs to which index.

On Mon, Jun 9, 2008 at 12:43 PM, Sascha Fahl wrote:

Hi and thanks a lot,

I'll take a look at the HitCollector thing.
I think I will have around 500.000 - 1.000.000 docs per class.
So having different indeces is a good idea I think. Especially because
half of the requests will point to only one document class and not to all
Is there a way to get the index from a result? So I would not need to have
this one
special field for identifying the document classes.


Am 08.06.2008 um 17:49 schrieb Erick Erickson:

well, why not just include a field that identifies each document's class?
Then, to search over
all classes, just don't mention the class field.

When you *do* want to restrict by class, include "AND class:blah" in your

This assumes you don't have a huge number of classes and it wouldn't be a
to have a clause like AND (class:cl1 OR class:cl2 OR class:cl3......).
said, a
clause containing 100s of class:blah isn't a problem.

And if you want to get really fancy, create a Filter for each class and
can just
combine the filters appropriately when you want to query and restrict to

Limiting the results to the top 10/20 per class is tricker, but see
for a way to intervene in the hit selection process. You could keep a
that counted by class and reject each doc that came through the
class after it reached your limit. Beware of doing a Reader.get() on each
hit, you'd have to do some work with TermDocs/TermEnum.

All that said, I haven't really kept up on the faceting that's been talked
in the archive, so you may want to look at the searchable mail archive and
see what's up with that.

How big is your index? That is, how many documents and how many fields
(approx) are you talking about here? That'll influence whether you would
be better off keeping them all in one index or splitting them up. But if
can keep them in a single index, your maintenance will be *much* easier.


On Sat, Jun 7, 2008 at 10:34 AM, Sascha Fahl <sascha.fahl@googlemail.com>

I am quite new to the lucene scene and I need your help :-)
There are several document classes. Lets say documents from class A, B,
D and E. What I need is the following:

1) I want to search over all classes together. So the query should hit
results from all different classes - ideally it is possible
to limit the results from each class to lets say 10 or 20 results per

2) I want to search over all classes seperately.

My first idea was to have one index per class and use a MultiReader for
searching over all classes and use an IndexReader for
searching over the classes seperately. Right now I have 3 questions:
1. Is that a good idea?
2. Is there a way to identify the index a result comes from?
3. Is it possible to limit the results to a number of hits from each

Thank you,

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............

Search Discussions

Discussion Posts


Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 4 of 4 | next ›
Discussion Overview
groupjava-user @
postedJun 8, '08 at 11:20a
activeJun 9, '08 at 7:58a



site design / logo © 2022 Grokbase