FAQ
I saw a similar - but not identical - question asked earlier in the archive
but no answer.

I have 2 (or more) indexes of web url's with intersecting hits. The url's
are defined as keys in case that makes a difference. I am using
MultiSearcher to search multiple indexes, but I get hits repeated if they
exist in both indexes. I am trying to get a set of all unique url's among
the indexes.

Can MultiSearcher be told not to repeat hits with duplicate "key" values? Or
does it already do this indicating my Doc's are not defined properly? As a
last resort, can someone recommend an efficient method to convert the Vector
of hitDocs into a Set after the fact?

FYI - as a test, I used MultiSearcher to search one index and it found 45
hits. I then gave MultiSearher 2 Searchers pointing to the same index, and
it found 90 hits. From this I concluded that MultiSearher merely adds hits
to the Vector rather than looking for duplicates. Is that right?

TIA,

Steve B.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Search Discussions

  • Otis Gospodnetic at Jun 29, 2004 at 1:29 pm
    Steve,

    Answer to your last question: that's right.
    Lucene does not know what field you use as your PK (unique document
    identifier), so it cannot merge hits and remove duplicates.

    As for converting Vector to Set.... HashSet set = new HashSet();
    set.addAll(YourVectorInstance);
    Does that work for you? Vector isA Collection.

    Otis



    --- steve wrote:
    I saw a similar - but not identical - question asked earlier in the
    archive
    but no answer.

    I have 2 (or more) indexes of web url's with intersecting hits. The
    url's
    are defined as keys in case that makes a difference. I am using
    MultiSearcher to search multiple indexes, but I get hits repeated if
    they
    exist in both indexes. I am trying to get a set of all unique url's
    among
    the indexes.

    Can MultiSearcher be told not to repeat hits with duplicate "key"
    values? Or
    does it already do this indicating my Doc's are not defined properly?
    As a
    last resort, can someone recommend an efficient method to convert the
    Vector
    of hitDocs into a Set after the fact?

    FYI - as a test, I used MultiSearcher to search one index and it
    found 45
    hits. I then gave MultiSearher 2 Searchers pointing to the same
    index, and
    it found 90 hits. From this I concluded that MultiSearher merely adds
    hits
    to the Vector rather than looking for duplicates. Is that right?

    TIA,

    Steve B.



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org
  • Steve at Jun 29, 2004 at 2:07 pm
    Thanks for the quick reply.

    I have played with this just a bit, but it seems there is no quick way to
    get the Hits.hitsDocs vector in one swoop. The Hits class is final so I
    cannot extend it, and there are no public accessors to get/set the hitDocs.
    It seems I will need to loop through Hits.doc(int) and extract the hits one
    at a time. Do you know another way to do this?


    ----- Original Message -----
    From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
    To: "Lucene Users List" <lucene-user@jakarta.apache.org>
    Sent: Tuesday, June 29, 2004 9:29 AM
    Subject: Re: How to get unique Hits using Multisearcher

    Steve,

    Answer to your last question: that's right.
    Lucene does not know what field you use as your PK (unique document
    identifier), so it cannot merge hits and remove duplicates.

    As for converting Vector to Set.... HashSet set = new HashSet();
    set.addAll(YourVectorInstance);
    Does that work for you? Vector isA Collection.

    Otis



    --- steve wrote:
    I saw a similar - but not identical - question asked earlier in the
    archive
    but no answer.

    I have 2 (or more) indexes of web url's with intersecting hits. The
    url's
    are defined as keys in case that makes a difference. I am using
    MultiSearcher to search multiple indexes, but I get hits repeated if
    they
    exist in both indexes. I am trying to get a set of all unique url's
    among
    the indexes.

    Can MultiSearcher be told not to repeat hits with duplicate "key"
    values? Or
    does it already do this indicating my Doc's are not defined properly?
    As a
    last resort, can someone recommend an efficient method to convert the
    Vector
    of hitDocs into a Set after the fact?

    FYI - as a test, I used MultiSearcher to search one index and it
    found 45
    hits. I then gave MultiSearher 2 Searchers pointing to the same
    index, and
    it found 90 hits. From this I concluded that MultiSearher merely adds
    hits
    to the Vector rather than looking for duplicates. Is that right?

    TIA,

    Steve B.



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org
  • Otis Gospodnetic at Jun 29, 2004 at 2:23 pm
    Ah, you wanted all Hits at once, without iterating, I see.
    You;ll have to iterate either using Hits object and its methods, or by
    executing the search and passing HitCollector instance to the search
    method, depending on what makes sense in your application.

    Otis

    --- steve wrote:
    Thanks for the quick reply.

    I have played with this just a bit, but it seems there is no quick
    way to
    get the Hits.hitsDocs vector in one swoop. The Hits class is final so
    I
    cannot extend it, and there are no public accessors to get/set the
    hitDocs.
    It seems I will need to loop through Hits.doc(int) and extract the
    hits one
    at a time. Do you know another way to do this?


    ----- Original Message -----
    From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
    To: "Lucene Users List" <lucene-user@jakarta.apache.org>
    Sent: Tuesday, June 29, 2004 9:29 AM
    Subject: Re: How to get unique Hits using Multisearcher

    Steve,

    Answer to your last question: that's right.
    Lucene does not know what field you use as your PK (unique document
    identifier), so it cannot merge hits and remove duplicates.

    As for converting Vector to Set.... HashSet set = new HashSet();
    set.addAll(YourVectorInstance);
    Does that work for you? Vector isA Collection.

    Otis



    --- steve wrote:
    I saw a similar - but not identical - question asked earlier in
    the
    archive
    but no answer.

    I have 2 (or more) indexes of web url's with intersecting hits.
    The
    url's
    are defined as keys in case that makes a difference. I am using
    MultiSearcher to search multiple indexes, but I get hits repeated
    if
    they
    exist in both indexes. I am trying to get a set of all unique
    url's
    among
    the indexes.

    Can MultiSearcher be told not to repeat hits with duplicate "key"
    values? Or
    does it already do this indicating my Doc's are not defined
    properly?
    As a
    last resort, can someone recommend an efficient method to convert
    the
    Vector
    of hitDocs into a Set after the fact?

    FYI - as a test, I used MultiSearcher to search one index and it
    found 45
    hits. I then gave MultiSearher 2 Searchers pointing to the same
    index, and
    it found 90 hits. From this I concluded that MultiSearher merely
    adds
    hits
    to the Vector rather than looking for duplicates. Is that right?

    TIA,

    Steve B.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail:
    lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail:
    lucene-user-help@jakarta.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail:
    lucene-user-help@jakarta.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 29, '04 at 1:04p
activeJun 29, '04 at 2:23p
posts4
users2
websitelucene.apache.org

2 users in discussion

Steve: 2 posts Otis Gospodnetic: 2 posts

People

Translate

site design / logo © 2022 Grokbase