Hi,

We have an index of about 10 million items, so iterating through all the records is not an option.

The index has a "description" field that is searched, plus an "itemid" field and a "categorytree" field.

Each item may belong to more than one category, so the same item can appear in the index more than once. However, when the index is searched, we do not want to pull duplicate items.

Is there a way to search on the "description" field but return each "itemid" only once?

Thanks!

Filip Stanek
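One searcher-only option, which the thread itself does not settle on, is to collapse duplicates after the query runs: walk the hits in ranked order and keep only the first hit seen for each itemid. The Python below is a minimal sketch of that logic, not Lucene.Net API. The `Hit` tuple and `unique_by_itemid` are hypothetical names, and the sketch assumes "itemid" is a stored field that can be read back from each hit.

```python
from typing import Iterable, List, Tuple

# Hypothetical ranked hit: (itemid, score, categorytree), best score first.
Hit = Tuple[str, float, str]


def unique_by_itemid(hits: Iterable[Hit], limit: int) -> List[Hit]:
    """Keep only the first (best-ranked) hit for each itemid, up to `limit` items."""
    if limit <= 0:
        return []
    seen = set()
    result: List[Hit] = []
    for hit in hits:
        itemid = hit[0]
        if itemid in seen:
            continue  # lower-ranked copy of an item already kept
        seen.add(itemid)
        result.append(hit)
        if len(result) == limit:
            break  # enough unique items; stop reading further hits
    return result
```

Because the loop stops once it has `limit` unique items, it reads only as many hits as it needs rather than all 10 million records. The trade-off is that the search's total hit count still includes the duplicate documents, so it overstates the number of distinct items.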

  • Erik Hatcher at Dec 7, 2006 at 4:35 pm
    Why not just allow documents to have multiple category values rather
    than duplicating documents?

    One document per itemid seems to me like all you'd need.

    Erik
    On Dec 7, 2006, at 11:09 AM, Filip Stanek wrote:
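Erik's suggestion amounts to merging the duplicated rows before indexing, so that each itemid becomes one record carrying all of its categories. A minimal Python sketch of that grouping step follows. The `Row` layout and `merge_rows` are hypothetical, and the sketch assumes every row for a given item has the same description.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical export row: one row per (item, category) pair, matching the
# duplicated layout of the current index.
Row = Tuple[str, str, str]  # (itemid, description, categorytree)


def merge_rows(rows: Iterable[Row]) -> Dict[str, dict]:
    """Group rows into one record per itemid with a list of categories."""
    merged: Dict[str, dict] = {}
    for itemid, description, category in rows:
        # The first row seen for an item supplies its description
        # (assumed identical across all of that item's rows).
        record = merged.setdefault(
            itemid,
            {"itemid": itemid, "description": description, "categorytree": []},
        )
        if category not in record["categorytree"]:
            record["categorytree"].append(category)
    return merged
```

Each merged record would then be indexed as a single document, adding one "categorytree" field per list entry. Lucene allows several fields with the same name on one document, and a term query on "categorytree" matches the document if any of those values matches.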
  • Filip Stanek at Dec 7, 2006 at 5:03 pm
    I have already proposed this solution, but it would require modifying
    the database that feeds the data, the indexing process, and the search
    code.

    That didn't go well with management, so I'm wondering if there's a way
    to do this by modifying only the searcher. :)
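If only the searcher changes, paging is where the cost shows up: page N of unique items can only be found by reading past every duplicate on the earlier pages. The Python sketch below illustrates this, assuming the ranked hits are available as (itemid, score) pairs in score order. `Hit` and `unique_page` are hypothetical names, and the second return value counts how many raw hits had to be read.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical ranked hit: (itemid, score), best score first.
Hit = Tuple[str, float]


def unique_page(hits: Iterable[Hit], page: int, page_size: int) -> Tuple[List[Hit], int]:
    """Return (0-based `page` of unique items, number of raw hits read)."""
    consumed = 0
    seen = set()

    def first_per_itemid() -> Iterator[Hit]:
        nonlocal consumed
        for hit in hits:
            consumed += 1  # every raw hit read counts, duplicates included
            if hit[0] not in seen:
                seen.add(hit[0])
                yield hit

    start = page * page_size
    # islice stops pulling from the generator once the page is full.
    page_hits = list(islice(first_per_itemid(), start, start + page_size))
    return page_hits, consumed
```

For example, with hits for items A, A, B, C, B, D and a page size of 2, page 0 reads 3 raw hits while page 1 has to read all 6.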


    -----Original Message-----
    From: Erik Hatcher
    Sent: Thursday, December 07, 2006 11:35 AM
    To: lucene-net-user@incubator.apache.org
    Subject: Re: Search and Uniqueness

Discussion Overview
group: lucene-net-user @
categories: lucene
posted: Dec 7, '06 at 4:09p
active: Dec 7, '06 at 5:03p
posts: 3
users: 2
website: lucene.apache.org

2 users in discussion

Filip Stanek: 2 posts
Erik Hatcher: 1 post