FAQ
Hello, I've searched on previous posts on this topic but couldn't find an answer. I want to query my index (which are a number of 'flattened' Oracle tables) for some criteria, then return Hits such that there are no Documents that duplicate a particular field. In the case where table A has a one-to-many relationship to table B, I get one Document for each (A1-B1, A1-B2, A1-B3...). My index needs to have each of these records as 'B' is a searchable field in the index. However, after the query is executed, I want my resulting Hits on be unique on 'A'. I'm only returning the Oracle object ID, so once I've seen it once I don't need it again. It looks like some sort of custom Filter is in order. My fix at the moment is to run the query, then store unique id's in a Map to build another query that will return singletons on field 'A'. I could skip this step if there was a way to remove documents from Hits (I didn't see a way). Has anyone written a filter that does this? Are there others using Lucene to mimic a relational DB? I've got a complex SQL search that joins (most outer) 40 some tables. Query performance is important, and the tables are relatively static. I find the ID's of the objects that match the users' criteria, then go to the DB to instantiate them. Any comments are appreciated.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Search Discussions

  • Doug Cutting at Oct 1, 2004 at 7:24 pm

    Timm, Andy (ETW) wrote:
    Hello, I've searched on previous posts on this topic but couldn't find an answer. I want to query my index (which are a number of 'flattened' Oracle tables) for some criteria, then return Hits such that there are no Documents that duplicate a particular field. In the case where table A has a one-to-many relationship to table B, I get one Document for each (A1-B1, A1-B2, A1-B3...). My index needs to have each of these records as 'B' is a searchable field in the index. However, after the query is executed, I want my resulting Hits on be unique on 'A'. I'm only returning the Oracle object ID, so once I've seen it once I don't need it again. It looks like some sort of custom Filter is in order.
    I'd suggest a HitCollector that uses a FieldCache of the "A" values to
    check for duplicates, and collect only a the best document id for each
    value of "A". This would use a bit of RAM, but be very fast.

    http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/HitCollector.html
    http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/FieldCache.html

    Doug

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedOct 1, '04 at 7:01p
activeOct 1, '04 at 7:24p
posts2
users2
websitelucene.apache.org

2 users in discussion

Doug Cutting: 1 post Timm, Andy (ETW): 1 post

People

Translate

site design / logo © 2022 Grokbase