Hello All,

I am kind of new to Lucene and am having a problem filtering search results.

Background:

My indexed documents cover multiple bills, and each bill has multiple
versions.

Each version of the same bill has a different bill Version Id but the same
bill Id. In most cases, the text of different versions varies only
slightly. The text of all these versions is indexed.

Problem:

Let's say a particular search term is present in one version of a bill; in
most cases it is then present in all the other versions too. So the users
have come up with a requirement: for each bill containing the search term,
they would like to see only its latest version.

So when I search for a particular word, I may get several versions of the
same bill, but I have to display only the latest record for that bill. I
did some research and understand that filters could be used to implement
this kind of requirement, but I am not sure how to proceed.

Any hints on how to implement this would be highly appreciated.

Thanks.
--
View this message in context: http://n3.nabble.com/Problem-with-search-tp717137p717137.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Shai Erera at Apr 14, 2010 at 8:55 am
    I don't know if what I propose below is the most efficient approach, but
    you can try it. In general, what you're looking for is a GROUP BY Bill-Id
    feature that then selects the most recent version, right? Only you don't
    need all the Versions of the same Bill, so you can hold just the most
    recent Version-Id.
    What you can do is write a Collector which, for each document it receives,
    checks its Bill-Id and Version-Id. It keeps a Map Bill-Id -> Version-Id
    and, for every incoming doc, checks the map:
    1) If the Bill-Id hasn't been seen yet, store it in the map.
    2) If it has been seen, compare the Version-Id of the incoming doc to the
    one in the map and replace the entry if the incoming version is newer.
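    A minimal plain-Java sketch of the map logic in steps 1) and 2), under
    stated assumptions: Bill-Ids and Version-Ids are ints, a higher Version-Id
    means a newer version, and the doc id is stored next to the Version-Id so
    the winning document can be displayed afterwards. The class and method
    names are hypothetical and no Lucene classes are used; in a real Collector,
    collect(doc) would look the two ids up from the index instead of receiving
    them as arguments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java model of a "keep only the latest version per bill"
 * collector. Assumes a higher Version-Id means a newer version.
 */
public class LatestVersionCollector {

    /** Best-so-far entry for one bill: which doc, and its Version-Id. */
    static final class Entry {
        final int docId;
        final int versionId;

        Entry(int docId, int versionId) {
            this.docId = docId;
            this.versionId = versionId;
        }
    }

    // Bill-Id -> newest matching version seen so far (doc id + Version-Id).
    private final Map<Integer, Entry> latestByBill = new HashMap<>();

    /** Called once per matching document. */
    public void collect(int docId, int billId, int versionId) {
        Entry current = latestByBill.get(billId);
        // Step 1: bill not seen yet, or step 2: incoming version is newer.
        if (current == null || versionId > current.versionId) {
            latestByBill.put(billId, new Entry(docId, versionId));
        }
    }

    /** Doc ids of the latest matching version of each bill (no particular order). */
    public List<Integer> latestDocIds() {
        List<Integer> ids = new ArrayList<>();
        for (Entry e : latestByBill.values()) {
            ids.add(e.docId);
        }
        return ids;
    }
}
```

    For example, collecting (doc 0, bill 100, v1), (doc 1, bill 100, v3),
    (doc 2, bill 200, v1) and (doc 3, bill 100, v2) leaves doc 1 for bill 100
    and doc 2 for bill 200.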

    By storing the Bill-Id and Version-Id in the FieldCache, you can make that
    Collector work very fast. You can also optimize the process further, e.g.
    by not checking the map when a document has no chance of being selected
    for the top-K requested docs (for example, because of a low score).

    I've outlined a general approach; other, perhaps more efficient, ones may
    exist.
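    The FieldCache point can be sketched the same way. In this hypothetical
    plain-Java version, the int arrays stand in for the per-segment arrays
    FieldCache would provide (indexed by segment-local doc id), and docBase
    turns a segment-local id into an index-wide one. Each lookup is then a
    plain array read rather than a stored-field load. The score-based
    optimization is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: the same dedup logic, reading Bill-Id / Version-Id
 * from per-segment int arrays (stand-ins for FieldCache arrays).
 * setNextSegment must be called before collect.
 */
public class ArrayBackedLatestVersion {
    private int[] billIds;     // current segment: local doc id -> Bill-Id
    private int[] versionIds;  // current segment: local doc id -> Version-Id
    private int docBase;       // index-wide id of the segment's first doc

    // Bill-Id -> {index-wide doc id, Version-Id} of the newest version seen.
    private final Map<Integer, int[]> latestByBill = new HashMap<>();

    /** Switch to a new segment (mirrors a Collector's per-segment setup). */
    public void setNextSegment(int[] billIds, int[] versionIds, int docBase) {
        this.billIds = billIds;
        this.versionIds = versionIds;
        this.docBase = docBase;
    }

    /** Collect a matching doc, given its segment-local id. */
    public void collect(int localDoc) {
        int bill = billIds[localDoc];        // array read, no stored-field load
        int version = versionIds[localDoc];
        int[] best = latestByBill.get(bill);
        if (best == null || version > best[1]) {
            latestByBill.put(bill, new int[] {docBase + localDoc, version});
        }
    }

    /** Index-wide doc id of the latest matching version of a bill, or -1. */
    public int latestDocFor(int billId) {
        int[] best = latestByBill.get(billId);
        return best == null ? -1 : best[0];
    }
}
```

    In the Lucene 3.x Collector API, setNextReader(IndexReader reader, int
    docBase) is where such arrays would typically be obtained, e.g. with
    FieldCache.DEFAULT.getInts(reader, field), assuming each id is indexed as
    a single untokenized numeric term per document; the field names would be
    your own.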

    Another alternative is to run your search collecting the top N*K docs,
    where N is a multiplier you apply to K. After the search is done, you
    filter out the unneeded docs with an "old" Version-Id. If you choose N
    smartly, you'll only need to run the query once, rather than re-running it
    because the filtering left too few docs.
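    The over-fetch alternative can be sketched as a post-processing step over
    the top N*K hits. Assumptions not stated in the thread: hits arrive in
    descending score order, each hit carries its Bill-Id and Version-Id, and a
    higher Version-Id is newer. The Hit class and the method names are
    hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical post-filter for "fetch top N*K, then drop old versions". */
public class OverFetchFilter {

    /** Hypothetical hit: a doc id plus the Bill-Id and Version-Id read for it. */
    public static final class Hit {
        public final int docId;
        public final int billId;
        public final int versionId;

        public Hit(int docId, int billId, int versionId) {
            this.docId = docId;
            this.billId = billId;
            this.versionId = versionId;
        }
    }

    /**
     * Keeps, in rank order, at most k hits: one per bill, namely the hit with
     * the highest Version-Id among the fetched hits for that bill.
     */
    public static List<Hit> latestPerBill(List<Hit> rankedHits, int k) {
        // Pass 1: newest fetched Version-Id per Bill-Id.
        Map<Integer, Integer> newest = new HashMap<>();
        for (Hit h : rankedHits) {
            newest.merge(h.billId, h.versionId, Integer::max);
        }
        // Pass 2: walk hits in score order, keeping each bill's newest version once.
        List<Hit> kept = new ArrayList<>();
        Set<Integer> billsKept = new HashSet<>();
        for (Hit h : rankedHits) {
            if (kept.size() == k) {
                break;
            }
            if (h.versionId == newest.get(h.billId) && billsKept.add(h.billId)) {
                kept.add(h);
            }
        }
        return kept;
    }

    /** True if fewer than k bills survived although more docs matched than were fetched. */
    public static boolean needsRerun(int survivors, int k, int fetched, int totalHits) {
        return survivors < k && fetched < totalHits;
    }
}
```

    One caveat of this sketch: "newest" means newest among the fetched hits,
    so if a bill's latest matching version ranks below the N*K cutoff, an
    older version is shown instead. needsRerun expresses the re-run condition
    from the paragraph above: fewer than K bills survived, and the query
    matched more docs than were fetched.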

    Hope this helps,
    Shai
    (In reply to Sirish Vadala's message of Tue, Apr 13, 2010 at 11:59 PM.)
  • Sirish Vadala at Apr 14, 2010 at 9:07 pm
    Hmmm... Seems like a lot of work. I will try these options and post an
    update.

    Thanks a lot.

    Best.
    --
    View this message in context: http://n3.nabble.com/Problem-with-search-tp717137p719604.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Discussion Overview
group: java-user @
categories: lucene
posted: Apr 13, '10 at 8:59p
active: Apr 14, '10 at 9:07p
posts: 3
users: 2
website: lucene.apache.org

2 users in discussion

Sirish Vadala: 2 posts, Shai Erera: 1 post