FAQ
Hi All, I hope someone can offer some advice.
I want to extend lucene to search in a particular way(if it cant already):

I want to index docs, each with file containing several terms something
like:
doc1=>myfield:a
doc2=>myfield:a,b
doc3=>myfield:a,b,c
doc4=>myfield:a,b,c,d
so far nothing new.

I want to query for matching docs such that a query something like
myfield:(a or b) should only return docs if the doc itself is FULLY
matched.
ie, for the query myfield:(a or b) , only doc1 and doc2 should match.
So the rules are its only a match if the termcount for each doc is <=the
termcount of the query(for that field) AND ALL the terms in the doc were
matched

a few more examples just to clarify:
myfield:(a or b or d) would match doc4
myfield:(a or b or c or d) would match ALL the docs here (this one works
anyway but only because it uses all the terms that exist)
myfield:(a) would match doc1

order is not important (but might be a nice have)

can anyone tell me if its possible to make lucene do this, and perhaps
offer a starting point?
thanks
Jason.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Grant Ingersoll at Nov 6, 2009 at 1:11 am

    On Nov 5, 2009, at 4:31 PM, Jason Eacott wrote:

    Hi All, I hope someone can offer some advice.
    I want to extend lucene to search in a particular way(if it cant
    already):

    I want to index docs, each with file containing several terms
    something like:
    doc1=>myfield:a
    doc2=>myfield:a,b
    doc3=>myfield:a,b,c
    doc4=>myfield:a,b,c,d
    so far nothing new.

    I want to query for matching docs such that a query something like
    myfield:(a or b) should only return docs if the doc itself is FULLY
    matched.
    ie, for the query myfield:(a or b) , only doc1 and doc2 should match.
    So the rules are its only a match if the termcount for each doc is
    <=the termcount of the query(for that field) AND ALL the terms in
    the doc were matched

    a few more examples just to clarify:
    myfield:(a or b or d) would match doc4
    myfield:(a or b or c or d) would match ALL the docs here (this one
    works anyway but only because it uses all the terms that exist)
    myfield:(a) would match doc1

    order is not important (but might be a nice have)

    can anyone tell me if its possible to make lucene do this, and
    perhaps offer a starting point?
    Would overriding http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DefaultSimilarity.html#coord%28int,%20int%29
    help?


    --------------------------
    Grant Ingersoll
    http://www.lucidimagination.com/

    Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
    using Solr/Lucene:
    http://www.lucidimagination.com/search


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Jason Eacott at Nov 6, 2009 at 8:52 am
    looks like it, thanks!
    but if I index multiple copies of the same value, ie: myfield:a,a,b
    and search with myfield:(a or b)
    or perhaps myfield:(a or a or b)

    will I be able to tell the difference? is this apparently duplicate data
    kept as part of the query? (I'd like to be able to do this too)

    Cheers
    Jason.




    Grant Ingersoll wrote:
    On Nov 5, 2009, at 4:31 PM, Jason Eacott wrote:

    Hi All, I hope someone can offer some advice.
    I want to extend lucene to search in a particular way(if it cant
    already):

    I want to index docs, each with file containing several terms
    something like:
    doc1=>myfield:a
    doc2=>myfield:a,b
    doc3=>myfield:a,b,c
    doc4=>myfield:a,b,c,d
    so far nothing new.

    I want to query for matching docs such that a query something like
    myfield:(a or b) should only return docs if the doc itself is FULLY
    matched.
    ie, for the query myfield:(a or b) , only doc1 and doc2 should match.
    So the rules are its only a match if the termcount for each doc is
    <=the termcount of the query(for that field) AND ALL the terms in the
    doc were matched

    a few more examples just to clarify:
    myfield:(a or b or d) would match doc4
    myfield:(a or b or c or d) would match ALL the docs here (this one
    works anyway but only because it uses all the terms that exist)
    myfield:(a) would match doc1

    order is not important (but might be a nice have)

    can anyone tell me if its possible to make lucene do this, and perhaps
    offer a starting point?
    Would overriding
    http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DefaultSimilarity.html#coord%28int,%20int%29 help?



    --------------------------
    Grant Ingersoll
    http://www.lucidimagination.com/

    Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
    Solr/Lucene:
    http://www.lucidimagination.com/search


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedNov 6, '09 at 12:32a
activeNov 6, '09 at 8:52a
posts3
users2
websitelucene.apache.org

2 users in discussion

Jason Eacott: 2 posts Grant Ingersoll: 1 post

People

Translate

site design / logo © 2022 Grokbase