I'm new to Lucene (don't they all just say that?) and am finding it a little
daunting. I am trying to find a way to replicate functionality we currently
have in our database searching so that we can apply it to documents too.
Most of it is just simple matching, but there is one particular part I am
having trouble with and don't know where to start...

Basically it's a kind of weighted word expansion to allow for alternate
meanings/languages etc.
The expansion part is not a problem, as that would be performed outside of
Lucene and passed in, so it's just a bunch of weighted OR terms, *but* within
that expansion we only want the single best match from the list, and only
that match contributes towards the final rank (this would only form part of a
larger query).

For example, I want to be able to do a fruit search where the desired
criterion is "apple", so at its simplest we want to be able to construct
something like this:

BESTOF( "apple"^10, "orange"^9, "pear"^8, "peach"^8, "grapes"^2 )

So ideally in this field we want to find "apple", but would accept one of
the other pre-defined alternatives as a match. If the document field
contains apples it scores higher than if it only contains oranges, but for a
document with apples AND oranges only the apples are taken into
consideration: we aren't looking for a cumulative score here, just the
closest match to the original desired criterion. Also, term frequency
(occurrences within a document) shouldn't affect the score (10 apples aren't
better than 1 apple).
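A minimal plain-Java sketch of the scoring rule described above, assuming the field's text has already been tokenized and normalized (so "apples" ends up as "apple") and that all weights are positive. The class and method names are hypothetical and no Lucene API is involved: each document scores the weight of the best alternative it contains, and tokens are collapsed to a set so term frequency plays no part.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of BESTOF(...): only the best matching alternative counts. */
final class BestOf {
    private BestOf() {}

    /**
     * Returns the weight of the highest-weighted alternative present in the
     * document, or 0.0 if none is present (weights assumed to be positive).
     */
    static double score(Map<String, Double> weightedAlternatives,
                        Collection<String> documentTokens) {
        // Collapse tokens to a set so 10 apples score the same as 1 apple.
        Set<String> present = new HashSet<>(documentTokens);
        double best = 0.0;
        for (Map.Entry<String, Double> alt : weightedAlternatives.entrySet()) {
            if (present.contains(alt.getKey())) {
                best = Math.max(best, alt.getValue());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> fruit = Map.of(
                "apple", 10.0, "orange", 9.0, "pear", 8.0, "peach", 8.0, "grapes", 2.0);

        System.out.println(score(fruit, List.of("orange")));                  // 9.0
        System.out.println(score(fruit, List.of("apple", "orange")));         // 10.0
        System.out.println(score(fruit, List.of("apple", "apple", "apple"))); // 10.0
        System.out.println(score(fruit, List.of("banana")));                  // 0.0
    }
}
```

A document with apples and oranges scores 10.0, not 19.0, which is the non-cumulative behavior asked for.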

I just don't know where to start... so any and all suggestions are gratefully
welcomed.
Thanks

--
View this message in context: http://www.nabble.com/Match-%22best-one%22-from-list-tp17805042p17805042.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Chris Hostetter at Jun 13, 2008 at 10:25 pm
    : BESTOF( "apple"^10, "orange"^9, "pear"^8, "peach"^8, "grapes"^2 )
    :
    : So ideally in this field we want to find "apple", but would accept one of
    : the other pre-defined alternatives as a match. If the document field
    : contains apples it scores higher than if it only contains oranges, but a
    : document with apples AND oranges only has apples taken into consideration,

    take a look at DisjunctionMaxQuery. I'm 90% certain it's exactly what
    you're talking about.
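As a rough illustration of why DisjunctionMaxQuery fits: it scores a document with the maximum score from any matching subquery, plus a tie-breaker multiplier times the scores of the other matching subqueries, so a multiplier of 0 gives the "best of" behavior. The model below (hypothetical class name, no Lucene calls) reproduces only that arithmetic. It assumes each subquery's score depends only on its boost; in Lucene that would mean making the term clauses constant-score (for example by wrapping each TermQuery in a ConstantScoreQuery) so that term frequency does not count. Constructors differ between Lucene versions, so check the javadocs for the release in use.

```java
/**
 * Plain-Java model of the arithmetic DisjunctionMaxQuery uses to combine the
 * scores of the subqueries that match one document:
 *   score = max + tieBreakerMultiplier * (sum of the other matching scores)
 */
final class DisMaxModel {
    private DisMaxModel() {}

    /** matchingScores holds one entry per subquery that matched the document. */
    static double combine(double tieBreakerMultiplier, double... matchingScores) {
        if (matchingScores.length == 0) {
            return 0.0; // nothing matched: the document would not be a hit at all
        }
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : matchingScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreakerMultiplier * (sum - max);
    }

    public static void main(String[] args) {
        // A document containing "apple" (^10) and "orange" (^9), assuming each
        // subquery scores exactly its boost.
        System.out.println(combine(0.0, 10.0, 9.0)); // 10.0: pure "best of"
        System.out.println(combine(0.5, 10.0, 9.0)); // 14.5: partial credit for the extra match
        System.out.println(combine(1.0, 10.0, 9.0)); // 19.0: same as summing the matches
    }
}
```

With a multiplier of 1.0 the result is simply the sum, which is the cumulative behavior the original post wanted to avoid.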




    -Hoss


  • JustJoc at Jun 16, 2008 at 2:02 pm

    hossman wrote:

    > : BESTOF( "apple"^10, "orange"^9, "pear"^8, "peach"^8, "grapes"^2 )
    > :
    > : So ideally in this field we want to find "apple", but would accept one of
    > : the other pre-defined alternatives as a match. If the document field
    > : contains apples it scores higher than if it only contains oranges, but a
    > : document with apples AND oranges only has apples taken into consideration,
    >
    > take a look at DisjunctionMaxQuery. I'm 90% certain it's exactly what
    > you're talking about.
    >
    > -Hoss

    Perfect! After a few seemingly unsuccessful tests (all because of my own
    stupidity), DisjunctionMaxQuery turned out to be exactly what I'm looking
    for. Thank you very much!

    But with great answers come more questions :)

    Ideally I want to be able to use the QueryParser, as the requirement above
    will just be an element of a larger query (albeit for now the most complex
    element). The format of the whole query is something I don't have much
    control over, hence the preference to keep or somehow extend the existing
    parser.
    The QueryParser as it stands doesn't seem to support all of the query
    types, and it seems that the only way to get extended functionality is to
    modify the grammar file QueryParser.jj and (somehow) build a new parser. Am
    I right in thinking this, or is there another way to plug new query types
    into the existing QueryParser that I haven't found?
    I'd rather not mess with the distribution as is, but if there's no other
    way I'll have to roll my sleeves up and learn JavaCC :D

    Thanks again!

    JJ
    --
    View this message in context: http://www.nabble.com/Match-%22best-one%22-from-list-tp17805042p17865087.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


  • Chris Hostetter at Jun 20, 2008 at 7:06 pm
    correct, adding new syntax to the parser currently requires editing the
    grammer.

    Something else you might consider is that ifyou expect "BESTOF" type
    queries to be the default behavior people want, you could just overriget
    the getBooleanQuery method of hte QUeryarser and *always* generate a
    DisMax query for any query containing multiple cuases.

    it may not be what you want ... but it's pretty easy.
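A toy model of the override described above, using hypothetical types rather than Lucene's classes (records, so Java 16 or later). A stand-in getBooleanQuery hook receives the clauses collected at one level of the query, returns a lone clause unchanged, and wraps two or more clauses in a "best of" node (DisMax with a tie breaker of 0). The real hook is a protected method of Lucene's QueryParser whose exact signature varies between versions. The second query in main shows one consequence to watch for: since the BESTOF list was meant to be only one element of a larger query, the outer level becomes a DisMax as well, so the other clauses no longer add to the score.

```java
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical types, not Lucene's) of an "always DisMax" parser hook. */
final class AlwaysDisMax {
    private AlwaysDisMax() {}

    interface Node {}

    /** A constant-score term clause: scores its boost if the term is present, else 0. */
    record Term(String text, double boost) implements Node {}

    /** Stand-in for DisjunctionMaxQuery with a tie breaker of 0: the best clause wins. */
    record DisMax(List<Node> clauses) implements Node {}

    /** Hypothetical hook, called once per group of clauses the parser collects. */
    static Node getBooleanQuery(List<Node> clauses) {
        if (clauses.isEmpty()) {
            throw new IllegalArgumentException("this toy version rejects an empty clause list");
        }
        // A lone clause is returned unchanged; two or more become a DisMax node.
        return clauses.size() == 1 ? clauses.get(0) : new DisMax(List.copyOf(clauses));
    }

    /** Scores a document represented as the set of terms in the field. */
    static double score(Node n, Set<String> docTerms) {
        if (n instanceof Term t) {
            return docTerms.contains(t.text()) ? t.boost() : 0.0;
        }
        if (n instanceof DisMax d) {
            double best = 0.0;
            for (Node c : d.clauses()) {
                best = Math.max(best, score(c, docTerms));
            }
            return best;
        }
        throw new IllegalArgumentException("unknown node: " + n);
    }

    public static void main(String[] args) {
        Node fruit = getBooleanQuery(List.of(
                new Term("apple", 10), new Term("orange", 9), new Term("grapes", 2)));
        // The outer query "(fruit) fresh" also has two clauses, so it becomes DisMax too.
        Node whole = getBooleanQuery(List.of(fruit, new Term("fresh", 5)));

        System.out.println(score(fruit, Set.of("apple", "orange"))); // 10.0
        System.out.println(score(whole, Set.of("apple", "fresh")));  // 10.0: "fresh" adds nothing
    }
}
```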

    : > : BESTOF( "apple"^10, "orange"^9, "pear"^8, "peach"^8, "grapes"^2 )
    ...
    : Ideally I want to be able to use the QueryParser, as the requirement above
    : will just be an element of a larger query (albeit for now the most complex
    : element). The format of the whole query is something I don't have much
    : control over, hence the preference to keep or somehow extend the existing
    : parser.
    : The QueryParser as it stands doesn't seem to support all of the query
    : types, and it seems that the only way to get extended functionality is to
    : modify the grammar file QueryParser.jj and (somehow) build a new parser. Am
    : I right in thinking this, or is there another way to plug new query types
    : into the existing QueryParser that I haven't found?
    : I'd rather not mess with the distribution as is, but if there's no other
    : way I'll have to roll my sleeves up and learn JavaCC :D




    -Hoss



Discussion Overview
group: java-user
categories: lucene
posted: Jun 12, '08 at 5:11p
active: Jun 20, '08 at 7:06p
posts: 4
users: 2
website: lucene.apache.org

2 users in discussion

JustJoc: 2 posts, Chris Hostetter: 2 posts
