Grokbase Groups Lucene dev June 2010
Hi all,
I want to implement a query that takes term positions and the
relative positions of terms into consideration. It only needs to support
multi-term queries, like a boolean OR query.
For example, given two docs:
doc1 apache lucene is a open source project
doc2 apache is a http server and lucene ...
if the user searches for "apache lucene", doc1 should win because "apache"
and "lucene" appear closer together than in doc2.
Another example:
doc1 some other text apache lucene is a open source project
doc2 apache lucene is a open source project some other text
Here doc2 should win because "apache lucene" appears at the first position.

I think I can imitate the boolean query and integrate position
information into a boolean OR query, but I am not familiar with Lucene's
implementation. Could anyone point me in the right direction? Thank you.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Erik Hatcher at Jun 4, 2010 at 3:36 pm
    This is perhaps best discussed on the java-user list instead. Here are
    some thoughts...
    On Jun 4, 2010, at 2:36 AM, Li Li wrote:

    > hi all,
    > I want to implement a query that taking position and terms'
    > relative positions into consideration. It only supports multiterm
    > queries like boolean or query.
    > But I want to consider term position and terms relative positions.
    > e.g. there are two docs
    > doc1 apache lucene is a open source project
    > doc2 apache is a http server and lucene ...
    > if user search "apache lucene" doc1 will win because apache lucene
    > appear closer than doc2

    A PhraseQuery will do that. It's commonplace to OR in a (sloppy)
    phrase query for the user's query so that proximity boosts the
    matching documents. No custom query is needed to accomplish this.

    > e.g.
    > doc1 some other text apache lucene is a open source project
    > doc2 apache lucene is a open source project some other text
    > doc2 wins because "apache lucene" appear at the first position

    And here, SpanFirstQuery is your friend. So OR'ing a PhraseQuery and
    a SpanFirstQuery (with a nested SpanNearQuery, or whatever is
    appropriate) seems to accomplish your goals.

    Give those a try and report back if things still aren't quite what
    you're after.

    Erik


  • Li Li at Jun 4, 2010 at 3:52 pm
    Thank you. But I don't think SpanFirstQuery is what I need, because I
    want to get all documents that contain any of the terms, but give a
    boost to the ones where the terms appear near the top. The same goes
    for the terms' relative positions.
    e.g.
    doc1 apache lucene is a open source project
    doc2 apache is a http server and many many other words ...
    lucene ...
    If the user searches for apache lucene, I want both docs to be presented
    to the user, but doc1 should get a higher score. I don't want to use a
    phrase query because it's slow (compared to a boolean query), and
    setting the slop to 10000 seems strange.
    e.g.
    doc1 some other text ... apache lucene
    is a open source project
    doc2 apache lucene is a open source project some other text

    SpanFirstQuery is not what I need. If the user searches for apache, I
    want to show both docs but give a higher score to doc2 because the
    matched term's position is smaller than in doc1. If I use
    SpanFirstQuery sfq = new SpanFirstQuery(apache, 100);
    I will fail to find docs in which apache only appears at a position
    larger than 100.
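
The "relative positions" idea above can be illustrated outside Lucene. The following is a minimal sketch in plain Java, not Lucene code: it treats a document as a list of tokens, finds the smallest distance between two terms, and turns it into a boost. The class name `ProximityBoost` and the choice of `1 / gap` as the boost are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: scores how close two terms are.
final class ProximityBoost {

    /** Smallest position distance between any occurrence of a and of b, or -1 if either is absent. */
    static int minGap(List<String> tokens, String a, String b) {
        int lastA = -1, lastB = -1, best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals(a)) lastA = i;
            if (t.equals(b)) lastB = i;
            // Once both terms have been seen, the most recent pair is a candidate.
            if (lastA >= 0 && lastB >= 0 && lastA != lastB) {
                int gap = Math.abs(lastA - lastB);
                if (best < 0 || gap < best) best = gap;
            }
        }
        return best;
    }

    /** Assumed boost: 1/gap, so adjacent terms get 1.0; 0.0 when the terms do not co-occur. */
    static double boost(List<String> tokens, String a, String b) {
        int gap = minGap(tokens, a, b);
        return gap < 0 ? 0.0 : 1.0 / gap;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("apache lucene is a open source project".split(" "));
        List<String> doc2 = Arrays.asList("apache is a http server and lucene".split(" "));
        System.out.println("doc1 boost: " + boost(doc1, "apache", "lucene"));
        System.out.println("doc2 boost: " + boost(doc2, "apache", "lucene"));
    }
}
```

With the two sample documents from the post, doc1 gets the larger boost because the terms are adjacent, while doc2 still receives a small non-zero boost rather than being excluded.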


  • Erik Hatcher at Jun 4, 2010 at 7:00 pm
    That's why I recommended building a boolean OR'd query out of this.
    The normal query OR the phrase query OR the span first query.

    Erik
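
To see why OR'ing the three clauses does not drop any documents, here is a toy model in plain Java. It is a sketch under stated assumptions, not Lucene's API or its scoring formula: each optional clause simply adds to the score when it matches, and a document is a hit as soon as any clause matches. The class `OrCombinationSketch`, its method names, and the one-point-per-clause weights are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of "normal query OR phrase query OR span-first query".
final class OrCombinationSketch {

    /** Plain term clause: one point per query term present anywhere in the doc. */
    static double termClause(List<String> doc, List<String> query) {
        double s = 0.0;
        for (String q : query) {
            if (doc.contains(q)) s += 1.0;
        }
        return s;
    }

    /** Exact phrase clause: one point if the query terms occur consecutively, in order. */
    static double phraseClause(List<String> doc, List<String> query) {
        if (query.isEmpty()) return 0.0;
        return Collections.indexOfSubList(doc, query) >= 0 ? 1.0 : 0.0;
    }

    /** "First" clause: one point if any query term occurs within the first `end` positions. */
    static double firstClause(List<String> doc, List<String> query, int end) {
        int limit = Math.min(end, doc.size());
        for (int i = 0; i < limit; i++) {
            if (query.contains(doc.get(i))) return 1.0;
        }
        return 0.0;
    }

    /** OR combination: clause scores add up; a score above zero means the doc is a hit. */
    static double score(List<String> doc, List<String> query, int end) {
        return termClause(doc, query) + phraseClause(doc, query) + firstClause(doc, query, end);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("apache", "lucene");
        List<String> doc1 = Arrays.asList("some other text apache lucene is a open source project".split(" "));
        List<String> doc2 = Arrays.asList("apache lucene is a open source project some other text".split(" "));
        System.out.println("doc1: " + score(doc1, query, 3));
        System.out.println("doc2: " + score(doc2, query, 3));
    }
}
```

In this model a document that fails the phrase and "first" clauses is still returned through the plain term clause, only with a lower score, which is the point of the OR'd construction.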

  • Li Li at Jun 5, 2010 at 2:14 am
    It may work, but will it be very slow? Also, I want a score function
    like 1/(1+pos); SpanFirstQuery seems to give the same score wherever
    the match occurs.
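
The requested decay can be written down directly. This is a minimal sketch in plain Java of the function 1/(1+pos) applied to the first occurrence of a term in a tokenized document; the class name `PositionDecay` is hypothetical, and how such a function would be wired into Lucene's scoring is not shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: position-based decay 1/(1+pos).
final class PositionDecay {

    /** Returns 1/(1+pos) for the first occurrence of term (pos is 0-based), or 0.0 if absent. */
    static double score(List<String> tokens, String term) {
        int pos = tokens.indexOf(term);
        return pos < 0 ? 0.0 : 1.0 / (1.0 + pos);
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("some other text apache lucene".split(" "));
        List<String> doc2 = Arrays.asList("apache lucene is a open source project".split(" "));
        System.out.println("doc1: " + score(doc1, "apache"));
        System.out.println("doc2: " + score(doc2, "apache"));
    }
}
```

Unlike a fixed cutoff such as `new SpanFirstQuery(apache, 100)`, this function never excludes a document that contains the term; it only lowers the score as the first match moves further from the start.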


Discussion Overview
group: dev @
categories: lucene
posted: Jun 4, '10 at 6:37a
active: Jun 5, '10 at 2:14a
posts: 5
users: 2
website: lucene.apache.org

2 users in discussion

Li Li: 3 posts Erik Hatcher: 2 posts
