FAQ
I've got a searching problem which I know lots of other people have run
across too. We've got documents which have keywords (which we extract and
put into a 'keywords' field) and also have body text (which we put in a
'body' field.)

Let's say we search for "text retrieval". We want to find documents that
have "text retrieval" in the body OR in the keywords, but we want to weight
hits on the keywords more heavily. I can't boost the tokens in the index
base, so I have to do that through the query.

If I convert a query for phrase Q into this:
body:Q OR keywords:Q^n
does that do what I want?
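
The rewrite described above can be sketched as plain string construction. This is a minimal sketch, assuming the Lucene query-parser syntax in which field:"phrase" restricts a phrase to one field and ^n boosts the clause before it. The field names come from the question; the class and method names are hypothetical, and the phrase is assumed to contain no double quotes.

```java
final class BoostedQueryBuilder {
    /**
     * Rewrites a phrase into: body:"Q" OR keywords:"Q"^n
     * Assumes the phrase contains no double quotes (no escaping is done).
     */
    static String build(String phrase, int boost) {
        String quoted = "\"" + phrase + "\"";
        // The boost applies only to the keywords clause it is attached to.
        return "body:" + quoted + " OR keywords:" + quoted + "^" + boost;
    }

    public static void main(String[] args) {
        // prints: body:"text retrieval" OR keywords:"text retrieval"^3
        System.out.println(build("text retrieval", 3));
    }
}
```

The resulting string would then be handed to whatever query parser the application already uses.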

How should I select the boost factor n? Are there negative consequences to
this strategy? Am I better off doing two queries and merging the results
myself?
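
For comparison, the "two queries and merge" alternative might look roughly like the sketch below. It is a simplified model, not Lucene's scoring: it assumes each search returns a map from a document id to a score and combines them as body + n * keywords, ignoring any normalization a search engine applies. All names are hypothetical, and the scores are made-up illustration values.

```java
import java.util.HashMap;
import java.util.Map;

final class WeightedMerge {
    /** Combines per-document scores from two searches as body + boost * keywords. */
    static Map<String, Double> merge(Map<String, Double> bodyHits,
                                     Map<String, Double> keywordHits,
                                     double boost) {
        // Start from a copy so the caller's body results are left untouched.
        Map<String, Double> merged = new HashMap<>(bodyHits);
        for (Map.Entry<String, Double> e : keywordHits.entrySet()) {
            // Documents found by both searches accumulate both contributions.
            merged.merge(e.getKey(), boost * e.getValue(), Double::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Double> body = new HashMap<>();
        body.put("doc1", 0.5);
        body.put("doc2", 0.25);
        Map<String, Double> keywords = new HashMap<>();
        keywords.put("doc2", 0.5);
        keywords.put("doc3", 0.25);
        System.out.println(merge(body, keywords, 3.0));
    }
}
```

Merging by hand means the application owns the combination rule and still has to sort the merged results, which is work the single boosted query leaves to the search engine.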


--
Brian Goetz
Quiotix Corporation
brian@quiotix.com Tel: 650-843-1300 Fax: 650-324-8032

http://www.quiotix.com


--
To unsubscribe, e-mail:
For additional commands, e-mail:


  • Alex Murzaku at Sep 21, 2002 at 3:38 pm
    Wouldn't field boosting (the new capability added as of
    http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html)
    be a simpler solution? I would just set the boost for the 'keywords'
    field to something higher than one, depending on your requirements. As
    for the boost value, I have noticed that it needs quite some tweaking,
    since there doesn't appear to be a magic formula. In a similar
    situation, I just kept modifying it until I got something that
    satisfied my users. It was funny because, in typical Monty Python
    style, we ended up deciding that "the number shall be three..."

  • Clemens Marschner at Sep 21, 2002 at 5:19 pm
    I had the same problem as Brian. But since you have to rewrite the query
    anyway to search two different fields, it makes no difference whether you
    use term or field boosting. Performance is the same.
    For new applications I'd say field boosting is a little simpler, because you
    save some commands during the query rewriting phase. Since I had already
    written that code when field boosting came up, for me there is no difference.

    Btw. I changed the query classes to allow query rewriting. I made them
    Cloneable and added setter methods for them. If there's interest I'll
    contribute the patches asap.

    --Clemens

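
The trial-and-error tuning of the boost mentioned in the replies can be explored on toy numbers before touching a real index. The sketch below is an illustration under stated assumptions, not Lucene's scoring: it ranks documents by body + n * keywords and prints the order for a few boost values. The class name, document ids, and scores are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BoostSweep {
    /** Returns document ids ordered by body + boost * keywords, best first. */
    static List<String> rank(String[] ids, double[] body, double[] keywords, double boost) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            order.add(i);
        }
        // Negate the combined score so the ascending sort yields best-first order.
        order.sort(Comparator.comparingDouble((Integer i) -> -(body[i] + boost * keywords[i])));
        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(ids[i]);
        }
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up scores: one document matches only in the body, one only in the keywords.
        String[] ids = {"bodyOnly", "keywordOnly"};
        double[] body = {1.0, 0.0};
        double[] keywords = {0.0, 0.5};
        for (double boost : new double[] {1.0, 2.0, 3.0}) {
            System.out.println("boost=" + boost + " -> " + rank(ids, body, keywords, boost));
        }
    }
}
```

With these made-up scores the keyword-only document overtakes the body-only one once the boost exceeds 2, which is the kind of crossover that tuning by hand is looking for.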

Discussion Overview
group: java-user
category: lucene
posted: Sep 21, '02 at 9:08a
active: Sep 21, '02 at 5:33p
posts: 4
users: 3
website: lucene.apache.org