Hi Guys,

Why does the use of synonyms decrease relevancy of the returned results?

Running query 'Xapian::Query((Zserendip:(pos=1) AND Zjacket:(pos=2)))'
3 results found
Estimated matches: 3
ID 39896 100% Serendipity Jacket mens
ID 39947 98% Serendipity Hiking Jacket womens
ID 39964 90% Serendipity Jacket womens

But with synonyms the relevancy is decreased.

Running query 'Xapian::Query((Zserendip:(pos=1) AND (Zjacket:(pos=2)
OR Zcoat:(pos=2) OR Zparka:(pos=2))))'
3 results found
Estimated matches: 3
ID 39947 72% Serendipity Hiking Jacket womens
ID 39896 64% Serendipity Jacket mens
ID 39964 58% Serendipity Jacket womens

Obviously this is because more terms are involved, but is this
correct, or can it be disabled so that the synonyms count as just one
term with regards to relevancy?
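
To make the "more terms are involved" effect concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model in which a document's percentage is the weight it earns divided by the total weight the query could award. This is not Xapian's actual percentage calculation, and the per-term weights are hypothetical values chosen for illustration.

```python
# Hypothetical per-term weights; real values depend on index statistics.
TERM_WEIGHT = {"serendip": 4.0, "jacket": 2.0, "coat": 2.5, "parka": 3.0}


def percent(query_terms, doc_terms):
    """Share of the query's total possible weight that the document earns."""
    possible = sum(TERM_WEIGHT[t] for t in query_terms)
    earned = sum(TERM_WEIGHT[t] for t in query_terms if t in doc_terms)
    return round(100 * earned / possible)


# A document that matches "serendip" and "jacket" but neither synonym.
doc = {"serendip", "jacket", "men"}

plain = percent(["serendip", "jacket"], doc)
expanded = percent(["serendip", "jacket", "coat", "parka"], doc)
print(plain, expanded)
```

Under these assumed weights the same document drops from 100 to 52, purely because the unmatched synonyms enlarge the denominator.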

Normally I set a relevancy floor and only present results that score
above it. Since synonyms lower the percentages, I may need to lower
that floor, but doing so could let poor results through in cases
where it would make more sense to say that no results were found.

Thanks,

Rusty

P.S. Olly, Wellington is a wonderful city that I've had the pleasure
to visit many times now, I think you'll find it quite nice.


  • James Aylett at Jan 3, 2008 at 4:18 pm

    On Wed, Jan 02, 2008 at 11:51:47PM -0700, Rusty Conover wrote:

    > Why does the use of synonyms decrease relevancy of the returned results?

    Because the synonyms probably won't match documents that have the
    original terms (in the general case), so there's a lower proportion of
    terms in the query matching those documents.

    You can tweak the weighting scheme to ignore the within-query
    frequency of a term when generating weights (and hence percentage
    relevancy) in the MSet: you want to set k3 to 0. This may have a
    larger effect on the relevance calculations than you expect (and may
    well change document ordering in the MSet), but may be worth playing
    with.

    I suppose in theory we could have an operator that acts as OP_OR but
    returns the highest BM25 termweight or something (so the synonyms act
    as an expansion inside the query, rather than outside as at the
    moment), but I have no idea if that would be generally useful, or
    practical with respect to any of the optimisations we do.
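
The "OR that returns the highest term weight" idea can be sketched as follows. This is a toy model in Python, not a Xapian operator: the weights are hypothetical, and the percentage is assumed to be earned weight divided by the maximum weight the query could award, which only approximates what Xapian does.

```python
# Hypothetical per-term weights; real values depend on index statistics.
TERM_WEIGHT = {"serendip": 4.0, "jacket": 2.0, "coat": 2.5, "parka": 3.0}


def percent_grouped(groups, doc_terms):
    """Score a query where each group of synonyms counts as a single term.

    A group contributes the highest weight among its matching members,
    and the best weight in the group toward the maximum possible score.
    """
    possible = 0.0
    earned = 0.0
    for group in groups:
        possible += max(TERM_WEIGHT[t] for t in group)
        matched = [TERM_WEIGHT[t] for t in group if t in doc_terms]
        if matched:
            earned += max(matched)
    return round(100 * earned / possible)


doc = {"serendip", "jacket", "men"}
groups = [["serendip"], ["jacket", "coat", "parka"]]
score = percent_grouped(groups, doc)
print(score)
```

With these assumed weights the document scores 86, because the synonym group adds only its best weight (3.0) to the denominator instead of the sum of all three.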

    J

    --
    /--------------------------------------------------------------------------\
    James Aylett xapian.org
    james@tartarus.org uncertaintydivision.org
  • Olly Betts at Jan 3, 2008 at 5:21 pm

    On Thu, Jan 03, 2008 at 04:18:15PM +0000, James Aylett wrote:
    > I suppose in theory we could have an operator that acts as OP_OR but
    > returns the highest BM25 termweight or something (so the synonyms act
    > as an expansion inside the query, rather than outside as at the
    > moment), but I have no idea if that would be generally useful, or
    > practical with respect to any of the optimisations we do.

    Richard is working on a new OP_SYNONYM operator on SVN branch opsynonym:

    http://svn.xapian.org/branches/opsynonym/

    See also:

    http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=50

    OP_SYNONYM is like OP_OR except that the statistics are calculated as if
    all the sub-postlists were postings of the same term (some
    approximations are required to achieve this without the computations
    being prohibitively expensive).
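
A rough illustration of "statistics calculated as if all the sub-postlists were postings of the same term", written in Python. The postings, document IDs beyond those in the thread, and collection size are hypothetical, and the merge here is exact, whereas the branch described above relies on approximations to keep the cost down.

```python
import math

# Hypothetical postings: term -> {doc_id: within-document frequency}.
POSTINGS = {
    "jacket": {39896: 1, 39947: 1, 39964: 1},
    "coat": {40001: 2},
    "parka": {40002: 1, 39947: 1},
}
NUM_DOCS = 1000  # hypothetical collection size


def merge_postings(terms):
    """Combine several posting lists into one, summing frequencies per document."""
    merged = {}
    for term in terms:
        for doc_id, wdf in POSTINGS[term].items():
            merged[doc_id] = merged.get(doc_id, 0) + wdf
    return merged


def idf(term_freq, num_docs=NUM_DOCS):
    """A common BM25-style inverse document frequency."""
    return math.log((num_docs - term_freq + 0.5) / (term_freq + 0.5))


merged = merge_postings(["jacket", "coat", "parka"])
synonym_termfreq = len(merged)  # documents containing any of the synonyms
synonym_weight = idf(synonym_termfreq)
print(synonym_termfreq, merged[39947])
```

The merged "term" has one term frequency and one weight, so the synonym group influences the score once rather than three times.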

    All being well this will be in 1.1.0.

    Cheers,
    Olly
