hi
hi

How can I achieve Boolean matching for a query whose terms have weights?
I think my question is unclear or misleading, so here is an example:

"I am eating an apple while using apple computer"

My Xapian query:
apple(weight:4)
computer(weight:3)

Instead of this document getting a weight of 11 (2 x apple + 1 x computer),
how can I make the matching Boolean so that it gets a weight of 7?

Is it possible to add a "penalty" to a query?
docA = "How to eat an apple while using apple computer"
docB = "I am eating an apple while using apple computer"

Query(apple:4,computer:3,how:-1) << Is it possible to apply a penalty (subtract
weight) when a doc contains the term "how", so that docB ranks higher?

How heavy will it be if I add a value of "hash(md5 HTML<title> X
websiteDomain)" to each document, and then use this key to collapse
documents with duplicate titles within a domain using set_collapse_key?
Is it way too heavy?

Thanks, really appreciated.
Andrey K.

Reply posted Nov 2, 2007 at 5:43 am

On Thu, Nov 01, 2007 at 09:49:39PM -0700, Andrey wrote:
> "I am eating an apple while using apple computer"
>
> My Xapian query:
> apple(weight:4)
> computer(weight:3)
>
> Instead of this document getting a weight of 11 (2 x apple + 1 x computer),
> how can I make the matching Boolean so that it gets a weight of 7?

If I understand correctly, you want to ignore the wdf (within-document
frequency) of terms. You can do that by setting BM25's k1 parameter to 0:

http://www.xapian.org/docs/apidoc/html/classXapian_1_1BM25Weight.html#_details

That's not what I'd call "boolean" weighting though, so perhaps I'm
misunderstanding you...
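
To illustrate why k1 = 0 removes the effect of wdf, here is a minimal sketch in plain Python using the textbook BM25 term-frequency factor. It is not Xapian's implementation: it ignores the idf part and BM25's other parameters, treats the per-term query weights (4 and 3) as simple multipliers, and the names `tf_factor` and `score` are made up for this example. In Xapian itself the equivalent step would presumably be passing a `BM25Weight` constructed with k1 set to 0 to `Enquire::set_weighting_scheme`.

```python
def tf_factor(wdf, k1, b=0.5, norm_len=1.0):
    """Textbook BM25 term-frequency factor for a term occurring wdf times.

    norm_len is the document length divided by the average length
    (assumed to be 1.0 here to keep the example small).
    """
    if wdf == 0:
        return 0.0
    return (k1 + 1) * wdf / (k1 * ((1 - b) + b * norm_len) + wdf)


def score(query_weights, doc_wdf, k1):
    """Sum of (query weight x term-frequency factor) over the query terms."""
    return sum(
        weight * tf_factor(doc_wdf.get(term, 0), k1)
        for term, weight in query_weights.items()
    )


query = {"apple": 4, "computer": 3}
# wdf counts for "I am eating an apple while using apple computer"
doc = {"apple": 2, "computer": 1, "eating": 1}

flat = score(query, doc, k1=0)      # wdf is ignored: each matching term counts once
default = score(query, doc, k1=1)   # repeated terms raise the score
print(flat, default)
```

With k1 = 0 the factor collapses to wdf / wdf = 1 for any term that is present, so the second "apple" no longer adds anything and the document scores 4 + 3.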

> Is it possible to add a "penalty" to a query?
> docA = "How to eat an apple while using apple computer"
> docB = "I am eating an apple while using apple computer"
>
> Query(apple:4,computer:3,how:-1) << Is it possible to apply a penalty (subtract
> weight) when a doc contains the term "how", so that docB ranks higher?

I don't think that's currently possible without indexing each document
which doesn't contain "how" with a "XNOThow" term, or something similar.

Several of the matcher's optimisations rely on the current fact that
terms can't contribute a negative amount, so I think the only way to do
this would be to add something to all documents which don't contain
"how". It would probably be possible to implement a query operator
which did that.

You can completely exclude documents which contain a particular term
though, using OP_AND_NOT.
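
The marker-term workaround can be sketched in plain Python as follows. This only simulates the idea and does not use Xapian: `index_terms` and `score` are hypothetical helpers, the scoring is a simple sum of query weights for terms that are present, and the bonus of 1 for "XNOThow" is an assumed value. Rewarding documents that lack "how" has the same ranking effect as penalizing the ones that contain it, while keeping every contribution non-negative.

```python
PENALIZED = "how"
MARKER = "XNOThow"  # marker term suggested in the reply; added at index time


def index_terms(text):
    """Return the set of terms for a document, plus the marker if 'how' is absent."""
    terms = set(text.lower().split())
    if PENALIZED not in terms:
        terms.add(MARKER)
    return terms


def score(query_weights, terms):
    """Sum the weights of the query terms present in the document."""
    return sum(w for term, w in query_weights.items() if term in terms)


docs = {
    "docA": "How to eat an apple while using apple computer",
    "docB": "I am eating an apple while using apple computer",
}

# Instead of how:-1, give a positive weight to the marker term.
query = {"apple": 4, "computer": 3, MARKER: 1}

scores = {name: score(query, index_terms(text)) for name, text in docs.items()}
ranked = sorted(docs, key=lambda name: scores[name], reverse=True)

# The stricter alternative, in the spirit of OP_AND_NOT: drop matches containing "how".
excluded = [name for name, text in docs.items()
            if PENALIZED not in index_terms(text)]
print(scores, ranked, excluded)
```

The cost of this approach is at index time: every document has to be checked for each term you might want to penalize, and the set of such terms must be known before indexing.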

> How heavy will it be if I add a value of "hash(md5 HTML<title> X
> websiteDomain)" to each document, and then use this key to collapse
> documents with duplicate titles within a domain using set_collapse_key?
> Is it way too heavy?

How much overhead it incurs will depend on the nature of your data. For
example, if the sites you are indexing each have millions of pages with
the same title, the cost will probably be higher, as you'll be rejecting a
large number of matches.

It's not an obviously ridiculous idea in general, so all I can really
suggest is that you try it on your data and see if it performs
acceptably.
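
As a rough sketch of what such a key could look like, the plain-Python example below builds an MD5 digest from the domain and the title, then mimics collapsing by keeping only the best-ranked hit per key. It does not call Xapian: `collapse_key` and `collapse` are hypothetical helpers, the URLs and titles are invented sample data, and joining domain and title with a NUL byte is just one possible choice. With Xapian you would store the key as a document value at index time and pass that value slot to `set_collapse_key`.

```python
import hashlib
from urllib.parse import urlparse


def collapse_key(title, url):
    """MD5 hex digest of (domain, title), usable as a per-document value."""
    domain = urlparse(url).netloc.lower()
    payload = (domain + "\0" + title.strip()).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def collapse(ranked_hits):
    """Keep only the first (best-ranked) hit for each collapse key."""
    seen = set()
    kept = []
    for hit in ranked_hits:
        key = collapse_key(hit["title"], hit["url"])
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


# Invented sample hits, already sorted by descending score.
hits = [
    {"url": "http://example.com/a", "title": "Apple pie", "score": 9.0},
    {"url": "http://example.com/b", "title": "Apple pie", "score": 8.0},
    {"url": "http://example.org/a", "title": "Apple pie", "score": 7.0},
]

kept = collapse(hits)
print([hit["url"] for hit in kept])
```

The hash is computed once per document at index time, so the query-time cost comes mainly from how many matches get rejected, which is the data-dependent part mentioned above.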

Cheers,
Olly

Discussion Overview
group: xapian-discuss
categories: xapian
posted: Nov 2, '07 at 4:49a
active: Nov 2, '07 at 5:43a
posts: 2
users: 2
website: xapian.org
irc: #xapian