Hi guys

I'm trying to figure out how I can use probabilistic searching on a
given field within a document; I've written to the list about this
before, but haven't quite figured out what's required and, following a
little research, I think I understand what I need to do but I'd like a
clarification on this.

o We have a database of a number of documents, with fields: title,
subtitle, summary and table of contents
o By default, we pass these fields into the
TermGenerator::index_text function to generate terms and add these to a
Xapian::Document, applying a weighting where required
o We then search these fields using Xapian::QueryParser::parse_query
o This gives a result which searches all of the fields for the
required string

I'd like to add the ability to search JUST one of the fields (title, in
this case) so according to the API documentation, here's what I
understand I need to do:

o When creating the index, call TermGenerator::index_text with the
prefix 'S' (i.e. index_text('some text', 150, 'S'))
o When querying the index, call QueryParser.add_prefix('', 'S')
before calling parse_query with the string I want to use

However, the documentation is a little unclear as to how this actually
works - specifically, how I do a search for multiple words in just the
title. For example, I have a title: "Research into Cheese in China".
With the changes I have made to the indexer, I will have terms for this
both without the S prefix and also WITH the S prefix to allow title-only
searching.

When it comes to searching, I want to be able to take the string "Cheese
in China" as user input and pass this into the QueryParser and have it
perform the search with the 'S' prefix added internally somehow. From
the documentation, it looks like I do this:

Xapian::QueryParser qp;
qp.add_prefix("", "S");
Xapian::Query query = qp.parse_query("Cheese in China");

Is this correct?

Thanks,

Justin

--
Redwire Design Limited

54 Maltings Place
169 Tower Bridge Road
London SE1 3LJ
www.redwiredesign.com

[ 020 7403 1444 ] - voice
[ 020 7378 8711 ] - fax


  • Olly Betts at Jul 27, 2011 at 12:59 pm

    On Wed, Jul 27, 2011 at 01:01:18PM +0100, Justin Finkelstein wrote:
    However, the documentation is a little unclear as to how this actually
    works - specifically, how I do a search for multiple words in just the
    title. For example, I have a title: "Research into Cheese in China".
    With the changes I have made to the indexer, I will have terms for this
    both without the S prefix and also WITH the S prefix to allow title-only
    searching.

    When it comes to searching, I want to be able to take the string "Cheese
    in China" as user input and pass this into the QueryParser and have it
    perform the search with the 'S' prefix added internally somehow. From
    the documentation, it looks like I do this:

    Xapian::QueryParser qp;
    qp.add_prefix("", "S");
    Xapian::Query query = qp.parse_query("Cheese in China");
    Yes, that should work.

    You can print out the query description to check what you got:

    cout << query.get_description() << endl;

    (If you want to see what terms have been generated by indexing to
    compare, see the delve utility which is in xapian-core/examples.)
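To make the prefix mechanics concrete, here is a toy model using only the standard library. It is not Xapian code: `prefixed_terms` is a hypothetical helper that lowercases each word and prepends the prefix, which is roughly the shape of the terms you would expect to see on both the indexing and the query side. The real TermGenerator and QueryParser do more (stemming, positions, weights), so treat this as a sketch under those assumptions.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy model only, NOT the Xapian API: split text into words, lowercase
// them, and prepend the given term prefix ("" means unprefixed terms).
std::vector<std::string> prefixed_terms(const std::string& text,
                                        const std::string& prefix) {
    std::vector<std::string> terms;
    std::string word;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            word += static_cast<char>(std::tolower(ch));
        } else if (!word.empty()) {
            // A non-alphanumeric character ends the current word.
            terms.push_back(prefix + word);
            word.clear();
        }
    }
    if (!word.empty()) terms.push_back(prefix + word);
    return terms;
}
```

Calling `prefixed_terms("Cheese in China", "S")` models the title-only query terms, while calling it with an empty prefix models the unprefixed terms that are indexed alongside them.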

    Any suggestions on how this could be made clearer in the docs?

    Cheers,
    Olly
  • Justin Finkelstein at Jul 27, 2011 at 1:15 pm

    On Wed, 2011-07-27 at 13:59 +0100, Olly Betts wrote:


    Yes, that should work.

    Excellent, thanks. I've just tested it and it seems to work as expected.

    You can print out the query description to check what you got:

    cout << query.get_description() << endl;

    I wasn't aware you could do that, so I'll add that in to our debug
    output.

    (If you want to see what terms have been generated by indexing to
    compare, see the delve utility which is in xapian-core/examples.)

    Got it. Delve shows the terms prefixed correctly.

    Any suggestions on how this could be made clearer in the docs?

    Absolutely. The issue with the Term Prefix page is that it doesn't show
    you a full example, specifically: here's how you index it and here's how
    you search it. In the last example, what would be good would be to have
    a separated out example showing how to JUST search the title, rather
    than title OR something else.

    Would you like me to write you a short segment?
  • Bruce Zhang at Jul 27, 2011 at 1:08 pm
    Hi guys,

    I wonder whether Xapian supports operations like value in [value1, value2,
    value3...]?

    From the Xapian documentation, the supported query operations are index,
    boolean...
    but we have a requirement to check whether a value is in a list of values.

    Is this already supported, or do we need to add it ourselves?

    thanks,
    Bruce
  • Olly Betts at Jul 27, 2011 at 2:01 pm

    On Wed, Jul 27, 2011 at 09:08:56PM +0800, Bruce Zhang wrote:
    I wonder if Xapian support the operations like value in [value1, value2,
    value3...]?

    from Xapian document, for query, the supported operations are index,
    boolean...
    but we have requirements to query see if a value is in a list of values.

    it is already supported or we need to add ourselves?
    http://xapian.org/docs/apidoc/html/classXapian_1_1ValueSetMatchDecider.html

    Being a MatchDecider, this can only be applied as a top level filter.
    If you want the check as a subquery of something more complex, you'll
    need to roll your own PostingSource subclass.
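As an illustration of what "top level filter" means here, the following is a toy model using only the standard library. `ToyDoc`, `ToyValueSetDecider` and `filter_matches` are hypothetical names, not Xapian classes; the sketch assumes a decider is simply a yes/no predicate run against each candidate document after the query has matched, which is why it cannot be nested inside a larger query.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a document: value slot number -> stored value.
using ToyDoc = std::map<unsigned, std::string>;

// Toy model of a value-set decider (NOT the Xapian class): accepts a
// document if the value in `slot` is in the set (inclusive) or is not
// in the set (exclusive).
class ToyValueSetDecider {
    unsigned slot_;
    bool inclusive_;
    std::set<std::string> values_;

  public:
    ToyValueSetDecider(unsigned slot, bool inclusive)
        : slot_(slot), inclusive_(inclusive) {}

    void add_value(const std::string& value) { values_.insert(value); }

    bool operator()(const ToyDoc& doc) const {
        auto it = doc.find(slot_);
        // A missing slot is treated as an empty value in this model.
        const std::string value = (it == doc.end()) ? std::string() : it->second;
        const bool in_set = values_.count(value) > 0;
        return inclusive_ ? in_set : !in_set;
    }
};

// The decider only sees whole candidate documents, one at a time.
std::vector<ToyDoc> filter_matches(const std::vector<ToyDoc>& matches,
                                   const ToyValueSetDecider& decider) {
    std::vector<ToyDoc> kept;
    for (const ToyDoc& doc : matches) {
        if (decider(doc)) kept.push_back(doc);
    }
    return kept;
}
```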

    Cheers,
    Olly
  • Bruno Rezende at Jul 27, 2011 at 2:06 pm
    Hi,
    On Wed, Jul 27, 2011 at 11:01 AM, Olly Betts wrote:
    On Wed, Jul 27, 2011 at 09:08:56PM +0800, Bruce Zhang wrote:
    I wonder if Xapian support the operations like value in [value1, value2,
    value3...]?

    from Xapian document, for query, the supported operations are index,
    boolean...
    but we have requirements to query see if a value is in a list of values.

    it is already supported or we need to add ourselves?
    http://xapian.org/docs/apidoc/html/classXapian_1_1ValueSetMatchDecider.html

    Being a MatchDecider, this can only be applied as a top level filter.
    If you want the check as a subquery of something more complex, you'll
    need to roll your own PostingSource subclass.
    what would be the difference if one did:

    (value=value1 OR value=value2 OR ...)

    ?

    --
    Bruno
  • Olly Betts at Aug 2, 2011 at 12:50 pm

    On Wed, Jul 27, 2011 at 11:06:27AM -0300, Bruno Rezende wrote:
    what would be the difference if one did:

    (value=value1 OR value=value2 OR ...)
    That's not really an efficient approach with values (I'm assuming we're
    talking about "value" in the sense of a Xapian document value, i.e.
    what you'd add with Document::add_value()).

    If you want to take that approach, index "value=value1", etc as prefixed
    terms (e.g. XVALUEvalue1) and then you just have an OR of several
    terms. That'll work well if there's a sane limit on the number of terms
    you want to search for at once (hundreds is certainly OK, millions
    almost certainly isn't).
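The prefixed-term approach can be pictured with a toy inverted index. This is a standard-library sketch, not Xapian code: `ToyIndex`, `add_boolean_terms` and `match_any` are hypothetical names, and the `XVALUE` prefix is taken from the example above. An OR over several terms is modelled as the union of their posting lists.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using DocId = unsigned;
// Toy inverted index (NOT Xapian): term -> set of documents containing it.
using ToyIndex = std::map<std::string, std::set<DocId>>;

// Index each value of a document as an exact prefixed term,
// e.g. prefix "XVALUE" + "value1" -> "XVALUEvalue1".
void add_boolean_terms(ToyIndex& index, DocId doc, const std::string& prefix,
                       const std::vector<std::string>& values) {
    for (const std::string& value : values) {
        index[prefix + value].insert(doc);
    }
}

// "value in [v1, v2, ...]" becomes an OR of prefixed terms, i.e. the
// union of the posting lists for those terms.
std::set<DocId> match_any(const ToyIndex& index, const std::string& prefix,
                          const std::vector<std::string>& wanted) {
    std::set<DocId> result;
    for (const std::string& value : wanted) {
        auto it = index.find(prefix + value);
        if (it != index.end()) {
            result.insert(it->second.begin(), it->second.end());
        }
    }
    return result;
}
```

The cost of this lookup grows with the number of terms in the list, which is consistent with the advice that hundreds of terms are fine and millions are not.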

    Cheers,
    Olly
  • Bruce Zhang at Jul 28, 2011 at 5:11 am
    I am still new to Xapian, and haven't managed to get this working.

    For example, my data are:

    document 1:
    name=xxxx,xxx
    desc=...
    country=us, fr

    document 2:
    name=...
    desc=...
    country=jp,cn

    I want to be able to search by country, e.g. query for documents matching us or jp or cn.

    I use scriptindex to build the database. How should I write my index script?

    country : ???

    I use omega to query. How should I construct the command line?

    ./omega DB=default B=XCOUNTRY???

    Thanks a lot for your help,

    Bruce





    From: Matt Goodall [mailto:matt.goodall at gmail.com]
    Sent: Wednesday, July 27, 2011 9:43 PM
    To: Bruce Zhang
    Subject: Re: [Xapian-discuss] Does Xapian support value in [value1, value2, value3...]?



    On 27 July 2011 14:08, Bruce Zhang wrote:

    Hi guys,

    I wonder if Xapian support the operations like value in [value1, value2,
    value3...]?

    from Xapian document, for query, the supported operations are index,
    boolean...
    but we have requirements to query see if a value is in a list of values.

    it is already supported or we need to add ourselves?



    You can achieve this by adding an exact (unstemmed, etc.) term for each value and then querying for any one of the values in the usual way.

    For example, if you have a document that has some general, indexable content and a list of tags, ["foo", "bar"], you could add "Xtag:foo" and "Xtag:bar" terms for the tags. Then, configure a QueryParser with a "tag"->"Xtag:" prefix and you can search for "tag:foo", "tag:bar", "tag:foo AND tag:bar", etc.

    - Matt
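The "tag"->"Xtag:" mapping can be sketched as a plain lookup. This is a toy, standard-library model and not the QueryParser itself: `map_field_term` is a hypothetical helper that assumes a query token of the form `field:word` and rewrites it to the term that was indexed. Tokens without a known field are returned unchanged here, which is a simplification.

```cpp
#include <map>
#include <string>

// Toy model (NOT Xapian): rewrite a user-facing "field:word" token into
// the indexed term using a field -> term-prefix table.
std::string map_field_term(const std::map<std::string, std::string>& prefixes,
                           const std::string& token) {
    const auto colon = token.find(':');
    if (colon != std::string::npos) {
        const auto it = prefixes.find(token.substr(0, colon));
        if (it != prefixes.end()) {
            // Known field: replace "field:" with its term prefix.
            return it->second + token.substr(colon + 1);
        }
    }
    // No field, or a field we do not know about: leave the token as-is.
    return token;
}
```

With a table containing `{"tag", "Xtag:"}`, the token `tag:foo` maps to the term `Xtag:foo`, matching the terms added at index time.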
  • Bruce Zhang at Jul 28, 2011 at 2:50 pm
    Can anyone help me further with this question? I'm new to Xapian. Many thanks,

Discussion Overview
group: xapian-discuss
categories: xapian
posted: Jul 27, '11 at 12:01p
active: Aug 2, '11 at 12:50p
posts: 9
users: 4
website: xapian.org
irc: #xapian
