Hi,

We have an application which has "main" records, such as a
person, which may have several "dependent" records: an
attachment, an address, an email, a phone number, or a note left by a
user.

I want the results to look like:

john AND smart

----------------------

2 results:

- [John] Smith
Attachment: I am very [smart].
Note: He is not as [smart] as he said.

- Peter Pérez
Phone number: 555-[john]-is-[smart]


What I've done is:

Each main record (a person, for example) is indexed as one document which
contains all of its data and all the data of its dependent records. These
documents also have the Igrouped prefixed term. In the data of the document
I keep the type and the id of the record.

Notes, attachments, etc are also indexed separately one document at a
time. Each of them also has an IdependentOf<type><id> term.
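
A minimal sketch of this indexing scheme in plain Python, with no search library involved. The record layout (dicts with `type`, `id` and `text` keys), the helper names `tokenize` and `build_documents`, and the lowercase tokenizer are assumptions made for illustration; only the `Igrouped` and `IdependentOf<type><id>` terms come from the description above.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a stand-in for a real term generator."""
    return re.findall(r"\w+", text.lower())

def build_documents(main, dependents):
    """Return (terms, data) pairs for one main record and its dependents.

    `main` and each dependent are dicts with 'type', 'id' and 'text' keys
    (a hypothetical layout, not taken from the original application).
    """
    docs = []

    # Grouped document: the main record's text plus every dependent's text.
    grouped_terms = set(tokenize(main["text"]))
    for dep in dependents:
        grouped_terms.update(tokenize(dep["text"]))
    grouped_terms.add("Igrouped")
    docs.append((grouped_terms, {"type": main["type"], "id": main["id"]}))

    # One separate document per dependent record, tagged with its owner.
    owner_term = "IdependentOf%s%s" % (main["type"], main["id"])
    for dep in dependents:
        terms = set(tokenize(dep["text"]))
        terms.add(owner_term)
        docs.append((terms, {"type": dep["type"], "id": dep["id"]}))
    return docs

person = {"type": "person", "id": 1, "text": "John Smith"}
deps = [
    {"type": "attachment", "id": 7, "text": "I am very smart."},
    {"type": "note", "id": 9, "text": "He is not as smart as he said."},
]
docs = build_documents(person, deps)
```

Here `docs[0]` is the grouped document and the remaining entries are the per-record documents, which is why every dependent term ends up indexed twice.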

To do a search:

1. I do a query with the "Igrouped" prefixed term and whatever the query
parser gave me.

2. I collect all the ids and types of the previous results in a list
which looks like this:
[IdependentOfperson1, IdependentOfcompany5, ...]

3. I remove all operators from the search the user entered (AND, OR,
parentheses) and get all the search terms in a list:
[john, smart]

4. I build a second query in which all the IdependentOf terms are ORed
together, all the search terms are ANDed with that, and Igrouped is
negated. For example:
(IdependentOfperson1 OR IdependentOfcompany5) AND john AND smart AND NOT
Igrouped

5. I search for it and use the dependent record's data to locate the
master records and group them together in the results.

6. I use my relational db to get the full text of the results which
matched the search, and use a simple algorithm which looks for
search words that are written close to each other and shows that excerpt
to the user (which looks more or less like Google's results).
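
The parts of steps 3, 4 and 6 that do not depend on the search engine can be sketched in plain Python. This is only an illustration under assumptions: the function names are hypothetical, the second query is built as a string rather than as a query object, and `snippet` is a much simpler stand-in for the excerpt algorithm (it cuts around the first match instead of looking for a cluster of nearby words, and it matches substrings rather than whole words).

```python
import re

# Only the operators named in step 3; anything else is kept as a term.
OPERATORS = {"AND", "OR"}

def extract_terms(user_query):
    """Step 3: drop operators and parentheses, keep the search terms."""
    words = re.findall(r"\w+", user_query)
    return [w.lower() for w in words if w not in OPERATORS]

def build_second_query(masters, terms):
    """Step 4: (owner1 OR owner2 ...) AND term1 AND ... AND NOT Igrouped.

    `masters` is a non-empty list of (type, id) pairs from the first search.
    """
    owners = " OR ".join("IdependentOf%s%s" % (t, i) for t, i in masters)
    return " AND ".join(["(%s)" % owners] + terms) + " AND NOT Igrouped"

def snippet(text, terms, width=40):
    """Step 6 (simplified): excerpt around the first match, matches bracketed."""
    if not terms:
        return None
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    return pattern.sub(lambda m: "[%s]" % m.group(0), text[start:end])

terms = extract_terms("john AND (smart)")
query = build_second_query([("person", 1), ("company", 5)], terms)
excerpt = snippet("He is not as smart as he said.", terms)
```

With these inputs `query` reproduces the example from step 4, and `excerpt` marks the match in the same bracketed style as the desired output at the top of this post.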

Considerations:

1. I index everything twice, which affects the weight all terms get. To
compensate for this I index the non-grouped terms with a weight of 0.

2. I guess the index is much bigger than it should be.

3. I could probably have two separate dbs for grouped and ungrouped
items.

4. I probably should have used "collapse keys" but I think that they are
essentially filters and don't really achieve what I want (which would be
to logically consider all the terms of some documents as being part of
one virtual bigger document). With collapse keys alone, searching for
john AND smart wouldn't have found "John Smith", since the words are in
separate records.
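
To make the problem concrete, here is a small plain-Python illustration (no search library; `matches_all` and the sample records are made up for this example). An AND query evaluated against each record separately matches nothing, while the same query against the concatenated "virtual bigger document" matches.

```python
import re

def matches_all(terms, text):
    """True if every term occurs as a whole word in text (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(t in words for t in terms)

# One main record followed by two of its dependent records.
records = ["John Smith", "I am very smart.", "He is not as smart as he said."]
terms = ["john", "smart"]

# Each record on its own: no single record contains both words.
per_record = [matches_all(terms, r) for r in records]

# All records treated as one virtual bigger document: both words are present.
grouped = matches_all(terms, " ".join(records))
```

Collapsing only merges results that already matched, so it cannot turn the three per-record misses into a hit.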

Am I doing something wrong? Is there any better way to do it?

. A .


  • Olly Betts at Aug 19, 2008 at 11:18 pm

    On Mon, Aug 18, 2008 at 09:15:17PM -0300, Agustín wrote:
    > 4. I probably should have used "collapse keys" but I think that they are
    > essentially filters and don't really achieve what I want (which would be
    > to logically consider all the terms of some documents as being part of
    > one virtual bigger document).

    Then just use that "virtual bigger document" as your "Xapian document".

    Cheers,
    Olly
  • Agustín at Aug 20, 2008 at 4:04 am
    Hi,

    But I still need to identify which "smaller documents" match the search.
    That's the reason I added the documents twice: all together and
    separately.

    Also, I would love to know which terms matched the documents I searched
    (to avoid having to search again inside the document).

    Is there any way to get which terms from that query matched a returned
    document efficiently?

    Thanks!

    . A .
  • Olly Betts at Aug 20, 2008 at 4:14 am

    On Wed, Aug 20, 2008 at 01:04:19AM -0300, Agustín wrote:
    > Is there any way to get which terms from that query matched a returned
    > document efficiently?

    This should be pretty efficient:

    Xapian::Enquire::get_matching_terms_begin()

    If you're finding that's too slow, a testcase would be useful.

    Cheers,
    Olly
  • Agustín at Aug 20, 2008 at 4:07 am
    Or perhaps I could instead somehow put some info on the terms of the
    "virtual bigger document" indicating the origin. Could that be done?

Discussion Overview
group: xapian-discuss
categories: xapian
posted: Aug 19, '08 at 12:15a
active: Aug 20, '08 at 4:14a
posts: 5
users: 2
website: xapian.org
irc: #xapian