FAQ
Hello,

I am a newbie user of Lucene Java. I am trying to use Lucene to enable in
search in my application. I am stuck a little in actual Search. I'm facing a
design issue and want help from experts here.

Here is my scenario:
* I have 7-8 tables in the database and I create a separate index for each
table. Index is created in the filesystem. (The fact that there is one index
per table is not under my control and I have to use that as an input)
* Fields in the indices may or may not be the same
* When I search I need to search in all indices

I tried looking at MultiSearcher. But, since the fields in indices are not
the same, I can't make use of one query and give it to the multi searcher. I
need to construct a separate MultiFieldQueryParser for each of my searchers.

If I create an array of searchers myself and call 'search' method on each of
them, I am left with having to merge the results properly (based on score).
But in this case, since the results from each searcher is normalized for
search in that index, merging is not a stright-forward operation and I'll
have to rely on some sort of a HitCollector since I need raw scores which'll
help me in merging the results.

Have people hit this problem before? Is there any other approach I can take?
Any pointers will greatly help me.

Thanks a lot for your time,
Beerbal

Search Discussions

  • Erick Erickson at Sep 6, 2008 at 11:16 pm
    Well, what is the higher-level task you're trying to accomplish? I have
    real trouble understanding why you're using Lucene at all. Especially
    because scoring will be irrelevant *between* searches, since they're
    going at completely different "documents". That is, there is no relation
    between the score from a hit on table A and the score on table B.

    IMO, Lucene should *not* be used to try to emulate a database. The
    very fact that you've got a separate index for each table implies that
    you're going to try to do some database-like operations. I'd just use
    the database.

    Often, Lucene is used to index de-normalized abstractions of the
    database in order to enable vastly improved search behavior. But that's
    explicitly prohibited by your design it seems to me.

    Of course I could be completely off base, if so could you elaborate
    on what the high-level objective is?

    I'd really go back to the givens of the problem. That is, why on earth
    are you required to have one index per table? It's much easier to
    revisit this sort of constraint than support a poor design into the
    indefinite future <G>.

    Best
    Erick
    On Sat, Sep 6, 2008 at 5:20 AM, wrote:

    Hello,

    I am a newbie user of Lucene Java. I am trying to use Lucene to enable in
    search in my application. I am stuck a little in actual Search. I'm facing
    a
    design issue and want help from experts here.

    Here is my scenario:
    * I have 7-8 tables in the database and I create a separate index for each
    table. Index is created in the filesystem. (The fact that there is one
    index
    per table is not under my control and I have to use that as an input)
    * Fields in the indices may or may not be the same
    * When I search I need to search in all indices

    I tried looking at MultiSearcher. But, since the fields in indices are not
    the same, I can't make use of one query and give it to the multi searcher.
    I
    need to construct a separate MultiFieldQueryParser for each of my
    searchers.

    If I create an array of searchers myself and call 'search' method on each
    of
    them, I am left with having to merge the results properly (based on score).
    But in this case, since the results from each searcher is normalized for
    search in that index, merging is not a stright-forward operation and I'll
    have to rely on some sort of a HitCollector since I need raw scores
    which'll
    help me in merging the results.

    Have people hit this problem before? Is there any other approach I can
    take?
    Any pointers will greatly help me.

    Thanks a lot for your time,
    Beerbal
  • Beerbal at Sep 7, 2008 at 1:38 am
    The main reason behind having one index per table is that the tables
    grow to be very big and we want to keep updating the index as the
    table gets modified (may be at specified intervals). So, if table A
    changes (row modified, added or deleted) we want this change to be
    reflected in the index and we want this change to be localized for
    that index.

    Task I am trying to achieve, is to have my application search enabled.
    Tables represent model elements like User, Community, Blog etc etc.
    When I search for something, say 'foobar', I need to return all the
    elements (user, community, blog etc etc) which match the criteria. For
    this purpose, I need to search in all indices. You are completely
    right about scoring between searches losing relevance. Thats why I am
    trying to get de-normalized scores and do the merge of search results
    based on that raw-score. Is there any glaringly wrong in the approach?

    Thanks,
    Beerbal
    On 9/7/08, Erick Erickson wrote:
    Well, what is the higher-level task you're trying to accomplish? I have
    real trouble understanding why you're using Lucene at all. Especially
    because scoring will be irrelevant *between* searches, since they're
    going at completely different "documents". That is, there is no relation
    between the score from a hit on table A and the score on table B.

    IMO, Lucene should *not* be used to try to emulate a database. The
    very fact that you've got a separate index for each table implies that
    you're going to try to do some database-like operations. I'd just use
    the database.

    Often, Lucene is used to index de-normalized abstractions of the
    database in order to enable vastly improved search behavior. But that's
    explicitly prohibited by your design it seems to me.

    Of course I could be completely off base, if so could you elaborate
    on what the high-level objective is?

    I'd really go back to the givens of the problem. That is, why on earth
    are you required to have one index per table? It's much easier to
    revisit this sort of constraint than support a poor design into the
    indefinite future <G>.

    Best
    Erick
    On Sat, Sep 6, 2008 at 5:20 AM, wrote:

    Hello,

    I am a newbie user of Lucene Java. I am trying to use Lucene to enable in
    search in my application. I am stuck a little in actual Search. I'm facing
    a
    design issue and want help from experts here.

    Here is my scenario:
    * I have 7-8 tables in the database and I create a separate index for each
    table. Index is created in the filesystem. (The fact that there is one
    index
    per table is not under my control and I have to use that as an input)
    * Fields in the indices may or may not be the same
    * When I search I need to search in all indices

    I tried looking at MultiSearcher. But, since the fields in indices are not
    the same, I can't make use of one query and give it to the multi searcher.
    I
    need to construct a separate MultiFieldQueryParser for each of my
    searchers.

    If I create an array of searchers myself and call 'search' method on each
    of
    them, I am left with having to merge the results properly (based on
    score).
    But in this case, since the results from each searcher is normalized for
    search in that index, merging is not a stright-forward operation and I'll
    have to rely on some sort of a HitCollector since I need raw scores
    which'll
    help me in merging the results.

    Have people hit this problem before? Is there any other approach I can
    take?
    Any pointers will greatly help me.

    Thanks a lot for your time,
    Beerbal
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedSep 6, '08 at 9:21a
activeSep 7, '08 at 1:38a
posts3
users2
websitelucene.apache.org

2 users in discussion

Beerbal: 2 posts Erick Erickson: 1 post

People

Translate

site design / logo © 2022 Grokbase