FAQ
Hello,

Just to provide some backround information. We have 4 main entities that we
can search over in our product. These entities represent...
* Client Information (Has 30+ fields storing client information so user can
search clientName:"Ian")
* Project Information (Has 20+ fields)
* Contact Information (Has 30+ fields)
* Document Information (Has 5 fields with one containing tokenized text of
word, txt, pdf, etc documents)

All this information is stored within the one index. All the documents have
a type and an id field to uniquely identify the entity instance. We allow
the user to choose what entity types to search over e.g. Just Client and
Project information or all information etc. We use the type field to specify
what entity type to restrict the search to.

My question surrounds the best mechanism to index this information? Should I
be storing these entities in different indices? This would mean I might have
to preform four searches if searching over all information and maybe lose
search relevance ordering. The problem with using one index is that I have a
large amount of fields for each Lucene document that are not populated or
relevent when the document represents only one of my entities.

Thank you,
JP

Also, I am a Lucene.Net user. Is this still the best place for me to post my
questions?
--
View this message in context: http://www.nabble.com/Using-single-or-multiple-indices-for-searching-different-entity-types-tp17122272p17122272.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Karl Wettin at May 8, 2008 at 11:48 am

    JP O'Gorman skrev:

    All this information is stored within the one index. All the documents have
    a type and an id field to uniquely identify the entity instance. We allow
    the user to choose what entity types to search over e.g. Just Client and
    Project information or all information etc. We use the type field to specify
    what entity type to restrict the search to.
    Do you use a filter or a term query to restrict the search? Filters
    usually speed things up a lot.
    My question surrounds the best mechanism to index this information? Should I
    be storing these entities in different indices? This would mean I might have
    to preform four searches if searching over all information and maybe lose
    MultiReader
    search relevance ordering. The problem with using one index is that I have a
    large amount of fields for each Lucene document that are not populated or
    relevent when the document represents only one of my entities.
    FieldSelector


    What is your current problem? Do you want to optimize for reponse speed
    or for hardware resources? I think you want a single index. I also think
    you want to benchmark all sort of strategies so you will see what
    affects what.
    Also, I am a Lucene.Net user. Is this still the best place for me to post my
    questions?
    We can only help you with basic conceptual things. I don't even know if
    the things I've mentioned exists for Lucene.Net. Look here:

    http://incubator.apache.org/lucene.net/


    karl

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedMay 8, '08 at 8:05a
activeMay 8, '08 at 11:48a
posts2
users2
websitelucene.apache.org

2 users in discussion

JP O'Gorman: 1 post Karl Wettin: 1 post

People

Translate

site design / logo © 2022 Grokbase