Hi all,

I am wondering if there exists any implementation of
org.apache.lucene.store.Directory that can be distributed across
multiple machines with performance comparable to a local FSDirectory
index, or whether such an idea is feasible in the first place.

By comparable performance I mean that a 100G index distributed across 10
machines should achieve the same performance as a 10G index on a local
FSDirectory.

I know that optimization would be a problem for such a big index, but
would the partial optimization introduced in Lucene 2.3 help?

Any thoughts?

Regards,
Cedric Ho

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Karl Wettin at Jan 31, 2008 at 10:59 am

    On 31 Jan 2008, at 09:42, Cedric Ho wrote:

    > I am wondering if there exists any implementation of
    > org.apache.lucene.store.Directory that can be distributed across
    > multiple machines with performance comparable to a local FSDirectory
    > index, or whether such an idea is feasible in the first place.
    >
    > By comparable performance I mean that a 100G index distributed across 10
    > machines should achieve the same performance as a 10G index on a local
    > FSDirectory.

    I have never used these things and don't know about their caveats, but
    perhaps a combination of

    <http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/RemoteSearchable.html>

    and

    <http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/ParallelMultiSearcher.html>

    can help you?


    karl
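
Editor's sketch: the idea behind combining remote searchers with a parallel multi-searcher is scatter-gather. The query is sent to every index partition at once and the per-partition top hits are merged by score. The plain-Java sketch below illustrates only that pattern. It does not use the Lucene API; `Shard`, `Hit`, and `search` are hypothetical names, and it assumes scores from different partitions are directly comparable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ScatterGatherSketch {

    /** One scored hit; shardId plus docId identify a document across all partitions. */
    static final class Hit {
        final int shardId;
        final int docId;
        final float score;

        Hit(int shardId, int docId, float score) {
            this.shardId = shardId;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher over one index partition. */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Queries every shard in parallel and keeps the topN hits by descending score. */
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Scatter: one task per shard, all running concurrently.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }
            // Gather: wait for every shard, then merge and truncate.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : pending) {
                merged.addAll(future.get());
            }
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard query failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two fake shards returning canned hits, standing in for real partitions.
        Shard a = (q, n) -> List.of(new Hit(0, 1, 0.9f), new Hit(0, 2, 0.2f));
        Shard b = (q, n) -> List.of(new Hit(1, 7, 0.5f));
        for (Hit hit : search(List.of(a, b), "lucene", 2)) {
            System.out.println(hit.shardId + ":" + hit.docId + " score=" + hit.score);
        }
    }
}
```

Note that each query waits for its slowest shard, so latency is bounded by the slowest partition plus the merge, not by the average.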



  • Cedric Ho at Feb 1, 2008 at 1:23 am
    Yes, I am aware of RemoteSearchable and ParallelMultiSearcher, and I am
    doing something similar now, i.e. splitting the index across multiple
    machines.

    But managing such a set of indexes is not trivial, especially when
    you need to add redundancy for reliability and update frequently.

    I came across this a while ago:

    http://www.kimchy.org/compasslucene-and-datagrids/

    Also, I've heard there is a Directory implementation for HDFS, but
    unfortunately it is very slow, which makes me wonder whether this type
    of approach is practical (i.e. good performance, easy index updates,
    optimization that doesn't take too long, etc.).

    Cedric
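
Editor's sketch: part of what makes a split index hard to manage is that every update must reach all replicas of the partition that owns the document, while a search needs only one live replica per partition. The plain-Java sketch below shows one simple way to express that, assuming hash-based partitioning and a static replica table. `ShardRouterSketch`, its methods, and the host names are hypothetical and are not part of Lucene.

```java
import java.util.List;
import java.util.Set;

final class ShardRouterSketch {

    /** replicasByPartition.get(p) lists the hosts holding a copy of partition p. */
    private final List<List<String>> replicasByPartition;

    ShardRouterSketch(List<List<String>> replicasByPartition) {
        if (replicasByPartition.isEmpty()) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        this.replicasByPartition = replicasByPartition;
    }

    /** Maps a document ID to its owning partition; floorMod keeps the result non-negative. */
    int partitionFor(String docId) {
        return Math.floorMod(docId.hashCode(), replicasByPartition.size());
    }

    /** An update has to be applied on every replica of the owning partition. */
    List<String> hostsForUpdate(String docId) {
        return replicasByPartition.get(partitionFor(docId));
    }

    /** A search needs just one replica per partition; skip hosts known to be down. */
    String hostForSearch(int partition, Set<String> downHosts) {
        for (String host : replicasByPartition.get(partition)) {
            if (!downHosts.contains(host)) {
                return host;
            }
        }
        throw new IllegalStateException("no live replica for partition " + partition);
    }

    public static void main(String[] args) {
        ShardRouterSketch router = new ShardRouterSketch(
                List.of(List.of("a1", "a2"), List.of("b1", "b2")));
        System.out.println("update doc-42 on " + router.hostsForUpdate("doc-42"));
        System.out.println("search partition 0 on " + router.hostForSearch(0, Set.of("a1")));
    }
}
```

The sketch leaves out the hard parts raised above: keeping replicas consistent during frequent updates and rebalancing when machines are added.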

  • Mark Miller at Feb 1, 2008 at 1:48 am

    Cedric Ho wrote:
    > But managing such a set of indexes is not trivial, especially when
    > you need to add redundancy for reliability and update frequently.

    Agreed. Apparently the Solr guys are working on this now. Certainly not
    trivial to do right. You might want to check out that work.

    I want to start a project for this functionality myself soon - but just
    with Lucene. Personally, I think the only way to go is to use Jini, but
    I am waiting for the first release of Apache River before getting
    started (*very* soon I hope). That gets you through the 8 fallacies of
    distributed computing with almost no work right off the bat.
    Self-discovery, leasing, redundancy, etc. with minimal effort. Hopefully I
    will be able to recruit some help with this. From what I can tell, there
    is a lot of roll-your-own for this type of thing out there... it would be
    nice to focus some work on a system that can be used by all.

    - Mark
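
Editor's sketch: leasing, as mentioned above, means a node stays registered only while it keeps renewing a time-limited lease, so crashed nodes drop out of the cluster view without explicit cleanup. The plain-Java sketch below illustrates the idea only. It is not the Jini or Apache River API; `LeaseRegistrySketch` and its methods are hypothetical, and the caller supplies the clock value so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LeaseRegistrySketch {

    private final long leaseMillis;
    private final Map<String, Long> expiryByNode = new HashMap<>();

    LeaseRegistrySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Registers a node or renews its lease, counted from the supplied time. */
    synchronized void renew(String node, long nowMillis) {
        expiryByNode.put(node, nowMillis + leaseMillis);
    }

    /** Drops leases that have expired by nowMillis and returns the remaining nodes, sorted. */
    synchronized List<String> liveNodes(long nowMillis) {
        expiryByNode.values().removeIf(expiry -> expiry <= nowMillis);
        List<String> live = new ArrayList<>(expiryByNode.keySet());
        Collections.sort(live);
        return live;
    }

    public static void main(String[] args) {
        LeaseRegistrySketch registry = new LeaseRegistrySketch(1000);
        registry.renew("search-node-a", 0);
        registry.renew("search-node-b", 500);
        // At t=1200 only nodes whose lease extends past 1200 are still listed.
        System.out.println(registry.liveNodes(1200));
    }
}
```

A searcher front end could consult such a registry before each query to decide which replicas to contact.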

  • Cedric Ho at Feb 1, 2008 at 8:59 am

    On Feb 1, 2008 9:47 AM, Mark Miller wrote:
    > Cedric Ho wrote:
    >> But managing such a set of indexes is not trivial, especially when
    >> you need to add redundancy for reliability and update frequently.
    >
    > Agreed. Apparently the Solr guys are working on this now. Certainly not
    > trivial to do right. You might want to check out that work.

    Do you mean this? (SOLR-303) Distributed Search over HTTP. It seems
    quite complicated and is not ready for use yet.
    As for Solr, let's just say it provides a lot of great functionality
    that I don't need, and a lot of the functionality that I need is not
    there. So I eventually stuck with Lucene only.

    > I want to start a project for this functionality myself soon - but just
    > with Lucene. Personally, I think the only way to go is to use Jini, but
    > I am waiting for the first release of Apache River before getting
    > started (*very* soon I hope). That gets you through the 8 fallacies of
    > distributed computing with almost no work right off the bat.
    > Self-discovery, leasing, redundancy, etc. with minimal effort. Hopefully I
    > will be able to recruit some help with this. From what I can tell, there
    > is a lot of roll-your-own for this type of thing out there... it would be
    > nice to focus some work on a system that can be used by all.

    I don't know much about Jini. But I'd be willing to help if you need any. =)

    > - Mark



Discussion Overview
group: java-user
category: lucene
posted: Jan 31, '08 at 8:43a
active: Feb 1, '08 at 8:59a
posts: 5
users: 3
website: lucene.apache.org
