FAQ
So this question has two parts:

1. How does Lucene scale, exactly? Do we distribute the index to multiple
servers somehow? Or is it one index, sitting on some sort of a shared
filesystem, shared by all Lucene servers? If it's the latter, the bottleneck
will be I/O ... anyway, elaborate on scalability please, and how you set it
up

2. High availability. How would one go about making Lucene redundant?

Search Discussions

  • Peter W. at Jan 4, 2007 at 10:02 pm
    Mark,

    My understanding of Lucene is limited, but the issues
    seem similar to web server farms in that it comes down to
    linear scalability by adding more boxes.

    This means separate machines with their own indexes.

    Shared filesystems such as NFS work well in smaller environments
    but experience problems with heavy load (lost mounts req. reboots).

    There's no mysql-like 'replication' with masters using
    binary files to update slaves. However, since the index is
    file based, you can close Indexwriters and make hot copies or
    perform backups for redundancy.

    If you know XML, use Solr to post and retrieve documents to and from
    your various Lucene indexes. It hides the complexity of remote
    object brokering such as RMI.

    Solr also allows you to get result sets using JSON so you could
    provide distributed Lucene results to browsers as a .js widget.

    While not reflecting the latest 2.0 version release the Lucene in Action
    book provides good background on combining separate indexes.

    Regards,

    Peter W.

    On Jan 4, 2007, at 7:51 AM, Mark Mei wrote:

    So this question has two parts:

    1. How does Lucene scale, exactly? Do we distribute the index to
    multiple
    servers somehow? Or is it one index, sitting on some sort of a shared
    filesystem, shared by all Lucene servers? If it's the latter, the
    bottleneck
    will be I/O ... anyway, elaborate on scalability please, and how
    you set it
    up

    2. High availability. How would one go about making Lucene redundant?

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Russ at Jan 4, 2007 at 10:43 pm
    If you do this on windows, you might be able to replicate the indexes using DFS. On linux you can probably use rsync to keep the different servers up to date.

    If the size of the index is an issue, lustre could be used to have one volume that's spread over many servers. Performance is supposed to be good with lustre as well.

    If you want to speed up individual queries when searching a large index, you can probably split up the index in some way among the servers, query them all at the same time and then aggregate the results. This is just an idea, but I believe it was mentioned in "lucene in action".

    Russ
    Sent wirelessly via BlackBerry from T-Mobile.

    -----Original Message-----
    From: "Peter W." <peter@marketingbrokers.com>
    Date: Thu, 4 Jan 2007 14:02:00
    To:java-user@lucene.apache.org
    Subject: Re: lucene scalability questions

    Mark,

    My understanding of Lucene is limited, but the issues
    seem similar to web server farms in that it comes down to
    linear scalability by adding more boxes.

    This means separate machines with their own indexes.

    Shared filesystems such as NFS work well in smaller environments
    but experience problems with heavy load (lost mounts req. reboots).

    There's no mysql-like 'replication' with masters using
    binary files to update slaves. However, since the index is
    file based, you can close Indexwriters and make hot copies or
    perform backups for redundancy.

    If you know XML, use Solr to post and retrieve documents to and from
    your various Lucene indexes. It hides the complexity of remote
    object brokering such as RMI.

    Solr also allows you to get result sets using JSON so you could
    provide distributed Lucene results to browsers as a .js widget.

    While not reflecting the latest 2.0 version release the Lucene in Action
    book provides good background on combining separate indexes.

    Regards,

    Peter W.

    On Jan 4, 2007, at 7:51 AM, Mark Mei wrote:

    So this question has two parts:

    1. How does Lucene scale, exactly? Do we distribute the index to
    multiple
    servers somehow? Or is it one index, sitting on some sort of a shared
    filesystem, shared by all Lucene servers? If it's the latter, the
    bottleneck
    will be I/O ... anyway, elaborate on scalability please, and how
    you set it
    up

    2. High availability. How would one go about making Lucene redundant?

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJan 4, '07 at 3:55p
activeJan 4, '07 at 10:43p
posts3
users3
websitelucene.apache.org

3 users in discussion

Peter W.: 1 post Russ: 1 post Mark Mei: 1 post

People

Translate

site design / logo © 2022 Grokbase