Hi,

We are developing a site with a 4-tier design (RP, UI, WS, DB), and on
the WS tier we are looking at how we would set up Lucene in an HA
configuration, i.e. so that there is no single point of failure. The initial
deployment will involve pairs of servers at each tier.

As there are at least 2 servers at the WS (Lucene) tier, that implies at
least 2 indexes.

As far as best practices go:

1) What is the typical architecture for Lucene in an HA configuration?

2) How are indexes typically kept in sync? For example, suppose a search
request from the UI tier returns a page of results and we then ask for the
next page. If we aren't using session stickiness and the indexes are out of
sync, the second request could hit a different server and return
inconsistent results. How are these issues typically solved?

3) What is typically done in the DB to keep track of updates? For example,
having a last-indexed timestamp is great, but if you have 2 indexes, do you
add 2 columns to each table?

4) Are there any white papers or references worth looking at?
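To make the concern in question 2 concrete, here is a minimal Python sketch (not Lucene code) of two replicas of the same index, one of which lags behind by a single document. The documents, the newest-first ordering, and the `search`/`page` helpers are hypothetical stand-ins for a real ranked search. Serving page 1 from the up-to-date replica and page 2 from the lagging one silently skips a result.

```python
# Hypothetical model: an "index" is a list of (doc_id, text) pairs in the
# order they were added. Real Lucene ranking is far richer than this.
PAGE_SIZE = 3

def search(index, query):
    """Return matching doc ids, newest first (stand-in for a ranked search)."""
    return [doc_id for doc_id, text in reversed(index) if query in text]

def page(results, page_number, page_size=PAGE_SIZE):
    """Offset-based paging: page 0 is the first page_size hits."""
    start = page_number * page_size
    return results[start:start + page_size]

docs = [(f"doc-{i}", "lucene ha") for i in range(6)]
replica_a = docs + [("doc-new", "lucene ha")]  # has picked up the newest update
replica_b = list(docs)                         # lagging one update behind

# No stickiness: page 1 happens to hit replica A, page 2 hits replica B.
first = page(search(replica_a, "lucene"), 0)
second = page(search(replica_b, "lucene"), 1)

# Compare against what a single, consistent index would have shown.
expected = search(replica_a, "lucene")[:2 * PAGE_SIZE]
skipped = [d for d in expected if d not in first + second]
print(first)    # ['doc-new', 'doc-5', 'doc-4']
print(second)   # ['doc-2', 'doc-1', 'doc-0']
print(skipped)  # ['doc-3']
```

Which result is skipped or duplicated depends on which replica lags and where the new document ranks; the point is only that offset-based paging assumes both requests see the same version of the index.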

FYI, the site is being designed to scale to millions of users and
already incorporates sharding for user data and related content.

Much appreciated.

--Nikolaos

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


  • Ian Lea at Feb 9, 2011 at 10:27 am
    One way, not necessarily typical or best practice but known to work,
    is to designate one of your WS-tier machines, or another server, as
    the master indexer. Run all index updates on that server and copy the
    indexes out to the other server(s) using rsync. That is normally quick
    since it only transfers changes. Have something that notifies the slave
    servers when they need to reopen their indexes, so that searches don't
    get out of sync.
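Ian's master/slave approach can be sketched as follows. This is a minimal, hypothetical Python model, not Lucene or rsync code: index files are represented as name → (size, mtime) entries, `files_to_copy` imitates rsync's default quick check (skip files whose size and modification time already match), `files_to_delete` imitates what `rsync --delete` would remove, and `Slave.reopen` stands in for reopening the searcher on the copied files. The file names and all class/function names are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical file metadata: (size_in_bytes, mtime). Requires Python 3.9+.
FileInfo = tuple[int, float]

def files_to_copy(master: dict[str, FileInfo], slave: dict[str, FileInfo]) -> list[str]:
    """Files missing on the slave or whose size/mtime differ (rsync-style quick check)."""
    return sorted(name for name, info in master.items() if slave.get(name) != info)

def files_to_delete(master: dict[str, FileInfo], slave: dict[str, FileInfo]) -> list[str]:
    """Files the master no longer has (e.g. merged-away segments)."""
    return sorted(name for name in slave if name not in master)

@dataclass
class Slave:
    name: str
    files: dict[str, FileInfo] = field(default_factory=dict)
    generation: int = 0  # index version this slave's searcher has open

    def reopen(self, generation: int) -> None:
        # Placeholder for reopening the real index reader/searcher.
        self.generation = generation

@dataclass
class Master:
    files: dict[str, FileInfo] = field(default_factory=dict)
    generation: int = 0
    slaves: list[Slave] = field(default_factory=list)

    def commit(self, new_files: dict[str, FileInfo], removed: tuple[str, ...] = ()) -> None:
        """Apply a batch of index updates on the master only."""
        for name in removed:
            self.files.pop(name, None)
        self.files.update(new_files)
        self.generation += 1

    def replicate(self) -> dict[str, list[str]]:
        """Copy only changed files to every slave, then tell each to reopen."""
        copied = {}
        for slave in self.slaves:
            to_copy = files_to_copy(self.files, slave.files)
            for name in files_to_delete(self.files, slave.files):
                del slave.files[name]
            for name in to_copy:
                slave.files[name] = self.files[name]
            copied[slave.name] = to_copy
        # Notify only after all copies have finished, so the slaves switch
        # to the new generation together rather than mid-copy.
        for slave in self.slaves:
            slave.reopen(self.generation)
        return copied

ws1, ws2 = Slave("ws1"), Slave("ws2")
master = Master(slaves=[ws1, ws2])
master.commit({"_0.cfs": (1000, 1.0), "segments_1": (60, 1.0)})
first = master.replicate()
master.commit({"_1.cfs": (200, 2.0), "segments_2": (60, 2.0)}, removed=("segments_1",))
second = master.replicate()
print(second["ws1"])  # ['_1.cfs', 'segments_2']
```

The second replication copies only the two new files and leaves `_0.cfs` alone, which is why the rsync step is normally quick. Since only the master ever writes to the index in this design, a single last-indexed marker in the DB is presumably enough; the slaves inherit the master's state by copying its files.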

    Or look at Solr. I believe it takes care of most of this out of the box.


    --
    Ian.


    On Tue, Feb 8, 2011 at 6:23 PM, BrightMinds Dev wrote:
Discussion Overview
group: java-user
category: lucene
posted: Feb 8, '11 at 6:24p
active: Feb 9, '11 at 10:27a
posts: 2
users: 2
website: lucene.apache.org

2 users in discussion: Ian Lea (1 post), BrightMinds Dev (1 post)
