How have folks gone about setting up Lucene in a server farm? Just a network-accessible shared directory?

Regards,
Brian.
________________________________

Learn more about the products, services and technology solutions available from CIN Legal Data Services at: www.cinlegal.com<http://www.cinlegal.com>

This message may contain confidential / proprietary information from CIN Legal Data Service and Credit Infonet, Inc.. If you are not an intended recipient, please refrain from the disclosure, copying, distribution or use of this information. All such unauthorized actions are strictly prohibited. If you have received this transmission in error, please notify the sender by e-mail at bsayatovic@creditinfonet.com and delete all copies of this material from any computer.


  • Ben West at May 20, 2011 at 6:07 pm
    The Lucene FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) specifically warns against using remote file systems. Depending on what you mean by "network-accessible", it could be a lot slower. You (probably) want something of the form: the data is stored locally, but is updated periodically from a remote location. The simplest thing is a scheduled task which just copies over the new index every day at midnight.

    Even with an ideal filesystem, you're going to have to deal with paying an additional warmup penalty that you wouldn't get in an NRT configuration.

    Another thing to note is that, while it's very easy to have multiple readers, it is really hard to have multiple IndexWriters. We just have one writer, and deal with the fact that it's not highly available.

    Hope this helps,
    -Ben
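
A minimal sketch of the "many readers, one writer" arrangement described above, written in Python with no Lucene dependency. The class `SingleWriterIndex` and its methods are hypothetical names invented for illustration; a plain dictionary stands in for the index. Any thread may submit an update, but only one writer thread ever applies it.

```python
import queue
import threading


class SingleWriterIndex:
    """Toy stand-in for an index with one writer thread and many readers."""

    def __init__(self):
        self._docs = {}                    # doc_id -> text (the "index")
        self._lock = threading.Lock()      # guards _docs
        self._pending = queue.Queue()      # updates waiting for the writer
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def submit(self, doc_id, text):
        """Callable from any thread; only enqueues, never writes directly."""
        self._pending.put((doc_id, text))

    def _drain(self):
        # The only code path that mutates the index.
        while True:
            item = self._pending.get()
            if item is None:               # sentinel: stop the writer
                self._pending.task_done()
                break
            doc_id, text = item
            with self._lock:
                self._docs[doc_id] = text
            self._pending.task_done()

    def flush(self):
        """Block until every update submitted so far has been applied."""
        self._pending.join()

    def search(self, term):
        """Readers may call this concurrently from any thread."""
        with self._lock:
            return sorted(d for d, t in self._docs.items() if term in t.split())

    def close(self):
        self._pending.put(None)
        self._writer.join()
```

The trade-off is the one mentioned in the post: the writer is a single point of failure, so this buys simplicity at the cost of availability.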

  • Brian Sayatovic at May 20, 2011 at 6:28 pm
    I'm also concerned with the "liveliness". We have index updates happening in conjunction with writes to our database. Thus, if a user creates a record, it's instantly indexed. That means they can create an entry and instantly search for it.

    If I were to schedule periodic index updates, they wouldn't be able to do this.

    Thus far, our dozens of developers have been all sharing a network accessible index in this manner. No one has complained, but then again, we're not yet focusing on performance of search (many other concerns in front of that).

    Based on your statements, I may need to re-prioritize the risk mitigation.

    Regards,
    Brian.

  • Ben West at May 20, 2011 at 6:44 pm
    The idea of a scheduled task was just a very simple one. I think Microsoft's DFS is a glorified form of this: it just listens for changes on one server and copies them over to the others. I'm sure you can find many other tools which do something similar. You would need to check IndexReader.IsCurrent periodically, but I guess you must already be doing that.

    Also: premature optimization is the root of all evil. If you don't have any problems with how it works now, don't let me confuse you into creating some :-) For a smallish index, all but the most egregious misuses of Lucene are still pretty fast.

    -Ben
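
The periodic `IndexReader.IsCurrent` check mentioned above might follow a pattern like the one below. This is a Python sketch, not Lucene.Net code: `RefreshingSearcher`, `read_version`, and `open_reader` are hypothetical names, and the two callbacks stand in for whatever the real library provides for reading the index version and opening a reader.

```python
class RefreshingSearcher:
    """Reopens its reader only when the index version has changed."""

    def __init__(self, read_version, open_reader):
        self._read_version = read_version  # hypothetical: returns the current index version
        self._open_reader = open_reader    # hypothetical: opens a reader on the index
        # Record the version before opening, so a concurrent update can only
        # make us reopen once more than needed, never serve a stale reader.
        self._version = read_version()
        self._reader = open_reader()

    def is_current(self):
        """True while the index version matches the one recorded at open time."""
        return self._read_version() == self._version

    def reader(self):
        """Return a reader, reopening first if the index has moved on."""
        if not self.is_current():
            self._version = self._read_version()
            self._reader = self._open_reader()
        return self._reader
```

Calling `reader()` before each search (or on a timer) keeps the cost of reopening, and the warmup that follows, limited to the moments when the copied index has actually changed.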


  • Shashi Kant at May 20, 2011 at 7:25 pm
    Not a direct answer, but have you looked at Elasticsearch?
    http://www.elasticsearch.org/

  • Gustavo Sandrigo at May 20, 2011 at 8:37 pm
    Brian,
    I am dealing with a similar situation regarding the liveliness of the index.
    I am looking for options to deal with this, and so far I have found this open
    source project that LinkedIn created.
    Look at the documentation; it has a very nice way of dealing with this
    issue.

    http://sna-projects.com/zoie/

    I guess I will need to build my own implementation of the idea, or see about
    getting people to help with porting this to .NET.

    I hope this helps in some way.

  • Ken Foskey at May 20, 2011 at 11:26 pm
    A shared directory means going over the network, so you have two latencies and much more traffic on the network.

    .NET has a file monitor (FileSystemWatcher) which will trigger a function when a file changes. You can use this to push a file on change. If you do this, first copy it to the same file system (partition) as its destination, then move it into place afterwards, so that the new file appears all at once rather than while it is still being copied.

    Ken Foskey
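
The copy-then-move step can be sketched as follows, in Python rather than .NET. `publish_file` is a hypothetical helper name. The sketch handles a single file; a Lucene index consists of several files, so a real deployment would need to publish them as a consistent set. `os.replace` is atomic on POSIX file systems when source and target are on the same file system; on Windows it can fail if another process holds the target open.

```python
import os
import shutil
import tempfile


def publish_file(source_path, target_path):
    """Copy source_path next to target_path, then rename it into place."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Stage inside the target directory so the staged copy is guaranteed
    # to live on the same file system (partition) as the final location.
    fd, staged_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(source_path, staged_path)  # slow part: readers never see this file
        os.replace(staged_path, target_path)       # fast part: a rename, not a copy
    except BaseException:
        # Clean up the staged copy if anything went wrong before the rename.
        if os.path.exists(staged_path):
            os.remove(staged_path)
        raise
```

Readers opening `target_path` see either the old file or the new one, never a half-copied file, because the slow network copy happens under a temporary name.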
  • Moray McConnachie at May 23, 2011 at 8:40 am
    If your traffic is high enough to warrant the server farm, and search is
    a highly used feature, it is also worth thinking about a dedicated
    search server (or pair of such synced as suggested by Ken and/or
    separately driven by your publishing tools depending on the degree of
    redundancy and failsafe you need).

    We use a dedicated search server as a service, running a custom wrapper
    - we pass a Lucene Query across the network using .NET Remoting -
    binary-serialization over TCP (stay away from other forms of
    serialization unless you have lots of resources to throw at search and
    lots of bandwidth), returning a custom object containing the results and
    other assorted metadata, including faceting.

    .NET remoting is a joy in this context; you only need to be careful
    about version synchronisation. Upgrades need to be carefully planned so
    that servers with e.g. an upgraded Lucene only talk to a search server
    with an upgraded Lucene.

    Yours,
    Moray
    -------------------------------------
    Moray McConnachie
    Director of IT +44 1865 261 600
    Oxford Analytica http://www.oxan.com

    ---------------------------------------------------------
    Disclaimer

    This message and any attachments are confidential and/or privileged. If this has been sent to you in error, please do not use, retain or disclose them, and contact the sender as soon as possible.

    Oxford Analytica Ltd
    Registered in England: No. 1196703
    5 Alfred Street, Oxford
    United Kingdom, OX1 4EH
    ---------------------------------------------------------
  • Brian Sayatovic at May 23, 2011 at 1:06 pm
    Interesting!

    Right now, the index updating occurs on the same thread where the DB write is occurring. This is nice in that we have little room for one to happen without the other. With a dedicated search server, I'd have to look at pushing the update off to that other server, perhaps via a message queue, and then having all servers in the farm query through it.

    Still, I'd worry about failover. We have some other failover strategies where every server in the farm is capable of a function, but only one server is actively doing it. Each server periodically checks whether any other server still has an "active claim" (i.e. one that is not too old) and, if not, picks up the function itself. So in the event one server fails, another in the farm takes over.

    Perhaps I could marry the two.

    But, as said earlier in this thread, I won't prematurely optimize.
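
The "active claim" scheme described above could look roughly like this. It is a Python sketch under stated assumptions: `ClaimTable` is a hypothetical in-memory stand-in for whatever shared store the farm uses (a database row, for instance), and in a real farm the check-and-set inside `try_claim` would have to be atomic in that store and use a clock all servers agree on.

```python
import time


class ClaimTable:
    """In-memory stand-in for a shared store holding one active claim."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock                # injectable so tests can control time
        self._owner = None
        self._renewed_at = None

    def try_claim(self, server_id):
        """Return True if server_id holds the claim after this call."""
        now = self._clock()
        expired = self._owner is None or now - self._renewed_at > self._max_age
        if expired or self._owner == server_id:
            # Either nobody holds a fresh claim, or we are renewing our own.
            self._owner = server_id
            self._renewed_at = now
            return True
        return False
```

Each server would call `try_claim` on a timer shorter than `max_age_seconds`; whichever server gets `True` runs the index writer, and the others keep polling until the holder's claim goes stale.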

  • Kevin Miller at May 23, 2011 at 1:27 pm
    Our solution for this concern was to create a search web service which
    spits out JSON results. Works great but you have to limit your search
    clients to the query interface the web service supports. This limits
    you sometimes from doing some of the crazy advanced things you can do
    with Lucene.

    Kevin Miller
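
A search service of this kind, reduced to its request handler, might look like the sketch below. It is Python with no web framework and no Lucene: `handle_search`, `DOCUMENTS`, and the `q` and `limit` parameters are all hypothetical, chosen only to show how a fixed query interface narrows what clients can ask for.

```python
import json

# Hypothetical in-memory "index"; a real service would query Lucene here.
DOCUMENTS = {
    1: "server farm sharing lucene",
    2: "dedicated search server",
    3: "network shared directory",
}

# The whole query interface: anything else is rejected.
ALLOWED_PARAMS = {"q", "limit"}


def handle_search(params):
    """Turn a dict of request parameters into a JSON response string."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown or "q" not in params:
        return json.dumps({"error": "supported parameters: q, limit"})
    terms = params["q"].lower().split()
    limit = int(params.get("limit", 10))
    # A document matches when it contains every query term as a whole word.
    hits = [
        {"id": doc_id, "text": text}
        for doc_id, text in sorted(DOCUMENTS.items())
        if all(term in text.split() for term in terms)
    ]
    return json.dumps({"total": len(hits), "hits": hits[:limit]})
```

The limitation described in the post shows up directly: a client that wants boosting, custom filters, or any other advanced Lucene feature has no parameter to express it until the service grows one.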

    On May 23, 2011, at 8:06 AM, Brian Sayatovic
    wrote:
    Interesting!

    Right now, the index updating occurs on the same thread where the DB write is occurring. This is nice in that we have little room for one to happen without the other. With a dedicated search server, I'd have to see pushing the update off to that other server via a message queue, perhaps, and then the ability have all servers in the farm query through it.

    Still, I'd worry about fail over. We have some other failover strategies where every server in the farm is capable of a function, but only one server is actively doing it. But each server periodically checks if any other server still has an "active claim" (i.e. not too old) and if not, it will pick up. So in the event one server fails, another in the farm takes over.

    Perhaps I could marry the two.

    But, as said earlier in this thread, I won't prematurely optimize.

    -----Original Message-----
    From: Moray McConnachie
    Sent: Monday, May 23, 2011 4:40 AM
    To: lucene-net-user@lucene.apache.org
    Subject: RE: [Lucene.Net] Server farm sharing Lucene

    If your traffic is high enough to warrant the server farm, and search is a highly used feature, it is also worth thinking about a dedicated search server (or pair of such synced as suggested by Ken and/or separately driven by your publishing tools depending on the degree of redundancy and failsafe you need).

    We use a dedicated search server as a service, running a custom wrapper
    - we pass a Lucene Query across the network using .NET Remoting - binary-serialization over TCP (stay away from other forms of serialization unless you have lots of resources to throw at search and lots of bandwidth), returning a custom object containing the results and other assorted metadata, including faceting.

    .NET remoting is a joy in this context, you only need to be careful about version synchronisation - upgrades need to be carefully planned so that servers with e.g. an upgraded Lucene only talk to a search server with an upgraded Lucene.

    Yours,
    Moray
    -------------------------------------
    Moray McConnachie
    Director of IT +44 1865 261 600
    Oxford Analytica http://www.oxan.com

    -----Original Message-----
    From: Ken Foskey
    Sent: 21 May 2011 00:25
    To: lucene-net-user@lucene.apache.org
    Subject: Re: [Lucene.Net] Server farm sharing Lucene

    A shared directory means going over the network, so you have two latencies and much more traffic on the network.

    .NET has a file monitor (FileSystemWatcher) which will trigger a function when a file changes. You can use this to push a file on change. If you do this, copy it to the same file system (partition) as the destination first, then move it into place afterwards, so that the final swap is effectively immediate.
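
    The copy-then-move step could look roughly like this. It is a minimal sketch in Python rather than .NET; publish_file and the ".staging" suffix are hypothetical names. A real Lucene index is a directory of several files, so this single-file version only illustrates the pattern.

```python
import os
import shutil
import tempfile


def publish_file(source_path: str, live_path: str) -> None:
    """Copy source_path next to live_path, then swap it in with a rename."""
    live_dir = os.path.dirname(os.path.abspath(live_path))
    # Staging inside the destination directory keeps the temporary file on
    # the same file system, so the final step is a rename, not a second copy.
    fd, staging_path = tempfile.mkstemp(dir=live_dir, suffix=".staging")
    os.close(fd)
    try:
        # Slow part: readers still see the old file while this runs.
        shutil.copyfile(source_path, staging_path)
        # Fast part: rename over the old file.
        os.replace(staging_path, live_path)
    except BaseException:
        # Do not leave a half-written staging file behind.
        if os.path.exists(staging_path):
            os.remove(staging_path)
        raise
```

    The rename is atomic on POSIX file systems; on Windows it can fail while another process holds the destination file open, so the caller may need to retry.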

    Ken Foskey
  • Moray McConnachie at May 23, 2011 at 4:19 pm
    We use a message queue (MSMQ) for our indexing updates, and it works
    well. It's also not a bad one-to-many distribution system so you can
    update several search servers if you want to handle fail-over through
    multiple updates (rather than synchronising indices), and it can handle
    transactions, exceptions and retries pretty well too so you should be
    able to guarantee updates get synchronised.
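
    The one-to-many distribution with retries could be sketched like this. It uses in-memory queues in Python as a stand-in for a real message queue; it is not the MSMQ API, and UpdateFanOut, publish and drain are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable


class UpdateFanOut:
    """One pending-update queue per search server (in-memory stand-in)."""

    def __init__(self, server_names: Iterable[str]):
        self.queues: Dict[str, Deque[dict]] = {
            name: deque() for name in server_names
        }

    def publish(self, update: dict) -> None:
        # Every server gets its own copy, so a slow or offline server
        # does not hold the others back.
        for pending in self.queues.values():
            pending.append(dict(update))

    def drain(self, server: str, apply_update: Callable[[dict], None],
              max_attempts: int = 3) -> int:
        """Apply queued updates for one server in order; return the count."""
        pending = self.queues[server]
        applied = 0
        while pending:
            update = pending[0]  # peek; remove only after success
            for _ in range(max_attempts):
                try:
                    apply_update(update)
                except Exception:
                    continue  # retry the same update
                pending.popleft()
                applied += 1
                break
            else:
                # Every attempt failed: leave the update at the head of
                # the queue so a later run can pick it up in order.
                return applied
        return applied
```

    Because an update is removed only after it has been applied, a server that was down simply catches up on its own queue later, which is the "multiple updates rather than synchronising indices" approach.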

    It's looked to me for a while as if MS will eventually get rid of MSMQ -
    but I might be out-of-date.

    I've had some bad experiences in the distant past (v1.0) with
    .NET's file monitoring, with change events failing to fire. No doubt
    this is all fixed in recent OS/.NET combinations, but it left me with a
    sour taste.

    M.

    -------------------------------------
    Moray McConnachie
    Director of IT +44 1865 261 600
    Oxford Analytica http://www.oxan.com

  • Wyatt Barnett at May 23, 2011 at 4:22 pm
    Messaging +1, really the way to fly these days. Disk space is cheap,
    multiple local indexes don't hurt that much.

Discussion Overview
group: lucene-net-user
categories: lucene
posted: May 20, '11 at 5:41p
active: May 23, '11 at 4:22p
posts: 12
users: 8
website: lucene.apache.org
