Hi,

I have set up cross data center replication (CDCR) using Solr 6, and I want
to know why the buffer needs to be enabled on the source cluster. Even if the
buffer is not enabled, I am able to replicate the data between the source and
target sites. What is the advantage of enabling the buffer on the source
site? If I enable the buffer, the transaction logs are never deleted, and
over a period of time we run out of disk space. Can you please let me know
why enabling the buffer is required?

--
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


  • Renaud Delbru at Jun 14, 2016 at 8:50 am
    Hi Bharath,

    The buffer is useful when you need to buffer updates on the source
    cluster before starting CDCR, if the source cluster might receive
    updates in the meantime and you want to be sure not to miss them.

    To understand this better, you need to understand how CDCR cleans
    transaction logs. When CDCR is started (with the START action), it
    instantiates a log reader for each target cluster. The position of each
    log reader tells CDCR which transaction logs it can clean: if all the
    log readers are beyond a certain point, then CDCR can clean all the
    transaction logs up to that point.
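
    The cleaning rule described above can be pictured with a small toy
    model. This is only a sketch of the idea, not Solr's actual code: the
    integer tlog ids, the reader names, and the function name are all
    assumptions made up for illustration.

```python
def tlogs_past_all_readers(tlog_ids, reader_positions):
    """Return the tlog ids that every log reader has already moved past.

    Toy model (hypothetical, not Solr internals):
    - tlog_ids: increasing integers, one per transaction log file
    - reader_positions: target cluster name -> id of the tlog that
      target's log reader is currently reading
    """
    if not reader_positions:
        # With no readers, CDCR holds nothing back; whether these files
        # are actually removed is then up to the normal update log cleanup.
        return sorted(tlog_ids)
    # The slowest reader decides how far cleaning may go.
    oldest_needed = min(reader_positions.values())
    return [t for t in sorted(tlog_ids) if t < oldest_needed]


if __name__ == "__main__":
    # Two targets: one reader is on tlog 4, the slower one on tlog 3.
    print(tlogs_past_all_readers([1, 2, 3, 4, 5], {"dc2": 4, "dc3": 3}))
```

    In this model the slowest reader limits what can be cleaned, which is
    why one lagging or unreachable target keeps transaction logs on disk.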

    However, there are cases where the source cluster is up without
    any log readers instantiated:
    1) The source cluster is started, but CDCR is not started yet.
    2) The source cluster is started and CDCR is started, but the target
    cluster was not accessible when CDCR was started. In this case, CDCR
    cannot instantiate a log reader for that cluster.

    In these two scenarios, updates received by the source cluster might be
    cleaned out of the transaction log by the normal update log cleaning
    procedure before they are replicated.
    That is where the buffer becomes useful. If you know that you will be in
    one of these two scenarios while starting up your clusters and CDCR,
    you can activate the buffer to be sure not to miss updates. Once the
    source and target clusters are properly up and CDCR replication is
    properly started, you can turn the buffer off.
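
    As a rough sketch of that order of operations, the snippet below only
    builds the request URLs for the CDCR API actions ENABLEBUFFER, START
    and DISABLEBUFFER; it does not send anything. The host name,
    collection name, and helper names are assumptions for illustration.

```python
from urllib.parse import urlencode

# CDCR API actions used in this sketch.
CDCR_ACTIONS = {"STATUS", "START", "STOP", "ENABLEBUFFER", "DISABLEBUFFER", "QUEUES"}


def cdcr_url(base_url, collection, action):
    """Build the URL for one CDCR API call on a collection."""
    if action not in CDCR_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


def startup_sequence(base_url, collection):
    """URLs in the order discussed: buffer on, start CDCR, buffer off.

    The last call should only be issued once replication to the
    target has been verified to be running.
    """
    order = ("ENABLEBUFFER", "START", "DISABLEBUFFER")
    return [cdcr_url(base_url, collection, action) for action in order]


if __name__ == "__main__":
    # "source-host" and "collection1" are placeholder names.
    for url in startup_sequence("http://source-host:8983/solr", "collection1"):
        print(url)
```

    Each URL can then be requested with any HTTP client against the
    source cluster, leaving a manual check between START and
    DISABLEBUFFER.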

    --
    Renaud Delbru
  • Davis, Daniel (NIH/NLM) [C] at Jun 14, 2016 at 1:50 pm
    I must chime in to clarify something - in case 2, would the source cluster eventually start a log reader on its own? That is, would the CDCR heal over time, or would manual action be required?

  • Bharath Kumar at Jun 15, 2016 at 2:10 am
    Hi Renaud,

    Thank you so much for your response. It is very helpful and it helped me
    understand the need for turning on buffering.

    Is it recommended to keep buffering enabled all the time on the source
    cluster? If the target cluster is up and running and CDCR is started,
    can I turn off buffering on the source site?

    As you have mentioned, the transaction logs are kept on the source cluster,
    until the data is replicated on the target cluster, once the cdcr is
    started, is there a possibility that if on the target cluster


  • Bharath Kumar at Jun 15, 2016 at 2:18 am
    Hi Renaud,

    Thank you so much for your response. It is very helpful and it helped me
    understand the need for turning on buffering.

    Is it recommended to keep buffering enabled all the time on the source
    cluster? If the target cluster is up and running and CDCR is started,
    can I turn off buffering on the source site?

    As you mentioned, once CDCR is started, the transaction logs are kept on
    the source cluster until the data is replicated to the target cluster.
    Is there a possibility that the target cluster gets out of sync with the
    source cluster and we need to do a hard recovery from the source cluster
    to sync up the target cluster?

    Also, I have the below configuration on the source cluster to synchronize
    the update logs.
        <lst name="updateLogSynchronizer">
          <str name="schedule">1000</str>
        </lst>

    Regarding monitoring of the replication, I am planning to add a script
    that checks the queue size, to make sure the disk does not fill up if the
    target site is down and the transaction logs keep growing on the
    source site.
    Is there any other recommended approach?
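
    A minimal sketch of such a check could look like the following. It
    assumes the per-target queue sizes and the total transaction log size
    have already been extracted from the response of the CDCR QUEUES
    action; the exact response layout is deliberately not assumed here,
    and the function name and thresholds are made up for illustration.

```python
def replication_alerts(queue_sizes, tlog_total_bytes, max_queue, max_tlog_bytes):
    """Return human-readable alerts for queues or tlogs over their limits.

    queue_sizes: target name -> number of updates still waiting to be
                 replicated (extracted by the caller from QUEUES output)
    tlog_total_bytes: total size of the transaction logs on the source
    max_queue, max_tlog_bytes: hypothetical thresholds chosen by the operator
    """
    alerts = []
    for target, size in sorted(queue_sizes.items()):
        if size > max_queue:
            alerts.append(
                f"queue for {target} has {size} pending updates (limit {max_queue})"
            )
    if tlog_total_bytes > max_tlog_bytes:
        alerts.append(
            f"transaction logs use {tlog_total_bytes} bytes (limit {max_tlog_bytes})"
        )
    return alerts


if __name__ == "__main__":
    # Example values only; real numbers would come from the source cluster.
    for line in replication_alerts({"target-dc": 120000}, 5_000_000_000,
                                   max_queue=100000, max_tlog_bytes=2_000_000_000):
        print(line)
```

    Run from cron or a monitoring agent, an empty result means nothing
    crossed a threshold; any returned line can be forwarded as an alert.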

    Thanks again, your inputs were very helpful.

Discussion Overview
group: solr-user @
categories: lucene
posted: Jun 14, '16 at 5:41a
active: Jun 15, '16 at 2:18a
posts: 5
users: 3
website: lucene.apache.org...
