Got this lovely message from a data node this weekend :

Time: Sunday 25 October 2009 - 15:04:01
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node
(Internal error, programming error or missing error message, please
report a bug)
Error: 2303
Error data: Node 4 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 263) 0x0000000a
Program: ndbd
Pid: 21156
Trace: /var/lib/mysql/ndb_4_trace.log.5
Version: mysql-5.1.35 ndb-7.0.7
***EOM***

Interesting thing is that no other node got restarted. I did
look this up and saw that it can be caused by a slow disk (we use
disk-based NDB) or insufficient disk throughput, but my disks are RAIDed
SCSI, so I doubt it is that. From my config.ini (relevant parts only):

DataMemory=10860M
IndexMemory=1358M
SharedGlobalMemory=384M
DiskPageBufferMemory=2048M

I did the following calculations for memory allocation:

total memory = 16384 MB
OS reqs = 1126 MB
Buffer Memory = 900 MB
DataMemory = 10860 MB
IndexMemory = 1358 MB

DPBM = 0.8 * (total memory - (OS + Buffer + data + index))
DPBM = 0.8 * (16384 - (1126 + 900 + 10860 + 1358))
DPBM = 1712

Therefore DiskPageBufferMemory should be minimum 1712M, so setting it to
2048M should leave me loads of room, right?
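
The arithmetic above can be rechecked with a short script. This is only a sketch: the function and variable names are made up for illustration, and it simply reproduces the formula from the post rather than any official sizing rule. It also compares the result with the configured DiskPageBufferMemory, which shows that 2048M sits 336 MB above the 1712 MB figure, not below it.

```python
# Figures in MB, copied from the post above.
TOTAL_MB = 16384
OS_MB = 1126
BUFFER_MB = 900
DATA_MEMORY_MB = 10860
INDEX_MEMORY_MB = 1358
CONFIGURED_DPBM_MB = 2048  # DiskPageBufferMemory from config.ini


def dpbm_estimate_mb(total, os_reqs, buffers, data, index, percent=80):
    """Return `percent` of the memory left after the other allocations."""
    remaining = total - (os_reqs + buffers + data + index)
    # Integer arithmetic avoids floating-point rounding in the result.
    return remaining * percent // 100


estimate_mb = dpbm_estimate_mb(
    TOTAL_MB, OS_MB, BUFFER_MB, DATA_MEMORY_MB, INDEX_MEMORY_MB
)
# Negative headroom means the configured value exceeds the estimate.
headroom_mb = estimate_mb - CONFIGURED_DPBM_MB

print(f"estimate: {estimate_mb} MB")  # estimate: 1712 MB
print(f"headroom: {headroom_mb} MB")  # headroom: -336 MB
```

Whether the 1712 MB figure is meant as a floor or a ceiling depends on how the budget is read; the script only makes the numbers explicit.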

So can anyone tell me why we are having this issue with stopped
data nodes? It isn't doing much for the pointy-haired bosses'
confidence!

Obviously let me know if I really need to file a bug, and I'll upload
the tracelog etc ...

Richard


  • Jonas Oreland at Oct 26, 2009 at 2:25 pm
    What kind of transactions do you run?
    Disk-based NDB is currently a bit sensitive to "big" transactions.

    /Jonas

  • Richard McCluskey at Oct 26, 2009 at 2:34 pm

    On Mon, 2009-10-26 at 15:24 +0100, Jonas Oreland wrote:
    > What kind of transactions do you run ?
    > Disk-based NDB is currently a bit sensitive to "big" transactions

    90% of our DB work is single-record reads/writes. We do use some
    functions and stored procedures, but we do nothing that pulls large
    datasets.
    E.g. our most common table currently has 9.8 million rows, but we only
    ever pull out single records by primary key. The cleanup cron for stale
    records runs in the middle of the night when our traffic is almost nil.

    I hope this is what you were asking !

    Richard


  • Sammut, Etienne, VF-MT at Oct 26, 2009 at 2:37 pm
    Hi Richard

    But how many records are deleted during the night? I hit this
    problem when my delete statement had to delete more than 200k records,
    so I modified my delete procedure to delete in batches of 10000. Hope
    this helps you.

    Regards
    Etienne Sammut
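
The batching approach described above could look roughly like the following. This is a sketch under assumptions, not the actual procedure from the post: `delete_in_batches` and the `delete_batch` callback are hypothetical names, and the demo uses an in-memory list in place of a real table.

```python
from typing import Callable


def delete_in_batches(
    delete_batch: Callable[[int], int], batch_size: int = 10000
) -> int:
    """Run delete_batch(batch_size) until a batch comes back short.

    delete_batch must delete at most batch_size rows, commit, and
    return the number of rows it actually deleted.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    while True:
        deleted = delete_batch(batch_size)
        total += deleted
        # A short (or empty) batch means nothing is left to delete.
        if deleted < batch_size:
            return total


# In-memory stand-in for a table holding 25,000 stale rows.
stale_rows = list(range(25000))


def fake_delete(limit: int) -> int:
    """Remove up to `limit` rows from the stand-in table."""
    removed = len(stale_rows[:limit])
    del stale_rows[:limit]
    return removed


print(delete_in_batches(fake_delete))  # 25000
```

Against a real MySQL server, the callback would typically run a single-table `DELETE ... LIMIT` statement through a DB-API cursor (for example, on an invented table, `DELETE FROM stale_records WHERE expires_at < NOW() LIMIT %s`), commit, and return `cursor.rowcount`. Committing per batch keeps each transaction small, which is the point of the approach.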

    --
    MySQL Cluster Mailing List
    For list archives: http://lists.mysql.com/cluster
    To unsubscribe:
    http://lists.mysql.com/cluster?unsub=etienne.sammut@vodafone.com

    -------------------------------------------------------------------------------------
    Vodafone
    -------------------------------------------------------------------------------------

    This email is intended only for the use of individuals to whom it is addressed, as it may contain confidential or privileged information. If you are not a named addressee, intended recipient, or the person responsible for delivering the message to the named addressee, be advised that you have received this email in error and that you should not disseminate, distribute, print, copy this mail or otherwise divulge its contents. In such instances, please notify Vodafone Malta Limited on telephone number +356 99999247 and delete this email from your system. Since this transmission was affected via email, Vodafone Malta Limited cannot guarantee that it is secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. Vodafone Malta Limited does not accept liability for any errors or omissions in the contents of this message which arise as a result of email transmission.

    Save the environment for our children - Print e-mail only when necessary.
  • Richard McCluskey at Oct 26, 2009 at 2:47 pm
    Currently the delete batch process is in the hundreds-of-rows realm, as
    we are still ramping up, so I don't think that can be the issue.
    Thanks for the thought though, I'll definitely make sure we batch things
    in the future.

    Richard

