ipc.client.timeout
The default is set to 60s. Many of my dfs -put commands would seem to
hang, and lowering the timeout (to 1s) seems to have made things a
whole lot better.



General curiosity - isn't 60s just huge for an RPC timeout? (A web search
indicates that Nutch may be setting it to 10s, and even that seems
fairly large.) Would love to get a backgrounder on why the default is
set to so large a value.



Thanks,



Joydeep


  • Devaraj Das at Sep 5, 2007 at 8:00 am
    This is to take care of cases where a particular server is too loaded to
    respond to client RPCs quickly enough. Setting the timeout to a large value
    ensures that RPCs won't time out as often, thereby potentially leading to
    fewer failures (e.g., a map/reduce task kills itself when it fails to
    invoke an RPC on the tasktracker three times in a row) and retries.
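    For reference, the setting under discussion is the ipc.client.timeout property, whose value is in milliseconds. A sketch of overriding it in hadoop-site.xml; the 10000 below is purely illustrative, not a recommended value:

```xml
<!-- hadoop-site.xml: override the client RPC timeout.
     Value is in milliseconds; 10000 (10s) is illustrative only. -->
<property>
  <name>ipc.client.timeout</name>
  <value>10000</value>
</property>
```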
  • Joydeep Sen Sarma at Sep 13, 2007 at 7:14 pm
    I would love to use a lower timeout. It seems that retries are either
    buggy or missing in some cases, which causes lots of failures. The cases
    I can see right now (0.13.1):

    - namenode.complete: looks like it retries - but may not be idempotent?

    org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
    complete write to file
    /user/facebook/profiles/binary/users_joined/_task_0018_r_000003_0/.part-
    00003.crc by DFSClient_task_0018_r_000003_0
    at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:353)


    - namenode.addBlock: no retry policy (looking at DFSClient.java)
    - namenode.mkdirs: no retry policy ('')

    We see plenty of all of these with a lowered timeout. With a high
    timeout - we have seen very slow recovery from some failures (jobs would
    hang on submission).

    Don't understand the fs protocol well enough - any idea if these are
    fixable?

    Thx,

    Joydeep
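    The per-method retry wiring described above can be sketched as follows. This is a simplified illustration, not the actual DFSClient code; the class and method names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Simplified illustration (not the real DFSClient code) of per-method
// retry policies: idempotent calls get multiple attempts, while
// non-idempotent ones ("addBlock", "mkdirs") get a single attempt.
public class MethodRetrySketch {

    // method name -> maximum number of attempts (1 = no retry)
    static Map<String, Integer> buildPolicyMap() {
        Map<String, Integer> policy = new HashMap<>();
        policy.put("complete", 3);  // retried (the thread questions whether this is safe)
        policy.put("addBlock", 1);  // non-idempotent: single attempt
        policy.put("mkdirs", 1);    // non-idempotent: single attempt
        return policy;
    }

    // Invoke a call, retrying up to the attempt count configured for its
    // method name; methods without an entry are not retried.
    static <T> T invoke(String method, Callable<T> call,
                        Map<String, Integer> policy) throws Exception {
        int attempts = policy.getOrDefault(method, 1);
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;  // retry if attempts remain
            }
        }
        throw last;
    }
}
```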

  • Dhruba Borthakur at Sep 13, 2007 at 8:38 pm
    Hi Joydeep,

    The idea is to retry only those operations that are idempotent. addBlock
    and mkdirs are non-idempotent, and that's why there are no retries for
    these calls.

    Can you tell me if a CPU bottleneck on your Namenode is causing you to
    encounter all these timeouts?

    Thanks,
    dhruba


  • Ted Dunning at Sep 13, 2007 at 8:48 pm
    Can idempotency be retrofitted with a client-generated random key? That
    way the server could remember recent transaction keys (say, the last
    minute of keys) and ignore redundant requests.
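    A minimal sketch of that idea (hypothetical names, not Hadoop code): the server caches results keyed by a client-generated request ID for a bounded time window, so a retried request can be answered from the cache instead of being re-executed:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a server-side duplicate-request cache. A client attaches a
// random request key to each non-idempotent RPC; the server remembers
// results for ttlMillis and replays them to retries of the same key.
public class RetryCache {
    private final long ttlMillis;
    private final Map<String, Object> results = new LinkedHashMap<>();
    private final Map<String, Long> seenAt = new LinkedHashMap<>();

    public RetryCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached result for a duplicate request, or null if the
    // key has not been seen (the caller should execute the operation).
    public synchronized Object lookup(String requestKey, long now) {
        evictExpired(now);
        return results.get(requestKey);
    }

    // Record the result of a freshly executed request.
    public synchronized void record(String requestKey, Object result, long now) {
        results.put(requestKey, result);
        seenAt.put(requestKey, now);
    }

    // Drop entries older than the retention window.
    private void evictExpired(long now) {
        seenAt.entrySet().removeIf(e -> {
            if (now - e.getValue() > ttlMillis) {
                results.remove(e.getKey());
                return true;
            }
            return false;
        });
    }
}
```

    The one-minute window Ted suggests corresponds to ttlMillis = 60000; it only has to be longer than the client's total retry horizon.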

  • Joydeep Sen Sarma at Sep 13, 2007 at 8:49 pm
    There is a retry for the 'complete' operation - those are erroring out
    as well. (DFSClient.java: methodNameToPolicyMap.put("complete",
    methodPolicy);)

    Quite likely it's because the namenode is also a data/task node.

  • Doug Cutting at Sep 13, 2007 at 8:54 pm

    Joydeep Sen Sarma wrote:
    Quite likely it's because the namenode is also a data/task node.
    That doesn't sound like a "best practice"...

    Doug
  • Joydeep Sen Sarma at Sep 13, 2007 at 9:14 pm
    Learning the hard way :-)

    Second Ted's last mail (all the way back to Sun RPC - server can keep
    track of completed RPC calls and reply success to client retries if op
    already performed).

  • Dhruba Borthakur at Sep 13, 2007 at 9:21 pm
    We have discussed the approach of remembering completed RPCs (and their
    status codes, return parameters, etc.) so that a retry of a previously
    executed RPC can get back identical results. But we have not implemented
    this yet.

    In the short term, it would be nice if you could make the Namenode run on a
    dedicated machine (no Datanodes, tasktrackers, etc. on this machine). Also,
    how many files does your cluster have, and how much main memory is on the
    Namenode machine? How much memory is the Namenode jvm configured to use?

    Thanks,
    dhruba


    -----Original Message-----
    From: Joydeep Sen Sarma
    Sent: Thursday, September 13, 2007 2:16 PM
    To: hadoop-user@lucene.apache.org
    Subject: RE: ipc.client.timeout

    Learning the hard way :-)

    Second Ted's last mail (all the way back to Sun RPC - server can keep
    track of completed RPC calls and reply success to client retries if op
    already performed).

    -----Original Message-----
    From: Doug Cutting
    Sent: Thursday, September 13, 2007 1:54 PM
    To: hadoop-user@lucene.apache.org
    Subject: Re: ipc.client.timeout

    Joydeep Sen Sarma wrote:
    Quite likely it's because the namenode is also a data/task node.
    That doesn't sound like a "best practice"...

    Doug
  • Joydeep Sen Sarma at Sep 13, 2007 at 9:43 pm
    - fixed namenode to not be data/task node
    - 31K files right now
    - haven't played around with memory options - namenode still running
    with -Xmx1000m - I can bump this up (8G memory available)

    Btw - from what I see in the code, the server is likely discarding the
    client call (and not performing the operation at all). Another (dumber)
    approach for handling the idempotency issue would be for the client to
    retry anyway - in most cases, the server would not have performed the
    operation. In the minority of cases where the server already
    performed the operation, the client can report a timeout error (instead
    of the actual error). (I.e., it's almost as if the last retry was not
    performed.) (There could be some flaw in this logic - I just can't think
    of one right now.)
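    The "retry anyway" logic above could be sketched like this (an illustration only, with hypothetical names, not Hadoop code): retry on timeouts, and if a later attempt fails with a non-timeout error after an earlier timeout, report the timeout rather than the possibly misleading secondary error:

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Sketch of "retry anyway" for non-idempotent calls: a non-timeout
// failure on a retry may just mean an earlier (timed-out) attempt
// actually succeeded server-side, so we surface the original timeout
// instead of the secondary error.
public class RetryAnyway {
    public static <T> T call(Callable<T> op, int maxAttempts) throws Exception {
        Exception firstTimeout = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException e) {
                firstTimeout = e;  // keep retrying on timeouts
            } catch (Exception e) {
                // Non-timeout failure: if an earlier attempt timed out, the
                // op may already have run; report the timeout instead.
                throw firstTimeout != null ? firstTimeout : e;
            }
        }
        throw firstTimeout;  // every attempt timed out
    }
}
```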

  • Dhruba Borthakur at Sep 13, 2007 at 9:51 pm
    Hi Joydeep,

    Thanks for your comments. Really appreciate it.

    For the Namenode configuration, please see if you can use most of the memory
    available on the machine. Maybe a param of -Xmx7000m or so should do it. Also,
    you might want to bump up the number of Namenode handler threads,
    dfs.namenode.handler.count. By default this is set to 10. It might make
    sense to set this to 40 or so.

    Thanks,
    dhruba
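    As a concrete illustration of those two suggestions (values taken from the message above; the heap size is typically set via HADOOP_HEAPSIZE in conf/hadoop-env.sh rather than in the XML):

```xml
<!-- hadoop-site.xml: more Namenode RPC handler threads (default 10). -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value>
</property>
<!-- For the heap, set HADOOP_HEAPSIZE (in MB, e.g. 7000) in conf/hadoop-env.sh. -->
```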


Discussion Overview
group: common-user
categories: hadoop
posted: Sep 5, '07 at 6:56a
active: Sep 13, '07 at 9:51p
posts: 11
users: 5
website: hadoop.apache.org...
irc: #hadoop
