JobTracker should not try to promote a (map) task if it dis not write to DFS at all
-----------------------------------------------------------------------------------

Key: HADOOP-3140
URL: https://issues.apache.org/jira/browse/HADOOP-3140
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Reporter: Runping Qi



In most cases, map tasks do not write to dfs.
Thus, when they complete, they should not be put into commit_pending queue at all.
This will improve the task promotion significantly.



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Arun C Murthy (JIRA) at Mar 31, 2008 at 9:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583888#action_12583888 ]

    Arun C Murthy commented on HADOOP-3140:
    ---------------------------------------

    I agree, in principle.

    However, there is currently no way to check whether the maps wrote side-files to HDFS. We would either need a new API for tasks (or jobs) to declare that they write side-files and hence need promotion or, worse, we would need to look into the _${taskid} directories and try to guess. Both seem unsatisfactory ...
  • Owen O'Malley (JIRA) at Mar 31, 2008 at 9:44 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Owen O'Malley updated HADOOP-3140:
    ----------------------------------

    Description:
    In most cases, map tasks do not write to dfs.
    Thus, when they complete, they should not be put into commit_pending queue at all.
    This will improve the task promotion significantly.



    was:

    In most cases, map tasks do not write to dfs.
    Thus, when they complete, they should not be put into commit_pending queue at all.
    This will improve the task promotion significantly.



    Summary: JobTracker should not try to promote a (map) task if it does not write to DFS at all (was: JobTracker should not try to promote a (map) task if it dis not write to DFS at all)

    I think that the tasks should include a boolean in the done message to the task tracker that says if they have output to promote. (And it should delete everything in the case of failure, locally.) This is just an optimization. The framework (TaskTracker.Child.main) would look in the work output directory and set true if there is anything to promote. The TT would then set the state to commit-pending or success according to the flag value.
  • Amar Kamat (JIRA) at Apr 1, 2008 at 3:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584183#action_12584183 ]

    Amar Kamat commented on HADOOP-3140:
    ------------------------------------

    How about this:
    1) {{Task.done()}} method checks if the task has data to be promoted and passes this info to the TaskTracker via the {{TaskTracker.done()}} api.
    2) If there is no data to promote, the TaskTracker sets the task status as {{SUCCEEDED}} or {{FAILED}} depending on whether the task succeeds or fails.
    3) JobInProgress adds only {{COMMIT_PENDING}} tasks to the commit-pending queue. The commit-pending queue deals with {{KILLED/FAILED}} tasks only if the commit-pending thread fails to save the task output or if the TaskTracker is lost.
    4) Temporary data from {{FAILED/KILLED}} tasks will be deleted once the job completes (see HADOOP-2391).
    5) {{JobInProgress.updateTaskStatus()}} can now be called with the {{SUCCEEDED}} state either from the TaskTracker (via heartbeat) or from the commit-pending queue.
    6) If {{JobInProgress.updateTaskStatus()}} is called with the {{SUCCEEDED}} state for an already completed TIP, the task will be marked as {{KILLED}}.
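
A minimal sketch of steps 1 to 3 above, using a simplified stand-in model rather than the actual Hadoop classes. The class name `PromotionSketch`, its `State` enum, and both method names are hypothetical. The sketch also assumes that a failed task is reported as `FAILED` whether or not it wrote output, relying on the end-of-job cleanup from step 4.

```java
import java.util.Queue;

/** Hypothetical, simplified model of the proposal; not the real Hadoop classes. */
final class PromotionSketch {
    enum State { SUCCEEDED, FAILED, COMMIT_PENDING }

    /** Steps 1 and 2: the state the TaskTracker would report when a task is done. */
    static State stateOnDone(boolean taskSucceeded, boolean hasOutputToPromote) {
        if (!taskSucceeded) {
            // Assumption: leftover temporary output is reclaimed when the job ends.
            return State.FAILED;
        }
        // Only successful tasks that wrote something need the promotion step.
        return hasOutputToPromote ? State.COMMIT_PENDING : State.SUCCEEDED;
    }

    /** Step 3: only COMMIT_PENDING tasks enter the commit-pending queue. */
    static boolean enqueueIfPending(Queue<String> commitPending, String taskId, State state) {
        if (state == State.COMMIT_PENDING) {
            commitPending.add(taskId);
            return true;
        }
        return false;
    }
}
```

With this split, a map task that wrote nothing goes straight to `SUCCEEDED` and never touches the queue, which is the saving the issue asks for.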

  • Arun C Murthy (JIRA) at Apr 2, 2008 at 6:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584419#action_12584419 ]

    Arun C Murthy commented on HADOOP-3140:
    ---------------------------------------

    {quote}
    1) Task.done() method checks if the task has data to be promoted and passes this info to the TaskTracker via the TaskTracker.done() api.
    2) If there is no data to promote, the TaskTracker sets the task status as SUCCEEDED or FAILED depending on whether the task succeeds or fails.
    {quote}

    +1

    In addition, we should discard the outputs of failed tasks in the 'finally' clause of TaskTracker.Child.main, if feasible. Then we could just set the status to 'FAILED/KILLED' and relieve the JobTracker of the need to discard outputs in a lot of cases. We could go further and do the same in the TT too, to ensure that the JT only needs to promote outputs of successful tasks... clearly it needs some careful thought.


  • Amar Kamat (JIRA) at Apr 2, 2008 at 6:21 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Work on HADOOP-3140 started by Amar Kamat.
  • Amar Kamat (JIRA) at Apr 2, 2008 at 6:21 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat reassigned HADOOP-3140:
    ----------------------------------

    Assignee: Amar Kamat
  • Devaraj Das (JIRA) at Apr 2, 2008 at 6:31 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584427#action_12584427 ]

    Devaraj Das commented on HADOOP-3140:
    -------------------------------------

    We actually don't need to discard output (at the cost of creating some temp garbage on the dfs). The jobtracker deletes the temp dir for the job at the end of the job (HADOOP-2391). That way we will save a bunch of namenode RPCs.
  • Amar Kamat (JIRA) at Apr 2, 2008 at 6:32 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584430#action_12584430 ]

    Amar Kamat commented on HADOOP-3140:
    ------------------------------------

    bq. In addition, we should discard outputs of failed tasks in TaskTracker.Child.main
    Reiterating #4 from my earlier comment: here we might ignore the failed/killed tasks and never call discard. Their output will be taken care of once the job completes. This is a simple approach. Another approach is to have a scavenger thread that periodically does this cleanup *offline*.
  • Amar Kamat (JIRA) at Apr 2, 2008 at 6:42 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584433#action_12584433 ]

    Amar Kamat commented on HADOOP-3140:
    ------------------------------------

    But for now leaving the garbage as it is and reclaiming it once the job finishes seems to be a simple/better solution.
  • Arun C Murthy (JIRA) at Apr 2, 2008 at 6:44 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584436#action_12584436 ]

    Arun C Murthy commented on HADOOP-3140:
    ---------------------------------------

    Right, I missed that...
  • Amar Kamat (JIRA) at Apr 2, 2008 at 2:03 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Attachment: HADOOP-3140-v1.patch
  • Amar Kamat (JIRA) at Apr 2, 2008 at 2:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Fix Version/s: 0.17.0
  • Arun C Murthy (JIRA) at Apr 2, 2008 at 2:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584603#action_12584603 ]

    Arun C Murthy commented on HADOOP-3140:
    ---------------------------------------

    Looks good, couple of comments:

    1. I'm a little bothered by
    {noformat}
    + // If the TIP is already completed and the task reports as SUCCEEDED then
    + // mark the task as KILLED.
    + // In case of task with no promotion the task tracker will mark the task
    + // as SUCCEEDED.
    + if (wasComplete && (status.getRunState() == TaskStatus.State.SUCCEEDED)) {
    +   status.setRunState(TaskStatus.State.KILLED);
    + }
      boolean change = tip.updateStatus(status);
      if (change) {
        TaskStatus.State state = status.getRunState();
    {noformat}
    Normally I'd expect the first check to be inside the 'if (change)' block, to make sure that the same status isn't processed twice and doesn't wrongly manipulate the state of the TIP. I'm happy if you can confirm that this works... just being careful.

    2. Please bump up TaskUmbilicalProtocol's version number.
  • Amar Kamat (JIRA) at Apr 2, 2008 at 6:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584711#action_12584711 ]

    Amar Kamat commented on HADOOP-3140:
    ------------------------------------

    Arun, two things:
    1) If the status is replayed by the TaskTracker, the JobTracker will take care of that. {{JobTracker.heartbeat()}} will simply discard it there and then.
    2) If the status does get replayed (in {{JobInProgress.updateTaskStatus()}}), it will be taken care of as follows:
    a) Task t comes in as {{SUCCEEDED}} for a TIP that is already completed.
    b) It will be marked (locally) as {{KILLED}} and the task's status will be updated in the JT.
    c) If the status is resent, it will again be marked locally as {{KILLED}}. Now the *change* in the status will evaluate to _false_ and nothing will happen.
    The reason for marking the task as {{KILLED}} (locally) is to make sure that the semantics of the trunk are retained. If the state were updated first and only later marked as {{KILLED}}, the task status would temporarily be {{SUCCEEDED}}.
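
The replay argument above can be illustrated with a small stand-in model. This is only a sketch under assumptions: `ReplaySketch`, its fields, and the per-attempt map are hypothetical and are not the real JobInProgress or TaskInProgress code. In particular, the sketch tracks which attempt completed the TIP so that an exact replay from that attempt is left alone, whereas the comment above says such replays are already dropped in {{JobTracker.heartbeat()}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one TIP with several task attempts. */
final class ReplaySketch {
    enum State { SUCCEEDED, KILLED }

    /** Last recorded state per attempt; stands in for the TIP's status table. */
    private final Map<String, State> recorded = new HashMap<>();
    /** Attempt that completed the TIP, or null while the TIP is still incomplete. */
    private String completedBy = null;

    /** Returns true if the report changed the recorded state (the "change" flag). */
    boolean updateTaskStatus(String attemptId, State reported) {
        State effective = reported;
        // A SUCCEEDED report from another attempt of an already completed TIP is
        // rewritten to KILLED before it is recorded, so that attempt is never
        // visible as SUCCEEDED, not even temporarily.
        boolean wasComplete = completedBy != null && !completedBy.equals(attemptId);
        if (wasComplete && reported == State.SUCCEEDED) {
            effective = State.KILLED;
        }
        State previous = recorded.put(attemptId, effective);
        boolean change = previous != effective;
        if (change && effective == State.SUCCEEDED) {
            completedBy = attemptId;
        }
        return change;
    }

    State stateOf(String attemptId) {
        return recorded.get(attemptId);
    }
}
```

Because the rewrite to `KILLED` happens before the state is stored, a resent report maps to the same stored value and the change flag stays false, which matches cases (b) and (c).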
  • Amar Kamat (JIRA) at Apr 3, 2008 at 11:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585060#action_12585060 ]

    Amar Kamat commented on HADOOP-3140:
    ------------------------------------

    Looks like we can optimize it further. For checking whether the task output dir is empty or not we can do the following
    {code}
    if (taskOutputPath != null) {
      // Get the file-system for the task output directory
      FileSystem fs = taskOutputPath.getFileSystem(conf);
      // Check if it exists
      if (fs.exists(taskOutputPath)) {
        // Get the summary for the folder
        ContentSummary summary = fs.getContentSummary(taskOutputPath);
        // Check if the directory contains some data
        // i.e total-files + total-folders - 1(itself)
        if ((summary.getFileCount() + summary.getDirectoryCount() - 1) > 0) {
          shouldBePromoted = true;
        }
      }
    }
    {code}
    I have tested {{fs.getContentSummary()}} via the DFSClient and it works as expected. Comments?
  • Amar Kamat (JIRA) at Apr 3, 2008 at 2:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Attachment: HADOOP-3140-v2.patch
  • Amar Kamat (JIRA) at Apr 3, 2008 at 2:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Status: Patch Available (was: In Progress)
  • dhruba borthakur (JIRA) at Apr 3, 2008 at 5:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585199#action_12585199 ]

    dhruba borthakur commented on HADOOP-3140:
    ------------------------------------------

    As Amar mentioned, it would be nice if we could eliminate the call to fs.exists() in the previous code snippet, especially if this snippet is executed frequently. fs.getContentSummary() probably throws an exception if the file does not exist.
  • Devaraj Das (JIRA) at Apr 3, 2008 at 8:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585273#action_12585273 ]

    Devaraj Das commented on HADOOP-3140:
    -------------------------------------

    Dhruba, is that a documented exception? I didn't see it in the FileSystem.getContentSummary API doc. If it is not documented, is it advisable to make client code depend on the exception? For example, what if getContentSummary later returns null for non-existent paths? So, unless FileSystem provides a guarantee that an exception will be thrown for non-existent paths, I'd like to go along the lines of what Amar showed in the code snippet. Thoughts?
  • Raghu Angadi (JIRA) at Apr 3, 2008 at 8:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585275#action_12585275 ]

    Raghu Angadi commented on HADOOP-3140:
    --------------------------------------
    bq. So, unless FileSystem provides a guarantee that an exception will be thrown for non-existent paths, i'd like to go in the lines of what Amar mentioned in the code snippet. Thoughts?
    Then, should the code handle the summary being null? (exists() returning true on the previous line does not mean the path still exists on the next line.)

  • Hadoop QA (JIRA) at Apr 4, 2008 at 1:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585363#action_12585363 ]

    Hadoop QA commented on HADOOP-3140:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12379270/HADOOP-3140-v2.patch
    against trunk revision 643282.

    @author +1. The patch does not contain any @author tags.

    tests included -1. The patch doesn't appear to include any new or modified tests.
    Please justify why no tests are needed for this patch.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/console

    This message is automatically generated.
  • Owen O'Malley (JIRA) at Apr 4, 2008 at 5:01 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585398#action_12585398 ]

    Owen O'Malley commented on HADOOP-3140:
    ---------------------------------------

    I'm very strongly against using exceptions as part of the nominal flow of the program.

    I much prefer the exists check.
  • Amar Kamat (JIRA) at Apr 4, 2008 at 7:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Attachment: HADOOP-3140-v3.patch

    Attaching a patch with the following changes:
    1) A _not null_ check for summary.
    2) In case of an exception, promotion is treated as necessary.
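
A minimal sketch of the two changes listed in this update, under stated assumptions. The patch itself is not shown in this thread, so every name here (`PromotionDecisionSketch`, `OutputSummary`, `SummaryLookup`, `needsPromotion`) is hypothetical, and treating a null summary as "no output" is an assumption rather than something the thread confirms.

```java
import java.io.IOException;

// Hypothetical sketch; not code from HADOOP-3140-v3.patch.
class PromotionDecisionSketch {

    // Stand-in for whatever summary of the task's output the patch inspects.
    static final class OutputSummary {
        final long fileCount;

        OutputSummary(long fileCount) {
            this.fileCount = fileCount;
        }
    }

    // Stand-in for a lookup that may fail or return null.
    interface SummaryLookup {
        OutputSummary lookup() throws IOException;
    }

    static boolean needsPromotion(SummaryLookup lookup) {
        try {
            OutputSummary summary = lookup.lookup();
            // Change 1: check for null before reading the summary
            // (assumption: a null summary means the task wrote nothing).
            return summary != null && summary.fileCount > 0;
        } catch (IOException e) {
            // Change 2: if the check itself fails, fall back to promotion
            // so that any output that does exist is not skipped.
            return true;
        }
    }
}
```

Falling back to promotion on failure errs on the side of the slower but safe path.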

  • Amar Kamat (JIRA) at Apr 4, 2008 at 7:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Status: Open (was: Patch Available)
  • Amar Kamat (JIRA) at Apr 4, 2008 at 7:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Status: Patch Available (was: Open)
  • Amar Kamat (JIRA) at Apr 4, 2008 at 12:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amar Kamat updated HADOOP-3140:
    -------------------------------

    Attachment: HADOOP-3140-v3.patch

    One unnecessary import statement slipped in. This patch just removes that.
  • Devaraj Das (JIRA) at Apr 4, 2008 at 12:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585506#action_12585506 ]

    Devaraj Das commented on HADOOP-3140:
    -------------------------------------

    +1
  • Hadoop QA (JIRA) at Apr 4, 2008 at 4:08 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585607#action_12585607 ]

    Hadoop QA commented on HADOOP-3140:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12379382/HADOOP-3140-v3.patch
    against trunk revision 643282.

    @author +1. The patch does not contain any @author tags.

    tests included -1. The patch doesn't appear to include any new or modified tests.
    Please justify why no tests are needed for this patch.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/console

    This message is automatically generated.
  • Devaraj Das (JIRA) at Apr 4, 2008 at 4:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-3140:
    --------------------------------

    Resolution: Fixed
    Release Note: Tasks that don't generate any output are not inserted into the commit queue of the JobTracker. They are marked as SUCCESSFUL by the TaskTracker, and the JobTracker updates their state, short-circuiting the commit queue.
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Amar!
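
The behaviour described in the release note can be sketched as a small state machine. This is a simplified illustration, not the JobTracker's actual code: the class `CommitQueueSketch`, its methods, and the `State` names are all hypothetical, and the real split of work between the TaskTracker and the JobTracker is collapsed into a single object.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch; not the JobTracker implementation.
class CommitQueueSketch {

    enum State { COMMIT_PENDING, SUCCEEDED }

    private final Queue<String> commitQueue = new ArrayDeque<>();
    private final Map<String, State> states = new HashMap<>();

    // Called when a task reports completion.
    void taskCompleted(String taskId, boolean wroteOutput) {
        if (wroteOutput) {
            // Output must be promoted first, so the task waits in the queue.
            states.put(taskId, State.COMMIT_PENDING);
            commitQueue.add(taskId);
        } else {
            // Nothing to promote: mark the task done without queueing it.
            states.put(taskId, State.SUCCEEDED);
        }
    }

    // Promotes the oldest queued task, or returns null if the queue is empty.
    String commitNext() {
        String taskId = commitQueue.poll();
        if (taskId != null) {
            states.put(taskId, State.SUCCEEDED);
        }
        return taskId;
    }

    State stateOf(String taskId) {
        return states.get(taskId);
    }

    int pendingCommits() {
        return commitQueue.size();
    }
}
```

In a job where most map tasks write nothing to DFS, only the few tasks with output ever occupy the queue.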
  • Hudson (JIRA) at Apr 5, 2008 at 12:16 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585966#action_12585966 ]

    Hudson commented on HADOOP-3140:
    --------------------------------

    Integrated in Hadoop-trunk #451 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/451/])

Discussion Overview
group: common-dev
categories: hadoop
posted: Mar 31, '08 at 9:24p
active: Apr 5, '08 at 12:16p
posts: 31
users: 1
website: hadoop.apache.org...
irc: #hadoop