Within a task, the value of JobConf.getOutputPath() method is modified
----------------------------------------------------------------------

Key: HADOOP-3041
URL: https://issues.apache.org/jira/browse/HADOOP-3041
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.16.1
Environment: all
Reporter: Alejandro Abdelnur
Priority: Blocker
Fix For: 0.16.2


Up to 0.16.0, the value returned by getOutputPath(), when queried within a task, pointed to the part file assigned to the task.

For example: /user/foo/myoutput/part_00000

In 0.16.1 it now returns an internal Hadoop path for the task's temporary output location.

For the above example: /user/foo/myoutput/_temporary/part_00000

This change breaks applications that use getOutputPath() to compute other directories.

IMO, this has always been broken: Hadoop should not change the values of properties injected by the client. Instead, it should use private properties or internal helper methods.
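
To illustrate the breakage, here is a minimal sketch in plain Java. It uses string paths rather than Hadoop's Path class so that it runs without Hadoop on the classpath. The class name SiblingDirDemo, the helper parent() and the "sidefiles" directory name are hypothetical; only the two example paths come from the report. It mimics an application that derives another directory from the parent of the value it gets back from getOutputPath().

```java
// Hypothetical sketch: plain strings stand in for Hadoop paths.
final class SiblingDirDemo {

    // Returns everything before the last '/', or "/" when nothing is left.
    static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // What an application might do: derive a side directory next to its part file.
    static String sideDir(String taskOutputPath) {
        return parent(taskOutputPath) + "/sidefiles";
    }

    public static void main(String[] args) {
        // Value seen inside a task up to 0.16.0, per the report.
        System.out.println(sideDir("/user/foo/myoutput/part_00000"));
        // Value seen inside a task in 0.16.1, per the report.
        System.out.println(sideDir("/user/foo/myoutput/_temporary/part_00000"));
    }
}
```

With the 0.16.1 value, the derived directory ends up under _temporary instead of under the job's output directory, which is the kind of surprise described above.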


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Devaraj Das (JIRA) at Mar 18, 2008 at 1:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579846#action_12579846 ]

    Devaraj Das commented on HADOOP-3041:
    -------------------------------------

    Alejandro, the reason for modifying the job's output dir is to let user apps deal transparently with things like the creation of side files in the task's output directory, and speculative tasks creating the same output files. Another reason is that getOutputPath can be used (and usually is used) in the OutputFormat implementation. All user code can use getOutputPath and create task-specific files there, and the framework automatically promotes or discards these files upon successful or failed task completion. Look at the JavaDoc of JobConf.getOutputPath() for a clear explanation of what I am trying to say (by the way, this doc needs to be fixed to include _temporary).
    You are facing the problem because you create a directory at the _same level_ as the _actual_ output directory of the job. One way to address your problem is to provide an additional API like JobConf.getConfiguredOutputPath that would internally do things like getOutputPath.getParent(), etc. and return the actual configured directory. This will ensure that your apps don't break when the framework changes the directory structure of the output path. Not the best solution, but we have to arrive at a compromise between your requirement and what we already document and provide. Thoughts?
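
A rough sketch of what such a helper could look like, written in plain Java with string paths so it runs without Hadoop. The class name, the method configuredOutputPath() and the TEMP_DIR constant are hypothetical, and the only directory layouts assumed are the two examples given in the issue description; this is not the actual Hadoop implementation.

```java
// Hypothetical sketch of the "getConfiguredOutputPath" idea using plain strings.
final class ConfiguredOutputPathSketch {

    // Assumed name of the framework's temporary directory (from the issue description).
    static final String TEMP_DIR = "_temporary";

    // Recovers the directory the user configured from a task-specific path.
    static String configuredOutputPath(String taskOutputPath) {
        // 0.16.1-style layout: <configured>/_temporary/<part file>
        int marker = taskOutputPath.indexOf("/" + TEMP_DIR + "/");
        if (marker >= 0) {
            return taskOutputPath.substring(0, marker);
        }
        // Older layout: <configured>/<part file>, so one getParent()-style step is enough.
        int slash = taskOutputPath.lastIndexOf('/');
        return slash <= 0 ? "/" : taskOutputPath.substring(0, slash);
    }

    public static void main(String[] args) {
        System.out.println(configuredOutputPath("/user/foo/myoutput/part_00000"));
        System.out.println(configuredOutputPath("/user/foo/myoutput/_temporary/part_00000"));
    }
}
```

Both calls yield /user/foo/myoutput, so application code built on such a helper would not need to know how the framework nests its temporary directories.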
  • Alejandro Abdelnur (JIRA) at Mar 18, 2008 at 3:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579899#action_12579899 ]

    Alejandro Abdelnur commented on HADOOP-3041:
    --------------------------------------------

    If there is a method returning the original path, that is OK.

    But, using the rule of least surprise, wouldn't it make more sense to have a getTaskOutputPath() that returns the path to the part file for the current task and leave getOutputPath() with the user-entered value?

    Also, the javadoc should then not say 'Get the Path to the output directory for the map-reduce job' in its one-line description.



  • Devaraj Das (JIRA) at Mar 18, 2008 at 5:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579938#action_12579938 ]

    Devaraj Das commented on HADOOP-3041:
    -------------------------------------

    bq. But, using the rule of least surprise, wouldn't it make more sense to have a getTaskOutputPath() that returns the path to the part file for the current task and leave getOutputPath() with the user-entered value?

    Possibly. One concern here is that apps may already have been written using the getOutputPath API (creating side files within it). Also, if the user really intends to create a side file in the output directory of the job, it is slightly unintuitive IMO to have the user invoke getTaskOutputPath. But yes, I agree that getOutputPath returning the task's output path is unintuitive as well. I wish this was clearer. I am unhappy about it too.

    bq. Also the javadoc should not say 'Get the Path to the output directory for the map-reduce job' in its one line description then.
    Hmm..


  • Amareshwari Sriramadasu (JIRA) at Mar 19, 2008 at 11:02 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu reassigned HADOOP-3041:
    -----------------------------------------------

    Assignee: Amareshwari Sriramadasu
  • Runping Qi (JIRA) at Mar 19, 2008 at 12:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580342#action_12580342 ]

    Runping Qi commented on HADOOP-3041:
    ------------------------------------


    bq. Possibly. One concern here is that apps may already have been written using the getOutputPath API (creating side files within it).

    Indeed. Any application that implements its own output format class depends on the current semantics of getOutputPath.
    I have many such applications.


  • Devaraj Das (JIRA) at Mar 19, 2008 at 5:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580447#action_12580447 ]

    Devaraj Das commented on HADOOP-3041:
    -------------------------------------

    bq. Indeed. Any application that implements its own output format class depends on the current semantics of getOutputPath.

    I guess the solution we are driving towards is an API called JobConf.getFinalOutputPath(), backed by a private job config variable that stores the directory the user originally set during job submission. This config variable is never updated except during job submission.
  • Sameer Paranjpye (JIRA) at Mar 19, 2008 at 5:16 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580452#action_12580452 ]

    Sameer Paranjpye commented on HADOOP-3041:
    ------------------------------------------

    Silently changing public API semantics is bad, and it looks like we've done that here. How about we:
    - Deprecate getOutputPath() and replace it with getCurrentOutputPath() and getFinalOutputPath()
    - Have getOutputPath() return the same thing as getCurrentOutputPath() because that breaks the least amount of code
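
The proposal can be sketched as follows. This is a simplified stand-in, not the real JobConf: the class name JobConfSketch, the setCurrentOutputPath() method and the two property keys are hypothetical, and a HashMap plays the role of the job configuration. The idea it shows is that the user-supplied directory is stored under its own key at submission time, while the deprecated getter keeps returning the task-specific value so that existing code continues to work.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job configuration; not the real JobConf.
final class JobConfSketch {

    // Hypothetical property keys, chosen only for this sketch.
    private static final String CURRENT_KEY = "sketch.output.dir.current";
    private static final String FINAL_KEY = "sketch.output.dir.final";

    private final Map<String, String> props = new HashMap<>();

    // Called by the user at job submission: records the directory under both keys.
    void setOutputPath(String dir) {
        props.put(FINAL_KEY, dir);
        props.put(CURRENT_KEY, dir);
    }

    // Called by the framework when it sets up a task; the final key is left untouched.
    void setCurrentOutputPath(String dir) {
        props.put(CURRENT_KEY, dir);
    }

    String getCurrentOutputPath() {
        return props.get(CURRENT_KEY);
    }

    String getFinalOutputPath() {
        return props.get(FINAL_KEY);
    }

    // Kept for compatibility: same result as getCurrentOutputPath().
    @Deprecated
    String getOutputPath() {
        return getCurrentOutputPath();
    }

    public static void main(String[] args) {
        JobConfSketch conf = new JobConfSketch();
        conf.setOutputPath("/user/foo/myoutput");
        conf.setCurrentOutputPath("/user/foo/myoutput/_temporary/part_00000");
        System.out.println(conf.getFinalOutputPath());
        System.out.println(conf.getCurrentOutputPath());
    }
}
```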

  • Devaraj Das (JIRA) at Mar 19, 2008 at 5:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580462#action_12580462 ]

    Devaraj Das commented on HADOOP-3041:
    -------------------------------------

    bq. Deprecate getOutputPath() and replace it with getCurrentOutputPath() and getFinalOutputPath()
    +1
  • Amareshwari Sriramadasu (JIRA) at Mar 20, 2008 at 10:23 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Patch Available (was: Open)
  • Amareshwari Sriramadasu (JIRA) at Mar 20, 2008 at 10:23 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Attachment: patch-3041.txt

    Here is a patch that deprecates getOutputPath() and adds the new APIs getCurrentOutputPath() and getFinalOutputPath() (all in JobConf).
  • Hadoop QA (JIRA) at Mar 21, 2008 at 2:57 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580991#action_12580991 ]

    Hadoop QA commented on HADOOP-3041:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378306/patch-3041.txt
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 15 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/console

    This message is automatically generated.
  • Devaraj Das (JIRA) at Mar 21, 2008 at 3:09 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581094#action_12581094 ]

    Devaraj Das commented on HADOOP-3041:
    -------------------------------------

    Could you please submit a patch for the 0.16 branch. This one doesn't apply cleanly. Thanks!
  • Devaraj Das (JIRA) at Mar 21, 2008 at 3:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581094#action_12581094 ]

    devaraj edited comment on HADOOP-3041 at 3/21/08 8:33 AM:
    --------------------------------------------------------------

    I committed the patch to trunk. Could you please submit a patch for the 0.16 branch. This one doesn't apply cleanly. Thanks!

    was (Author: devaraj):
    Could you please submit a patch for the 0.16 branch. This one doesn't apply cleanly. Thanks!
  • Amareshwari Sriramadasu (JIRA) at Mar 21, 2008 at 6:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Attachment: patch-3041-0.16.2.txt

    Patch for 0.16 branch. It passed all the ant tests successfully on my machine on branch 0.16.
  • Devaraj Das (JIRA) at Mar 21, 2008 at 8:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-3041:
    --------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Amareshwari!
  • Hudson (JIRA) at Mar 23, 2008 at 5:39 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581342#action_12581342 ]

    Hudson commented on HADOOP-3041:
    --------------------------------

    Integrated in Hadoop-trunk #436 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/436/])
  • Owen O'Malley (JIRA) at Mar 24, 2008 at 3:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581570#action_12581570 ]

    Owen O'Malley commented on HADOOP-3041:
    ---------------------------------------

    I think we need to revert this in 0.16.2. Breaking compatibility with 0.16.1 is worse than living with a recognized change from 0.16.0. Changing from getOutputPath makes the API inconsistent with setOutputPath, so for 0.17, I propose:

    * making getOutputPath return the final path
    * renaming getCurrentOutputPath to getWorkOutputDirectory
  • Alejandro Abdelnur (JIRA) at Mar 25, 2008 at 10:43 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581863#action_12581863 ]

    Alejandro Abdelnur commented on HADOOP-3041:
    --------------------------------------------

    IMO Owen's proposal is the correct one.

    I'm fine with reverting to the 0.16.0 behavior, provided it gets addressed in 0.17, since we can work around it in the meantime.
  • Devaraj Das (JIRA) at Mar 25, 2008 at 12:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das reopened HADOOP-3041:
    ---------------------------------


    I reverted the patch.
  • Devaraj Das (JIRA) at Mar 25, 2008 at 12:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-3041:
    --------------------------------

    Fix Version/s: 0.17.0 (was: 0.16.2)
  • Amareshwari Sriramadasu (JIRA) at Mar 26, 2008 at 4:13 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582120#action_12582120 ]

    Amareshwari Sriramadasu commented on HADOOP-3041:
    -------------------------------------------------

    bq. so for 0.17, I propose:
    * moving getOutputPath to be the final one
    * rename getCurrentOutputPath to getWorkOutputDirectory


    Making getOutputPath return the final output path makes existing applications incompatible. For example, every OutputFormat written by users would have to use getWorkOutputDirectory instead of getOutputPath.
  • Hudson (JIRA) at Mar 26, 2008 at 1:57 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582290#action_12582290 ]

    Hudson commented on HADOOP-3041:
    --------------------------------

    Integrated in Hadoop-trunk #442 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/442/])
  • Amareshwari Sriramadasu (JIRA) at Mar 27, 2008 at 11:07 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Attachment: patch-3041.txt

    Here is a patch that:
    1. adds getWorkOutputDirectory(), which returns the task's temporary output directory, as set by the framework.
    2. changes getOutputPath so that it always holds the value specified by setOutputPath.
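
    The split described in this patch can be sketched roughly as follows. This is not the patch itself: SketchConf and the two property keys are hypothetical stand-ins chosen for illustration, and only the accessor names (setOutputPath, getOutputPath, setWorkOutputDirectory, getWorkOutputDirectory) are taken from this thread. The point is that the client-supplied value and the framework-supplied value live in separate properties, so neither overwrites the other.

```java
import java.util.HashMap;
import java.util.Map;

public class SketchConf {
    // Hypothetical property keys, used only in this sketch.
    private static final String OUTPUT_DIR = "example.output.dir";
    private static final String WORK_OUTPUT_DIR = "example.work.output.dir";

    private final Map<String, String> props = new HashMap<>();

    // Set by the client when configuring the job.
    public void setOutputPath(String dir) { props.put(OUTPUT_DIR, dir); }

    // Always returns exactly what the client set.
    public String getOutputPath() { return props.get(OUTPUT_DIR); }

    // Set by the framework for each task attempt (assumption: the framework
    // is the only caller of this setter).
    public void setWorkOutputDirectory(String dir) { props.put(WORK_OUTPUT_DIR, dir); }

    // Returns the task's temporary output directory.
    public String getWorkOutputDirectory() { return props.get(WORK_OUTPUT_DIR); }

    public static void main(String[] args) {
        SketchConf conf = new SketchConf();
        conf.setOutputPath("/user/foo/myoutput");
        conf.setWorkOutputDirectory("/user/foo/myoutput/_temporary");
        System.out.println(conf.getOutputPath());
        System.out.println(conf.getWorkOutputDirectory());
    }
}
```

    Under this arrangement, code that needs the job's final location reads getOutputPath(), while code that writes task-local side files reads getWorkOutputDirectory().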
  • Amareshwari Sriramadasu (JIRA) at Mar 27, 2008 at 11:07 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Patch Available (was: Reopened)
  • Amareshwari Sriramadasu (JIRA) at Mar 27, 2008 at 12:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Attachment: patch-3041.txt

    This patch has a documentation change relative to the earlier one.
  • Amareshwari Sriramadasu (JIRA) at Mar 27, 2008 at 12:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Patch Available (was: Open)
  • Amareshwari Sriramadasu (JIRA) at Mar 27, 2008 at 12:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Open (was: Patch Available)
  • Hadoop QA (JIRA) at Mar 27, 2008 at 1:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582643#action_12582643 ]

    Hadoop QA commented on HADOOP-3041:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378708/patch-3041.txt
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 6 new or modified tests.

    javadoc -1. The javadoc tool appears to have generated 1 warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/console

    This message is automatically generated.
  • Hadoop QA (JIRA) at Mar 27, 2008 at 4:09 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582708#action_12582708 ]

    Hadoop QA commented on HADOOP-3041:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378715/patch-3041.txt
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 6 new or modified tests.

    javadoc -1. The javadoc tool appears to have generated 1 warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/console

    This message is automatically generated.
  • Amareshwari Sriramadasu (JIRA) at Mar 28, 2008 at 4:00 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582892#action_12582892 ]

    Amareshwari Sriramadasu commented on HADOOP-3041:
    -------------------------------------------------

    The javadoc warning is due to src/java/org/apache/hadoop/net/SocketInputStream.java and is not related to this patch.
  • Amareshwari Sriramadasu (JIRA) at Mar 31, 2008 at 10:04 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Open (was: Patch Available)

    Canceling the patch to make setOutputPath and setWorkOutputDirectory have the same code semantics.
  • Amareshwari Sriramadasu (JIRA) at Mar 31, 2008 at 10:06 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Attachment: patch-3041.txt

    This patch makes the setWorkOutputDirectory code similar to setOutputPath.
  • Amareshwari Sriramadasu (JIRA) at Mar 31, 2008 at 10:06 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Mar 31, 2008 at 10:18 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583620#action_12583620 ]

    Hadoop QA commented on HADOOP-3041:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378939/patch-3041.txt
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 6 new or modified tests.

    patch -1. The patch command could not apply the patch.

    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2101/console

    This message is automatically generated.
    Within a task, the value ofJobConf.getOutputPath() method is modified
    ---------------------------------------------------------------------

    Key: HADOOP-3041
    URL: https://issues.apache.org/jira/browse/HADOOP-3041
    Project: Hadoop Core
    Issue Type: Bug
    Components: mapred
    Affects Versions: 0.16.1
    Environment: all
    Reporter: Alejandro Abdelnur
    Assignee: Amareshwari Sriramadasu
    Priority: Blocker
    Fix For: 0.17.0

    Attachments: patch-3041-0.16.2.txt, patch-3041.txt, patch-3041.txt, patch-3041.txt, patch-3041.txt


  • Runping Qi (JIRA) at Mar 31, 2008 at 1:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583682#action_12583682 ]

    Runping Qi commented on HADOOP-3041:
    ------------------------------------


    My applications assume that getOutputPath() returns the temporary working dir.
    Will my apps be broken by this patch?
    I think many people are in similar situations.

  • Runping Qi (JIRA) at Mar 31, 2008 at 7:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583798#action_12583798 ]

    Runping Qi commented on HADOOP-3041:
    ------------------------------------


    I think it is better to remove getOutputPath() from the API and replace it with something else.
    That way, applications can detect the problem at compile time.
    Otherwise, the apps will misbehave without any warning.
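
To make the "misbehave without any warning" concern concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API; `SidePathDemo`, `parent` and `sideDir` are hypothetical names, and the two input paths are the ones from the issue description. The helper assumes the value seen by the task has the form `<jobdir>/part_NNNNN`, which is exactly the assumption that stops holding once `_temporary` appears in the path.

```java
final class SidePathDemo {
    /** Returns the parent of a slash-separated absolute path. */
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /**
     * Hypothetical application code: derives a directory next to the job
     * output directory from the path the task sees.
     */
    static String sideDir(String taskOutputPath) {
        String jobOutputDir = parent(taskOutputPath); // assumes <jobdir>/part_NNNNN
        return parent(jobOutputDir) + "/side";
    }

    public static void main(String[] args) {
        // Value described for releases up to 0.16.0
        System.out.println(sideDir("/user/foo/myoutput/part_00000"));
        // prints /user/foo/side

        // Value described for 0.16.1
        System.out.println(sideDir("/user/foo/myoutput/_temporary/part_00000"));
        // prints /user/foo/myoutput/side
    }
}
```

Both calls compile and run without error; only the resulting directory differs, which is why a renamed or removed method would surface the change earlier than a changed return value.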


  • Owen O'Malley (JIRA) at Apr 1, 2008 at 4:54 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583999#action_12583999 ]

    Owen O'Malley commented on HADOOP-3041:
    ---------------------------------------

    *Sigh*

    One option is to move both get and set output path to FileInputFormat, which is where they actually belong. I only hate to do it because moving JobConf.setOutputPath will break 99% of map/reduce applications. In that case, it would look like:

    {code}
    FileInputFormat:
    getOutputPath
    setOutputPath
    getTaskOutputPath

    JobConf:
    deprecate getOutputPath
    deprecate setOutputPath
    {code}
  • Alejandro Abdelnur (JIRA) at Apr 1, 2008 at 5:00 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584003#action_12584003 ]

    Alejandro Abdelnur commented on HADOOP-3041:
    --------------------------------------------

    Our applications, which are running on a previous Hadoop version, fail when migrating to 0.16.0+ because we assumed the returned value was the path of the output part, without any temporary stuff in it. So we are broken as well.

    As the Hadoop API gets refined, changes like this will break things. For example, FileSystem listPaths() now returns null instead of an empty array when the dir does not exist.

    It is kind of painful, but I would not deprecate methods that have the right name just because they were returning incorrect data.

    IMO the right thing to do is to have getOutputPath() return the configured value and getWorkingOutputPath() return the temporary dir.

  • Amareshwari Sriramadasu (JIRA) at Apr 3, 2008 at 11:42 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Open (was: Patch Available)
  • Amareshwari Sriramadasu (JIRA) at Apr 3, 2008 at 11:58 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Attachment: patch-3041.txt

    After the discussions, attaching a patch that does the following:

    1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath.
       JobConf.getOutputPath() still returns the same value that it used to return.
    2. Deprecates OutputFormatBase and adds FileOutputFormat. Existing output formats that extended OutputFormatBase now extend FileOutputFormat.
    3. Adds the following APIs to FileOutputFormat:
       public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir
       public static Path getOutputPath(JobConf conf); // gets mapred.output.dir
       public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir
    4. Also adds static void setWorkOutputPath(JobConf conf, Path outputDir) to FileOutputFormat. The framework uses it to set mapred.work.output.dir to the task's temporary output dir.
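
The split between the two properties can be sketched in plain Java as follows. This is a stand-in, not the Hadoop `FileOutputFormat` class: it uses `java.util.Properties` in place of `JobConf` and strings in place of `Path`. Only the method names and the two property names come from the list above; the class name `OutputPathSketch` and the `_task_0001` subdirectory are assumptions made for illustration.

```java
import java.util.Properties;

final class OutputPathSketch {
    static final String OUTPUT_DIR = "mapred.output.dir";
    static final String WORK_OUTPUT_DIR = "mapred.work.output.dir";

    /** Client side: records the output directory chosen by the user. */
    static void setOutputPath(Properties conf, String outputDir) {
        conf.setProperty(OUTPUT_DIR, outputDir);
    }

    /** Returns the configured output directory, untouched by the framework. */
    static String getOutputPath(Properties conf) {
        return conf.getProperty(OUTPUT_DIR);
    }

    /** Framework side: records the task's temporary output directory. */
    static void setWorkOutputPath(Properties conf, String workDir) {
        conf.setProperty(WORK_OUTPUT_DIR, workDir);
    }

    /** Returns the task's temporary output directory. */
    static String getWorkOutputPath(Properties conf) {
        return conf.getProperty(WORK_OUTPUT_DIR);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setOutputPath(conf, "/user/foo/myoutput");

        // Assumed layout: a per-task directory under _temporary.
        setWorkOutputPath(conf, getOutputPath(conf) + "/_temporary/_task_0001");

        System.out.println(getOutputPath(conf));     // prints /user/foo/myoutput
        System.out.println(getWorkOutputPath(conf)); // prints /user/foo/myoutput/_temporary/_task_0001
    }
}
```

Because the framework writes only to the second key, code that reads the configured directory keeps seeing the client's value, while code that needs the task-specific location asks for it explicitly.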

  • Amareshwari Sriramadasu (JIRA) at Apr 3, 2008 at 12:00 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Patch Available (was: Open)
  • Owen O'Malley (JIRA) at Apr 3, 2008 at 10:27 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585317#action_12585317 ]

    Owen O'Malley commented on HADOOP-3041:
    ---------------------------------------

    This looks like the right direction.
  • Hadoop QA (JIRA) at Apr 3, 2008 at 10:48 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585335#action_12585335 ]

    Hadoop QA commented on HADOOP-3041:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12379258/patch-3041.txt
    against trunk revision 643282.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 114 new or modified tests.

    patch -1. The patch command could not apply the patch.

    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2146/console

    This message is automatically generated.
  • Amareshwari Sriramadasu (JIRA) at Apr 4, 2008 at 9:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Open (was: Patch Available)
  • Amareshwari Sriramadasu (JIRA) at Apr 4, 2008 at 9:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Attachment: patch-3041.txt

    Patch in sync with the trunk
  • Amareshwari Sriramadasu (JIRA) at Apr 4, 2008 at 9:25 am
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HADOOP-3041:
    --------------------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Apr 4, 2008 at 5:32 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585647#action_12585647 ]

    Hadoop QA commented on HADOOP-3041:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12379370/patch-3041.txt
    against trunk revision 643282.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 114 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/console

    This message is automatically generated.
  • Devaraj Das (JIRA) at Apr 4, 2008 at 5:34 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-3041:
    --------------------------------

    Hadoop Flags: [Reviewed]

    +1
  • Nigel Daley (JIRA) at Apr 4, 2008 at 8:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Nigel Daley updated HADOOP-3041:
    --------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks Amareshwari! Can we add a release note for this?
  • Devaraj Das (JIRA) at Apr 4, 2008 at 8:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-3041:
    --------------------------------

    Release Note:
    1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath.
       JobConf.getOutputPath() still returns the same value that it used to return.
    2. Deprecates OutputFormatBase and adds FileOutputFormat. Existing output formats that extended OutputFormatBase now extend FileOutputFormat.
    3. Adds the following APIs to FileOutputFormat:
       public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir
       public static Path getOutputPath(JobConf conf); // gets mapred.output.dir
       public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir
    4. Also adds static void setWorkOutputPath(JobConf conf, Path outputDir) to FileOutputFormat. The framework uses it to set mapred.work.output.dir to the task's temporary output dir.
