problem using top level s3 buckets as input/output directories
--------------------------------------------------------------

Key: HADOOP-5805
URL: https://issues.apache.org/jira/browse/HADOOP-5805
Project: Hadoop Core
Issue Type: Bug
Components: fs/s3
Affects Versions: 0.18.3
Environment: ec2, cloudera AMI, 20 nodes
Reporter: Arun Jacob


When I specify top-level S3 buckets as input or output directories, I get the following exception:

hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output

java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

The workaround is to specify input/output buckets with sub-directories:


hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir s3n://infocloud-output/output-subdir
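As a plain-Java illustration of why the bucket-only form trips the "Path must be absolute" check: a URI with no trailing slash parses with an empty path component, while the slashed form parses with "/". (This demo uses only java.net.URI; Hadoop's Path class is layered on top of it, and its absolute-path check is what throws in the stack trace above.)

```java
import java.net.URI;

public class S3nUriDemo {
    public static void main(String[] args) {
        // s3n://bucket parses with an empty path component;
        // s3n://bucket/ parses with "/" as the path.
        URI bare = URI.create("s3n://infocloud-output");
        URI slashed = URI.create("s3n://infocloud-output/");
        System.out.println("bare path:    '" + bare.getPath() + "'");    // ''
        System.out.println("slashed path: '" + slashed.getPath() + "'"); // '/'
    }
}
```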



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Ian Nowland (JIRA) at May 16, 2009 at 2:06 am
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Ian Nowland updated HADOOP-5805:
    --------------------------------

    Attachment: HADOOP-5805-0.patch

    There are two problems here.

    The first is that S3N currently requires a terminating slash on the URI to indicate the root of a bucket: it accepts s3n://infocloud-input/ but not s3n://infocloud-input. The attached patch fixes this by allowing either form.

    That fixes the input-bucket case, but not the output one.

    The second problem is that S3N requires a bucket to exist before it can be used, yet if you use a bucket's root as the output directory you get the standard Hadoop behavior of FileOutputFormat throwing a FileAlreadyExistsException, even when the bucket is empty, because the bucket's root directory "/" does exist. To me the ideal fix would be to change FileOutputFormat to not throw when the output directory exists but is empty. However, that seems a fairly large change to established behavior, so I did not include it in this more trivial patch.

    As an aside, since each AWS account is limited to 100 buckets, you generally don't want to write each job's output to a new bucket anyway.
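A minimal sketch of the normalization the first fix calls for (illustrative only; the method name is borrowed from the stack trace, but the structure does not claim to match the actual HADOOP-5805 patch): treat both the empty path and "/" as the bucket root when deriving the S3 key, so either URI form is accepted.

```java
import java.net.URI;

public class PathToKeySketch {
    // Illustrative stand-in for NativeS3FileSystem.pathToKey: accept both
    // s3n://bucket and s3n://bucket/ by mapping an empty path and "/" to
    // the bucket root (the empty key) instead of rejecting the bare form.
    static String pathToKey(URI uri) {
        String path = uri.getPath();
        if (path.isEmpty() || path.equals("/")) {
            return "";                    // bucket root
        }
        return path.substring(1);         // drop leading "/" to get the key
    }

    public static void main(String[] args) {
        System.out.println(pathToKey(URI.create("s3n://b")));       // ""
        System.out.println(pathToKey(URI.create("s3n://b/")));      // ""
        System.out.println(pathToKey(URI.create("s3n://b/dir/f"))); // "dir/f"
    }
}
```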

  • Ian Nowland (JIRA) at May 19, 2009 at 4:27 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Ian Nowland updated HADOOP-5805:
    --------------------------------

    Fix Version/s: 0.21.0
    Status: Patch Available (was: Open)
  • Tom White (JIRA) at May 21, 2009 at 3:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White reassigned HADOOP-5805:
    ---------------------------------

    Assignee: Ian Nowland
  • Tom White (JIRA) at May 21, 2009 at 3:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711643#action_12711643 ]

    Tom White commented on HADOOP-5805:
    -----------------------------------

    This looks like a good fix. The test should do an assert to check that it gets back an appropriate FileStatus object.

    The patch needs to be regenerated since the tests have moved from src/test to src/test/core.

    For the second problem, you could subclass your output format and override checkOutputSpecs() so that it doesn't throw FileAlreadyExistsException. But I agree it would be nicer to handle this generally. Perhaps open a separate JIRA, since it would affect more than NativeS3FileSystem.
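A Hadoop-free sketch of the relaxed check being discussed (java.io.File stands in for Hadoop's FileSystem API here; a real version would override FileOutputFormat.checkOutputSpecs and use FileSystem.listStatus): reject the output directory only when it exists and is non-empty, so an empty bucket root would be acceptable.

```java
import java.io.File;
import java.io.IOException;

public class LenientOutputCheck {
    // Stand-in for a relaxed checkOutputSpecs(): only reject the output
    // directory when it exists AND contains entries, so an empty directory
    // (like an empty bucket root) is acceptable as a job output path.
    static void checkOutputDir(File out) throws IOException {
        String[] children = out.isDirectory() ? out.list() : null;
        if (children != null && children.length > 0) {
            throw new IOException(
                "Output directory " + out + " already exists and is not empty");
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = new File(System.getProperty("java.io.tmpdir"),
                            "lenient-check-" + System.nanoTime());
        tmp.mkdirs();
        checkOutputDir(tmp);             // empty: accepted, no exception
        new File(tmp, "part-00000").createNewFile();
        try {
            checkOutputDir(tmp);         // non-empty: rejected
        } catch (IOException expected) {
            System.out.println("rejected non-empty dir as expected");
        }
    }
}
```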


  • Ian Nowland (JIRA) at May 21, 2009 at 11:11 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Ian Nowland updated HADOOP-5805:
    --------------------------------

    Attachment: HADOOP-5805-1.patch

    New patch against trunk. Moved test and added assert.

    Also created https://issues.apache.org/jira/browse/HADOOP-5889

  • Hadoop QA (JIRA) at May 22, 2009 at 1:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711910#action_12711910 ]

    Hadoop QA commented on HADOOP-5805:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12408752/HADOOP-5805-1.patch
    against trunk revision 777330.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    -1 patch. The patch command could not apply the patch.

    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/375/console

    This message is automatically generated.
  • Tom White (JIRA) at May 26, 2009 at 10:37 am
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-5805:
    ------------------------------

    Status: Patch Available (was: Open)
  • Tom White (JIRA) at May 26, 2009 at 10:37 am
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-5805:
    ------------------------------

    Status: Open (was: Patch Available)
  • Tom White (JIRA) at May 26, 2009 at 10:38 am
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-5805:
    ------------------------------

    Attachment: HADOOP-5805-2.patch

    For some reason the patch didn't apply. Here's a regenerated version.
  • Hadoop QA (JIRA) at May 28, 2009 at 3:13 am
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713840#action_12713840 ]

    Hadoop QA commented on HADOOP-5805:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12409019/HADOOP-5805-2.patch
    against trunk revision 779338.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/console

    This message is automatically generated.
    problem using top level s3 buckets as input/output directories
    --------------------------------------------------------------

    Key: HADOOP-5805
    URL: https://issues.apache.org/jira/browse/HADOOP-5805
    Project: Hadoop Core
    Issue Type: Bug
    Components: fs/s3
    Affects Versions: 0.18.3
    Environment: ec2, cloudera AMI, 20 nodes
    Reporter: Arun Jacob
    Assignee: Ian Nowland
    Fix For: 0.21.0

    Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch, HADOOP-5805-2.patch


    When I specify top level s3 buckets as input or output directories, I get the following exception.
    hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output
    java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
    at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
    The workaround is to specify input/output buckets with sub-directories:

    hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir s3n://infocloud-output/output-subdir
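[Editor's note: a minimal sketch of what the reporter is likely hitting, not the actual HADOOP-5805 patch. The assumption is that `NativeS3FileSystem.pathToKey` rejects any path that does not start with `/`, and a bare bucket URI like `s3n://infocloud-output` has an empty path component; treating the bucket root as the empty key would avoid the exception.]

```java
import java.net.URI;

public class PathToKeySketch {
    // Hypothetical, simplified stand-in for NativeS3FileSystem.pathToKey.
    // A bare bucket URI (s3n://bucket) yields an empty path, which the
    // pre-fix code rejected as "not absolute". Mapping the bucket root
    // to the empty S3 key sidesteps the IllegalArgumentException.
    static String pathToKey(URI uri) {
        String path = uri.getPath();           // "" for s3n://bucket, "/dir" otherwise
        if (path.isEmpty() || path.equals("/")) {
            return "";                         // treat the bucket root as the empty key
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uri);
        }
        return path.substring(1);              // strip leading '/' to get the S3 key
    }

    public static void main(String[] args) {
        // Bucket root no longer throws; a subdirectory maps to its key.
        System.out.println("[" + pathToKey(URI.create("s3n://infocloud-output")) + "]");
        System.out.println("[" + pathToKey(URI.create("s3n://infocloud-output/out")) + "]");
    }
}
```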
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Tom White (JIRA) at May 28, 2009 at 4:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-5805:
    ------------------------------

    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

    I've just committed this. Thanks Ian!

    (The contrib test failure was unrelated.)
  • Hudson (JIRA) at Jun 11, 2009 at 8:01 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718620#action_12718620 ]

    Hudson commented on HADOOP-5805:
    --------------------------------

    Integrated in Hadoop-trunk #863 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/])


Discussion Overview
group: common-dev @ hadoop.apache.org
categories: hadoop
posted: May 11, '09 at 5:36p
active: Jun 11, '09 at 8:01p
posts: 13
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion
Hudson (JIRA): 13 posts
