Unhandled failures starting jobs with S3 as backing store
---------------------------------------------------------

Key: HADOOP-4637
URL: https://issues.apache.org/jira/browse/HADOOP-4637
Project: Hadoop Core
Issue Type: Bug
Components: fs/s3
Affects Versions: 0.18.1
Reporter: Robert


I run Hadoop 0.18.1 on Amazon EC2, with S3 as the backing store.

When starting jobs, I sometimes get the following failure, which causes the job to be abandoned:

org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveBlock(Jets3tFileSystemStore.java:222)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy4.retrieveBlock(Unknown Source)
at org.apache.hadoop.fs.s3.S3InputStream.blockSeekTo(S3InputStream.java:160)
at org.apache.hadoop.fs.s3.S3InputStream.read(S3InputStream.java:119)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:214)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:150)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1212)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1193)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:177)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
at org.apache.hadoop.ipc.Client.call(Client.java:715)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.mapred.$Proxy5.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)

The stack trace suggests that copying the job file fails because the S3 block filesystem cannot find all of the expected block objects when it needs them.

Since S3 is only eventually consistent and does not always provide an up-to-date view of the stored data, this execution path should probably be hardened: at a minimum, retry the failed operation, or wait for the expected block object if it has not appeared yet.
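
For illustration, a defensive retry along these lines could ride out the consistency window (a sketch only; fetchBlockOrNull and the retry parameters below are hypothetical placeholders, not part of Hadoop's fs/s3 code):

// Illustrative sketch: retry fetching a block object a few times before giving
// up, to ride out S3's eventual-consistency window. fetchBlockOrNull and the
// retry parameters are hypothetical placeholders, not Hadoop's fs/s3 API.
import java.io.File;
import java.io.IOException;

public class S3BlockRetrySketch {
  private static final int MAX_ATTEMPTS = 5;
  private static final long BACKOFF_MS = 2000;

  // Stand-in for the store call that may return null while a newly written
  // block object has not yet become visible in S3.
  static File fetchBlockOrNull(long blockId) {
    return null;
  }

  static File retrieveBlockWithRetry(long blockId) throws IOException {
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      File block = fetchBlockOrNull(blockId);
      if (block != null) {
        return block;
      }
      try {
        Thread.sleep(BACKOFF_MS * attempt); // simple linear back-off
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while waiting for block " + blockId);
      }
    }
    throw new IOException("Block " + blockId + " still not visible in S3 after "
        + MAX_ATTEMPTS + " attempts");
  }
}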



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Tom White (JIRA) at Nov 12, 2008 at 12:46 am
    [ https://issues.apache.org/jira/browse/HADOOP-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646757#action_12646757 ]

    Tom White commented on HADOOP-4637:
    -----------------------------------

    The problem is that {{in}} can be null if the {{get(String key, long byteRangeStart)}} method fails with a "NoSuchKey" from S3. This should throw an exception rather than return null, then the automatic retry would kick in.
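
    A minimal sketch of that change (hypothetical code, not the actual 0.18.1 source; {{lookupS3Object}} here stands in for the underlying jets3t request):

    import java.io.IOException;
    import java.io.InputStream;

    class Jets3tStoreSketch {
      // Hypothetical stand-in for the jets3t request; returns null when S3
      // reports "NoSuchKey" for the requested object.
      InputStream lookupS3Object(String key, long byteRangeStart) {
        return null;
      }

      // Proposed behaviour: surface the missing key as an IOException so the
      // RetryInvocationHandler proxy seen in the stack trace can retry
      // retrieveBlock, instead of handing a null stream back to the caller
      // and failing later with a NullPointerException.
      InputStream get(String key, long byteRangeStart) throws IOException {
        InputStream in = lookupS3Object(key, byteRangeStart);
        if (in == null) {
          throw new IOException("Key '" + key + "' does not exist in S3 (NoSuchKey)");
        }
        return in;
      }
    }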
  • Tom White (JIRA) at Nov 12, 2008 at 12:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646757#action_12646757 ]

    tomwhite edited comment on HADOOP-4637 at 11/11/08 4:46 PM:
    -------------------------------------------------------------

    The problem is that {{in}} can be null if the {{get(String key, long byteRangeStart)}} method fails with a "NoSuchKey" from S3. This should throw an exception rather than return null, then the automatic retry would kick in.

    was (Author: tomwhite):
    The problem is that {{monospaced}}in{{monospaced}} can be null if the {{monospaced}}get(String key, long byteRangeStart){{monospaced}} method fails with a "NoSuchKey" from S3. This should throw an exception rather than return null, then the automatic retry would kick in.
