Hadoop Compatibility and EMR
Hi,

We've been using Hadoop 0.15.5 in our production environment, where we have about 10 TB of data stored on the DFS.
The files were generated as MapReduce output. We want to move our environment to Amazon Elastic MapReduce (EMR), which raises the following questions for us:

1. EMR supports only Hadoop 0.19.0 and above. Is it possible to use the existing data, generated with Hadoop 0.15.5, from Hadoop 0.19.0?

2. If not, how can we upgrade from Hadoop 0.15.5 to Hadoop 0.19.0, and what issues should we expect while doing so?


Regards,
Ilayaraja


  • Owen O'Malley at Mar 21, 2010 at 7:27 pm
    I believe you need to take two jumps: 0.15 -> 0.18 -> 0.20.
    I'd strongly suggest trying a practice file system first. Did we have
    owners and perms in 0.15? If not, you'll need to set owners and perms.

    -- Owen
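    If owners and perms do need to be set after the jump, something along these
    lines would do it (a sketch only; the path, owner, group, and mode are
    placeholders, and -chown/-chmod are only available once you are on a
    permissions-aware release):

        hadoop fs -chown -R hadoop:hadoop /user/output   # set owner and group recursively
        hadoop fs -chmod -R 755 /user/output             # set permissions recursively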
  • Philip Zeyliger at Mar 22, 2010 at 1:45 am
    I believe 0.15 had HftpFileSystem.
    http://hadoop.apache.org/common/docs/r0.15.3/api/index.html

    You may be able to run 0.19's distcp to copy from your 0.15 cluster (using
    HFTP as the source) into the new HDFS.

    -- Philip
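    An invocation of that kind might look roughly like the following (a sketch;
    the host names, ports, and paths are placeholders, and it would be run from
    the newer cluster so that the 0.15 cluster is only read over HFTP):

        hadoop distcp hftp://old-namenode:50070/user/output hdfs://new-namenode:9000/user/output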

  • Hitchcock, Andrew at Mar 22, 2010 at 8:25 pm
    Hi,

    At this time Elastic MapReduce only supports Hadoop 0.18.3.

    The cluster that stores the 10 TB of data, is that currently running on Amazon EC2?

    Regards,
    Andrew
  • Ilayaraja at Mar 23, 2010 at 4:32 am
    Hi Andrew,

    Yes, the data is on an EC2 cluster.

    Regards,
    Ilay
  • Hitchcock, Andrew at Mar 23, 2010 at 6:11 pm
    We recommend that people use Amazon S3 as the durable store when using Elastic MapReduce. We consider the HDFS on Elastic MapReduce clusters to be transient.

    With that said, you need some way to get your data into S3 from HDFS. We recommend storing the files directly in S3 (with S3N) and not using the S3 block file system. That presents two challenges:

    1. Making sure all files on your cluster are less than 5 GB.
    2. Uploading your files without the use of S3N (which wasn't introduced until 0.18).

    You'll probably want to write a DistCp-like job which reads the files from HDFS and uploads them to S3. If necessary, it should also detect files that are larger than 5 GB and split them into multiple pieces.

    Andrew
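    One possible shortcut around challenge 2, if the newer cluster can reach the
    old NameNode over HTTP, is to drive the copy with distcp from the 0.18.3 side
    rather than writing a custom job (a sketch only; the bucket name, credentials,
    host, and paths below are placeholders):

        hadoop distcp hftp://old-namenode:50070/user/output \
            s3n://AWS_ACCESS_KEY:AWS_SECRET_KEY@my-bucket/user/output

    Files larger than 5 GB would still need to be split before any such copy.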




  • Sagar Shukla at Mar 26, 2010 at 3:49 pm
    Hi Ilayaraja,
    Hadoop HDFS has a utility, distcp, with which it should be possible to copy data between two different versions of Hadoop, though I am not quite sure whether it can read data from Hadoop 0.15.5. More information is available at http://hadoop.apache.org/common/docs/r0.19.1/distcp.html#cpver

    There is no information on supported versions at that URL, but it could be worth a try.

    You could also set up a local environment to upgrade your HDFS data from the older version to the newer one, and then finally move it to Amazon EMR.

    Regards,
    Sagar



  • Vibhooti Verma at Mar 25, 2010 at 5:04 am
    Upgrade information is given at
    http://wiki.apache.org/hadoop/Hadoop_Upgrade
    Our team did the upgrade from 0.15.5 to 0.18 while keeping the data intact,
    but it required a lot of testing.
    I suggest first doing the upgrade on test data and only then on production data.


    --
    cheers,
    Vibhooti
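    For what it's worth, the core of the procedure on that wiki page is roughly
    the following for each hop (a sketch; script names and options can differ
    slightly between releases, so check the page for the exact steps):

        bin/stop-all.sh                              # stop the old cluster cleanly
        # install the new release, pointing it at the same dfs.name.dir / dfs.data.dir
        bin/start-dfs.sh -upgrade                    # start the new version and upgrade the filesystem metadata
        bin/hadoop dfsadmin -upgradeProgress status  # repeat until the upgrade is reported complete
        bin/hadoop dfsadmin -finalizeUpgrade         # finalize only after verifying the data; no rollback afterwards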
  • Ilayaraja at Apr 23, 2010 at 6:41 pm
    The following error is thrown when distcp'ing data from HDFS (Hadoop 0.15.5)
    to S3 storage.
    The problem started appearing after we applied a couple of bug fixes to
    Hadoop 0.15.5 that had been resolved in later versions.
    Any thoughts would be greatly appreciated.

    With failures, global counters are inaccurate; consider running with -i
    Copy failed: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed. XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.(Jets3tFileSystemStore.java:199)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.inodeExists(Jets3tFileSystemStore.java:169)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.inodeExists(Unknown Source)
        at org.apache.hadoop.fs.s3.S3FileSystem.exists(S3FileSystem.java:127)
        at org.apache.hadoop.util.CopyFiles.setup(CopyFiles.java:675)
        at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:475)
        at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)

    Regards & Thanks,
    Ilayaraja
