Hadoop XML Error
I am running two instances of Hadoop on a cluster and want to copy all the data from hadoop1 to the updated hadoop2. From hadoop2, I am running the command "hadoop distcp -update hftp://mc00001:50070/ hftp://mc00000:50070/" where mc00001 is the namenode of hadoop1 and mc00000 is the namenode of hadoop2. I get the following error:

11/02/07 10:12:31 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
11/02/07 10:12:31 INFO tools.DistCp: destPath=hftp://mc00000:50070/
[Fatal Error] :1:215: XML document structures must start and end within the same entity.
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: invalid xml directory content
at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:350)
at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:355)
at org.apache.hadoop.hdfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:384)
at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1227)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1120)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:344)
... 9 more

I am fairly certain that none of the XML files are malformed or corrupted. This thread (http://www.mail-archive.com/core-dev@hadoop.apache.org/msg18064.html) discusses a similar problem caused by file permissions but doesn't seem to offer a solution. Any help would be appreciated.
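
For what it's worth, the directory listing that distcp parses can be fetched directly from the namenode web port (a sketch, assuming HFTP's listPaths servlet is reachable there); a truncated response would reproduce the parse error:

curl "http://mc00001:50070/listPaths/?recursive=yes"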

Thanks,
Mike


  • Sonal Goyal at Feb 7, 2011 at 5:12 pm
    Mike,

    This error is not related to the XML files you are trying to copy being
    malformed; it means that, for some reason, the source or destination
    directory listing cannot be retrieved or parsed. Are you trying to copy
    between clusters running different versions? As far as I know, your
    destination should be writable, and distcp should be run from the
    destination cluster. See more here:
    http://hadoop.apache.org/common/docs/r0.20.2/distcp.html
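
    For example, run from the destination cluster with hftp:// on the source
    side only (hftp is read-only) and hdfs:// on the destination side. A
    sketch, where <rpc-port> is the destination namenode's RPC port (from
    core-site.xml), not the 50070 web port:

    hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:<rpc-port>/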

    Let us know how it goes.

    Thanks and Regards,
    Sonal
    Connect Hadoop with databases, Salesforce, FTP servers and others:
    https://github.com/sonalgoyal/hiho
    Nube Technologies: http://www.nubetech.co
    http://in.linkedin.com/in/sonalgoyal
  • Korb, Michael [USA] at Feb 7, 2011 at 5:39 pm
    I'm trying to copy from 0.20.2 to 0.20.3. I was trying to follow the DistCp Guide, but I think I know the problem: I'm running the command on the destination cluster, but when I call hadoop, the PATH resolves to the hadoop1 executable. So I went to the hadoop2 install and ran it with "./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310/", but now I get this error:

    11/02/07 12:38:09 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
    11/02/07 12:38:09 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
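
    A quick sanity check (a sketch; the install path is illustrative) is to
    confirm which executable the shell resolves and what version it reports:

    which hadoop
    /opt/hadoop2/bin/hadoop version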


  • Xavier Stevens at Feb 7, 2011 at 5:56 pm
    Mike,

    I've seen this when a directory is removed or goes missing between the
    time distcp starts stat'ing the source files and the copy itself. You'll
    probably want to make sure that no code or person is modifying the
    filesystem during your copy. I would also make sure you only have one
    version of Hadoop installed on your destination cluster. Finally, you
    should use hdfs as the destination protocol, and run the command as the
    hdfs user if you're using Hadoop security.

    Example (Running on destination cluster):

    sudo -u hdfs /usr/lib/hadoop-0.20.3/bin/hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:8020/

    Cheers,


    -Xavier

  • Korb, Michael [USA] at Feb 7, 2011 at 6:08 pm
    Xavier,

    Both instances of Hadoop are running on the same cluster. I tried the command "sudo -u hdfs ./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310" from the hadoop2 bin directory (the 0.20.3 install) on mc00000 (the port 55310 is specified in core-site.xml). Now I'm getting this:

    11/02/07 13:03:14 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
    11/02/07 13:03:14 INFO tools.DistCp: destPath=hdfs://mc00000:55310
    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

    Thanks,
    Mike
  • Xavier Stevens at Feb 7, 2011 at 6:21 pm
    Mike,

    Are you just trying to upgrade, then? I've never heard of anyone trying
    to run two versions of Hadoop on the same cluster. I don't think that's
    even possible, but maybe someone else knows.

    -Xavier

  • Korb, Michael [USA] at Feb 7, 2011 at 6:27 pm
    Xavier,

    Yes, I'm trying to upgrade from 0.20.2 to 0.20.3. Both are running on the same cluster. I'm trying to distcp everything from the 0.20.2 instance over to the 0.20.3 instance, without any luck yet.

    Mike
  • Xavier Stevens at Feb 7, 2011 at 6:48 pm
    You don't need distcp to upgrade a cluster; you just need to go through
    the upgrade process. Bumping from 0.20.2 to 0.20.3, you might not even
    need to do anything other than stop the cluster processes and then
    restart them using the 0.20.3 install.

    Here's a link to the upgrade and rollback docs:
    http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Upgrade+and+Rollback
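
    In outline, that process is (a sketch of the documented steps; the
    install paths are illustrative):

    # stop the daemons of the old install
    /opt/hadoop-0.20.2/bin/stop-all.sh
    # start HDFS from the new install in upgrade mode
    /opt/hadoop-0.20.3/bin/start-dfs.sh -upgrade
    # watch progress, then finalize once everything checks out
    /opt/hadoop-0.20.3/bin/hadoop dfsadmin -upgradeProgress status
    /opt/hadoop-0.20.3/bin/hadoop dfsadmin -finalizeUpgrade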


    -Xavier

  • Korb, Michael [USA] at Feb 7, 2011 at 7:43 pm
    We're migrating from CDH3b3 to a recent build of 0.20-append published by Ryan Rawson, which isn't something the normal upgrade scripts cover. I've tried several commands with different protocols and port numbers, but now I keep getting the same error:

    11/02/07 14:35:06 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
    11/02/07 14:35:06 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

    Has anyone seen this before? What might be causing it?

    Thanks,
    Mike


  • Ted Dunning at Feb 8, 2011 at 2:05 am
    This is due to the security API not being available. You are crossing
    from a cluster with security to one without, and that is causing
    confusion: presumably your client assumes the API is available, but your
    Hadoop library doesn't provide it.

    Check your classpath very carefully, looking for version assumptions and
    confusions.
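
    One quick check (a sketch; the jar path is illustrative) is to ask javap
    whether the JobConf class in the jar you are actually running defines the
    method:

    javap -classpath /opt/hadoop-0.20-append/hadoop-core.jar org.apache.hadoop.mapred.JobConf | grep getCredentials

    If nothing prints, that JobConf predates the security API, and a DistCp
    built against CDH3b3 (which has security) will fail with exactly this
    NoSuchMethodError.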

  • Xavier Stevens at Feb 7, 2011 at 5:51 pm
    Mike,

    I've seen this when a directory is removed or goes missing between the
    time distcp starts stat'ing the source files and the copy itself. You'll
    probably want to make sure that no code or person is modifying the
    filesystem during your copy. Also, you should use hdfs as the destination
    protocol.

    Cheers,

    -Xavier


Discussion Overview
group: common-user @ hadoop
posted: Feb 7, 2011 at 3:52 PM
active: Feb 8, 2011 at 2:05 AM
posts: 11
users: 4
website: hadoop.apache.org...
irc: #hadoop
