Hadoop Archive Error for size of input data >2GB
Hi All,

I have been using Hadoop Archives programmatically to generate har
archives from log files that are being dumped into HDFS.

When the input directory to the Hadoop archiving program has files
totalling more than 2 GB, the archiving strangely fails with an error
message saying:

INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId= Illegal Capacity: -1

Digging into the code, I found that this was due to numMaps having the
value -1.

As per the code in org.apache.hadoop.util.HadoopArchives,
archive(List<Path> srcPaths, String archiveName, Path dest):

numMaps is initialized as

int numMaps = (int) (totalSize / partSize);
// run at least one map
conf.setNumMapTasks(numMaps == 0 ? 1 : numMaps);

partSize is statically assigned a value of 2 GB at the top of the
class:

static final long partSize = 2 * 1024 * 1024 * 1024;

Strangely enough, the value I find actually assigned to partSize is
-2147483648. The reason is that the right-hand side is evaluated
entirely in 32-bit int arithmetic, so 2 * 1024 * 1024 * 1024 overflows
to Integer.MIN_VALUE before being widened to long.

As a result, for input directories larger than 2 GB, numMaps is
assigned a negative value (-1 here), which leads to the error above.
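For reference, the overflow is easy to reproduce outside Hadoop. A
minimal standalone sketch (plain Java, not the Hadoop code itself); the
fix is simply making one operand a long literal:

public class PartSizeOverflow {
    // All multiplications happen in 32-bit int arithmetic, so the result
    // wraps to Integer.MIN_VALUE before it is widened to long.
    static final long broken = 2 * 1024 * 1024 * 1024;   // -2147483648
    // A long literal forces 64-bit arithmetic throughout.
    static final long fixed = 2L * 1024 * 1024 * 1024;   // 2147483648

    public static void main(String[] args) {
        long totalSize = 3L * 1024 * 1024 * 1024; // e.g. a 3 GB input directory
        System.out.println((int) (totalSize / broken)); // -1, as in the report
        System.out.println((int) (totalSize / fixed));  //  1, the intended count
    }
}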

I am using hadoop-0.17.1, and I got the archiving facility by applying
the hadoop-3307_4 patch.

This looks like a bug to me, so please let me know how to proceed.

Pratyush Banerjee


  • Mahadev Konar at Jul 21, 2008 at 6:20 pm
Hi Pratyush,

    I think this bug was fixed in
    https://issues.apache.org/jira/browse/HADOOP-3545.

    Can you apply the patch and see if it works?

    Mahadev

  • Pratyush Banerjee at Jul 22, 2008 at 5:50 am
Thanks, Mahadev, for letting me know about the patch. I have applied
it, and archiving now runs fine for an input directory of about 5 GB.

I am currently testing the same thing programmatically, but since it
works from the command line, it should also work that way.
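
For reference, the programmatic invocation looks roughly like this. A
hedged sketch: it assumes the patched HadoopArchives implements Tool
and has a Configuration constructor, as the later
org.apache.hadoop.tools version does, and the paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.HadoopArchives;
import org.apache.hadoop.util.ToolRunner;

public class ArchiveLogs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same arguments as the command line:
        //   hadoop archive -archiveName logs.har <src> <dest>
        String[] archiverArgs = {
            "-archiveName", "logs.har",
            "/logs/current",  // hypothetical source directory in HDFS
            "/archives"       // hypothetical destination directory
        };
        int ret = ToolRunner.run(conf, new HadoopArchives(conf), archiverArgs);
        System.exit(ret);
    }
}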

Thanks and regards,

    Pratyush

  • Snehal Nagmote at Aug 19, 2009 at 1:44 am
Hi,

How do I unarchive the logical files from a har file? Is there any way
to unarchive the logical files?



  • Koji Noguchi at Aug 19, 2009 at 4:43 pm

How do I unarchive the logical files from a har file? Is there any way
to unarchive the logical files?

Opened https://issues.apache.org/jira/browse/MAPREDUCE-883 for
documenting this, but the idea is that you just need to copy :)


+
+ <section>
+ <title> How to unarchive an archive?</title>
+ <p> Since all the fs shell commands in the archives work transparently,
+ unarchiving is just a matter of copying </p>
+ <p> To unarchive sequentially:</p>
+ <p><code> hadoop dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ <p> To unarchive in parallel, use distcp: </p>
+ <p><code> hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ </section>
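
The same copy can be done from Java if you need it programmatically. A
sketch, using the placeholder paths from the doc above; note that
FileUtil.copy is a sequential copy, equivalent to dfs -cp rather than
distcp:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class Unarchive {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Source directory inside the archive, addressed via the har scheme.
        Path src = new Path("har:///user/zoo/foo.har/dir1");
        // Plain HDFS destination directory.
        Path dst = new Path("hdfs:/user/zoo/newdir");
        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = dst.getFileSystem(conf);
        // Sequential copy; pass false so the source is not deleted.
        FileUtil.copy(srcFs, src, dstFs, dst, false, conf);
    }
}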


As for the original email: somehow I was able to archive files larger
than 2 GB in 0.18.3. Maybe there's an additional condition I'm missing?

    Koji

