Hello,

It's frustrating to be dealing with such simple problems (and I know
the fault is mine; I'm missing something).
I'm running wordcount (from 0.20.2) on a very small gzip file, and the
output contains binary characters.
When I run the same job on the uncompressed file, the output is correct ASCII.

I'm using the native gzip library. The command is:

hadoop jar /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-CDH3B4.jar \
  wordcount /user/sguha/tmp/o.zip /user/sguha/tmp/o.wc.zip

(The .zip file is gzip-compressed.)

Any ideas?

Thanks
SG

  • Niels Basjes at Mar 21, 2011 at 11:02 pm
    Hi,

    2011/3/21 Saptarshi Guha <saptarshi.guha@gmail.com>:
    > (zip is gzip)
    No, .zip is "pkzip" and .gz is gzip.

    The applicable Hadoop code actually chooses the decompressor based on
    the filename extension.

    --
    Niels Basjes
  • Saptarshi Guha at Mar 21, 2011 at 11:11 pm
    True, my naming is
    Hmm, now I know.
    Thanks
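
Niels's explanation can be illustrated with a small shell sketch. It assumes the codec list of a stock Hadoop 0.20 configuration, which registers DefaultCodec (.deflate), GzipCodec (.gz) and BZip2Codec (.bz2) and nothing for .zip. `codec_for` and `sniff_magic` are hypothetical helpers written for illustration, not part of Hadoop: the first mimics a suffix-only lookup, and the second reads the file's first two bytes to show what the data really is, something a suffix-only lookup never examines.

```shell
#!/bin/sh
# Sketch only: mimics choosing a decompressor by file name suffix, assuming
# the stock Hadoop 0.20 codec list (.deflate, .gz, .bz2). This is NOT Hadoop's
# code; codec_for and sniff_magic are hypothetical helper names.

# Pick a decompressor from the file name alone; the contents are ignored.
codec_for() {
  case "$1" in
    *.gz)      echo gzip ;;
    *.bz2)     echo bzip2 ;;
    *.deflate) echo deflate ;;
    *)         echo none ;;   # in Hadoop, no match means the input is read uncompressed
  esac
}

# Print the first two bytes of a file in hex: gzip data starts with 1f8b,
# a PKZIP archive with 504b ("PK").
sniff_magic() {
  head -c 2 "$1" | od -An -tx1 | tr -d ' \n'
}
```

With these definitions, `codec_for /user/sguha/tmp/o.zip` prints `none`, so the gzip bytes would reach wordcount undecoded, while `codec_for o.gz` prints `gzip`. Giving the input a `.gz` name (for example with `hadoop fs -mv`) lets the suffix lookup pick the gzip codec.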

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 21, '11 at 10:47p
active: Mar 21, '11 at 11:11p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Saptarshi Guha: 2 posts
Niels Basjes: 1 post
