FAQ
Instead of running "hadoop fs -put" on hundreds of files of X megs each, I want to do it once on a gzipped (or zipped) archive: one file, much smaller in total. Then I want to decompress the archive on HDFS. I can't figure out what "hadoop fs"-type command would do such a thing.
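For reference, the upload half can be done without even writing a separate local archive file, since "hadoop fs -put" accepts "-" to read from stdin; a minimal sketch, with a hypothetical directory name and destination path:

    # Stream a gzipped tarball straight into HDFS; "-put -" reads from stdin,
    # so the archive never needs to exist as a file on the local disk.
    tar -czf - input/ | hadoop fs -put - /user/keith/input.tar.gz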

Thanks.

________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
-- Keith Wiley
________________________________________________________________________________


  • Harsh J at Aug 5, 2011 at 4:05 am
    Keith,

    The 'hadoop fs -text' command does decompress a file given to it if
    needed/able, but what you could also do is run a distributed MapReduce
    job that converts the files from compressed to decompressed; that would
    be much faster.
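    As a rough sketch of that shell-only route (the paths here are hypothetical, and this assumes the archive is a single gzipped file rather than a tarball of many files):

        # -text picks a codec from the file's type/extension and decompresses to stdout;
        # -put with "-" reads stdin, writing the uncompressed copy back into HDFS.
        hadoop fs -text /user/keith/input.gz | hadoop fs -put - /user/keith/input.txt

    Everything streams through the one client machine, though, which is why a distributed job is faster for lots of files.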

    --
    Harsh J
  • Keith Wiley at Aug 5, 2011 at 3:14 pm
    I can envision an M/R job for the purpose of manipulating HDFS, such as (de)compressing files and saving them back to HDFS. I just didn't think it should be necessary to *write a program* to do something so seemingly minimal. This (tarring/compressing/etc.) seems like such an obvious method for moving data back and forth that I would expect the tools to support it.

    I'll read up on "-text". Maybe that really is what I wanted, although I'm dubious since my data isn't textual at all. Anyway, I'll see what I can find on that.

    Thanks.
    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com

    "It's a fine line between meticulous and obsessive-compulsive and a slippery
    rope between obsessive-compulsive and debilitatingly slow."
    -- Keith Wiley
    ________________________________________________________________________________
  • Harsh J at Aug 5, 2011 at 3:22 pm
    I suppose we could do with a simple identity-mapper/identity-reducer
    example/tool that could easily be reused for purposes such as these.
    Could you file a JIRA on this?

    The -text option is like -cat, but adds codec and some file-format
    detection. Hopefully it will work for your case.
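    For line-oriented text data, something close to that identity job can already be put together with Hadoop Streaming; a rough sketch (the jar path varies by release, and Streaming's key/tab handling means it is not byte-exact for arbitrary binary data):

        # /bin/cat as an identity mapper with zero reducers: gzipped inputs are
        # decompressed by the input format, and the outputs are written uncompressed.
        hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
            -D mapred.reduce.tasks=0 \
            -input /user/keith/compressed \
            -output /user/keith/decompressed \
            -mapper /bin/cat

    Since gzip files are not splittable, each input file gets its own map task, so the decompression spreads across the cluster instead of running through a single client.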

    --
    Harsh J
