FAQ
Is there a particularly good reason why the "hadoop fs" command
supports -cat and -tail, but not -head?

________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com
music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
-- Galileo Galilei
________________________________________________________________________________

  • Edward Capriolo at Sep 27, 2010 at 2:03 pm

    On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley wrote:
    Is there a particularly good reason why the "hadoop fs" command supports
    -cat and -tail, but not -head?

    Tail needs to be implemented efficiently, but head you can just do
    yourself. Most people probably use

    hadoop dfs -cat file | head -5
  • Keith Wiley at Sep 27, 2010 at 3:13 pm

    I disagree with your use of the word "efficiently". :-) To my
    understanding (and perhaps that's the source of my error), the
    approach you suggested reads the entire file over the net from the
    cluster to your client machine. That file could conceivably be at
    HDFS scale (100s of GBs; even TBs wouldn't be uncommon).

    What do you think? Am I wrong in my interpretation of how
    hadoopCat-pipe-head would work?

    Cheers!
  • Edward Capriolo at Sep 27, 2010 at 8:47 pm

    'hadoop dfs -cat' writes the file out as it is read. head -5 exits
    after 5 lines, which kills the first half of the pipe. Because of
    buffering, more than 5 lines might physically be read, but this
    invocation does not read the entire HDFS file before piping it to head.
  • Keith Wiley at Sep 28, 2010 at 12:35 am

    Excellent. Thank you.
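
The pipe behavior described in the thread can be checked without a cluster. What follows is a minimal sketch under stated assumptions, not anything shipped with Hadoop: `endless_cat` is a hypothetical local stand-in for `hadoop dfs -cat` on a very large file, and `hdfs_head` is a hypothetical convenience wrapper around the pipeline suggested above, with a default of 10 lines chosen only for illustration.

```shell
#!/bin/sh

# Hypothetical stand-in for 'hadoop dfs -cat' on a very large file:
# left alone, this producer never finishes.
endless_cat() {
    yes "a line from a very large file"
}

# head exits after 5 lines and closes its end of the pipe. The producer's
# next write then fails (SIGPIPE or EPIPE) and it stops, so this pipeline
# returns almost immediately instead of running forever.
# stderr is discarded only to hide a possible "Broken pipe" complaint.
endless_cat 2>/dev/null | head -n 5

# Hypothetical wrapper (not a Hadoop command): print the first N lines of
# an HDFS file using the cat-pipe-head approach from the thread.
# Usage: hdfs_head <path> [lines]
hdfs_head() {
    path=$1
    lines=${2:-10}
    hadoop fs -cat "$path" | head -n "$lines"
}
```

With a real cluster, `hdfs_head /some/file 5` would run the same pipeline as in the thread. The client may report a write error on stderr once head exits; that is the early termination at work, not a failed read.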


Discussion Overview
group: common-user @
categories: hadoop
posted: Sep 27, '10 at 7:24a
active: Sep 28, '10 at 12:35a
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop
