Sep 27, 2010 at 3:13 pm
On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley wrote:
Is there a particularly good reason why the "hadoop fs" command
supports -cat and -tail, but not -head?
Tail needs to be implemented efficiently, but head you can just do
yourself. Most people probably use
hadoop dfs -cat file | head -5.
I disagree with your use of the word "efficiently". :-) To my
understanding (and perhaps that's the source of my error), the
approach you suggested reads the entire file over the net from the
cluster to your client machine. That file could conceivably be at
HDFS scale (hundreds of GBs; even TBs wouldn't be uncommon).
What do you think? Am I wrong in my interpretation of how
hadoopCat-pipe-head would work?
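
One way to reason about the question is a minimal sketch of a client-side
"head" that reads only the lines it needs and then stops the producer. It
does not call Hadoop: PRODUCER is a hypothetical stand-in (a Python child
process that prints an endless stream), and first_lines is a helper name
invented for this illustration. Whether "hadoop fs -cat" itself stops
pulling data from the cluster promptly once the reader goes away is an
assumption to verify, not something this thread establishes.

```python
import itertools
import subprocess
import sys


def first_lines(cmd, n):
    """Run cmd, return the first n lines of its stdout, then stop it."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        # islice stops pulling from the pipe after n lines, so the
        # producer's remaining output is never read by this process.
        return [line.rstrip("\n") for line in itertools.islice(proc.stdout, n)]
    finally:
        proc.terminate()      # stop the producer instead of draining it
        proc.stdout.close()   # release our end of the pipe
        proc.wait()           # reap the child process


# Hypothetical stand-in for `hadoop fs -cat file`: an unbounded producer.
PRODUCER = [
    sys.executable,
    "-c",
    "import itertools\nfor i in itertools.count(): print(i)",
]

if __name__ == "__main__":
    print(first_lines(PRODUCER, 5))
```

The call returns even though the producer would never finish on its own,
which shows that a reader taking only the first few lines does not have to
consume the whole stream. To test the real case, one could pass
["hadoop", "fs", "-cat", "file"] as cmd and watch how much data actually
crosses the network before the command exits.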
Keith Wiley firstname.lastname@example.org keithwiley.com
"And what if we picked the wrong religion? Every week, we're just
madder and madder!"
-- Homer Simpson