Hello Olivier,

It works like a charm :)

While we're on the subject, I sent an email about HDFS to
common-user@hadoop.apache.org that went unanswered. I'll reproduce it
here, since I think this is a better place for it:

I want to achieve the 'hadoop dfs -getmerge' functionality over HTTP. The
closest I could find is the 'Download this file' link, but it is available
only for individual parts, not for the whole directory.

It seems that Solr 1.4 can ingest a CSV file given its URL, that is, a link
to the actual CSV file. The problem is that the Hadoop HDFS web interface
does not offer a directory for download as a single merged file, only its
individual parts.

Since all the pieces are already there, it doesn't make sense to me to add
an HTTP (Apache?) server to the mix just to serve the processed files. I
should be able to do that with a special URL or something similar, maybe
along the lines of ... /streamMergedFile?whateverPathToAFileOrDir
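Until something like that exists, the extra HTTP hop can be avoided by piping the merged parts straight into Solr. A minimal sketch, assuming Solr 1.4's default CSV update handler at /solr/update/csv, part files without a header row, and column names supplied by the caller; `push_parts_to_solr` is a hypothetical helper name, not part of Hadoop or Solr:

```shell
# Hypothetical helper (not part of Hadoop or Solr): stream every part file
# of an HDFS output directory into Solr's CSV update handler, with no local
# copy and no extra HTTP server in between.
# Assumptions: Solr 1.4 with the default /update/csv handler, part files
# that carry no header row, and column names supplied by the caller.
push_parts_to_solr() {
  local hdfs_dir=$1     # e.g. output/solr
  local solr_url=$2     # e.g. http://localhost:8983/solr/update/csv
  local fieldnames=$3   # comma-separated column names, e.g. id,title

  # The glob is quoted so that Hadoop, not the local shell, expands it.
  # curl reads the request body from stdin because of "@-".
  hadoop dfs -cat "$hdfs_dir/part-*" |
    curl --silent --show-error \
      --data-binary @- \
      -H 'Content-type:text/plain; charset=utf-8' \
      "$solr_url?header=false&fieldnames=$fieldnames&commit=true"
}
```

Usage would look like `push_parts_to_solr output/solr http://localhost:8983/solr/update/csv id,title`; the host, port and field names are placeholders. Because `header=false` is passed, Solr takes the column names from `fieldnames` instead of expecting a header line in each part.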

As you can see it's related to my initial question on this thread :)

Thanks for your time,

On Tue, Mar 16, 2010 at 4:52 PM, Varene Olivier wrote:

Assuming your part-r-XXXX files are fully ordered, you can do:

hadoop dfs -cat "output/solr/part-*" > yourLocalFile

tada :)
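If you would rather make the concatenation order explicit than depend on how the glob is expanded, the parts can be listed, sorted by name and streamed one after another. A minimal sketch; `cat_parts_in_order` is a hypothetical name, and it assumes the `hadoop dfs -ls` output format shown further down in this thread (permissions first, full path in the last column, no spaces in paths). Sorting by name only gives partition order; the data is globally ordered only if the job produced it that way, which is the same caveat as above.

```shell
# Hypothetical helper: stream the part files of an HDFS directory to stdout
# in file-name order (part-r-00000, part-r-00001, ...).
# Assumes "hadoop dfs -ls" prints permissions in the first column and the
# full path in the last one, and that paths contain no spaces.
cat_parts_in_order() {
  hadoop dfs -ls "$1" |
    awk '$1 ~ /^-/ { print $NF }' |   # regular files only: skips "Found N items" and directories such as _logs
    grep '/part-[^/]*$' |             # keep only part-* files
    sort |
    while IFS= read -r part; do
      hadoop dfs -cat "$part" < /dev/null   # keep hadoop from consuming the file list on stdin
    done
}
```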



Alex Parvulescu wrote:

One minor correction.

I'm talking about 'hadoop dfs -getmerge'. You're right that '-cat' is the
stdout equivalent of '-get', and both handle only files.

I'd like to see an equivalent of 'getmerge' to stdout.
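Until such an option exists, one workaround using only existing commands is to let -getmerge write into a temporary local directory and then stream the merged file. A minimal sketch; `getmerge_stdout` is a hypothetical name, and it assumes enough local disk for one full copy of the output. A fresh temporary directory is used instead of a temporary file because -getmerge may refuse to overwrite an existing destination.

```shell
# Hypothetical helper: emulate "getmerge to stdout" with existing commands.
# -getmerge only writes to a local file, so merge into a temporary
# directory, stream the merged file to stdout, then clean up.
# Assumes enough local disk space for one full copy of the output.
getmerge_stdout() {
  local src=$1 tmpdir status
  tmpdir=$(mktemp -d) || return 1
  # The destination file does not exist yet inside the fresh directory.
  if hadoop dfs -getmerge "$src" "$tmpdir/merged"; then
    cat "$tmpdir/merged"
    status=$?
  else
    status=1
  fi
  rm -rf "$tmpdir"
  return "$status"
}
```

Usage would be something like `getmerge_stdout output/solr | gzip > solr.csv.gz`. Unlike the `-cat` glob, this writes the whole output to local disk once before streaming it.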

Sorry for the confusion.

On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu <alex.parvulescu@gmail.com> wrote:

Hello Olivier,

I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
This happens when I try to get all parts from a directory as a
single .csv file.

Something like this:
hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
cat: Source must be a file.
This is what the directory looks like:
hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2010-03-12 16:36
-rw-r--r-- 2 hadoop supergroup 64882566 2010-03-12 16:36
-rw-r--r-- 2 hadoop supergroup 51388943 2010-03-12 16:36

It seems -get can merge everything into one file but cannot output to
stdout, while 'cat' can write to stdout but apparently requires fetching
the parts one by one.

Or am I missing something?


On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <varene@echo.fr> wrote:

Hello Alex,

get writes a file to your local filesystem:

hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]

src : your file in HDFS
localdst : the name of the local file that receives the data from src

To send the results to STDOUT, you can use cat:

hadoop dfs [-cat <src>]

with src : your file in HDFS


Alex Parvulescu wrote:


Is there a reason why 'hadoop dfs -get' will not output to stdout?

I see 'hadoop dfs -put' can handle stdin, so it would seem that dfs
should also support outputting to stdout.
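For reference, the two directions side by side. A minimal sketch; the helper names `hdfs_write` and `hdfs_read` are hypothetical. `-put -` reads its data from stdin (the behaviour mentioned above), and `-cat` writes to stdout but accepts only files or globs of files, as discussed in the replies above.

```shell
# Hypothetical helpers contrasting the two directions.
# Upload: "-put -" takes its data from stdin.
hdfs_write() {   # usage: some_command | hdfs_write /user/me/file.csv
  hadoop dfs -put - "$1"
}

# Download: "-cat" writes to stdout, but only for files (or globs of files).
hdfs_read() {    # usage: hdfs_read /user/me/file.csv | some_command
  hadoop dfs -cat "$1"
}
```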


Discussion overview
group: hdfs-user @
posted: Mar 16, '10 at 9:59a
active: Mar 17, '10 at 11:37a
2 users in discussion: Alex Parvulescu (4 posts), Varene Olivier (2 posts)