Hello,

Is there a reason why 'hadoop dfs -get' will not output to stdout?

I see 'hadoop dfs -put' can handle stdin. It would seem that dfs should
also support outputting to stdout.
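
For reference, -put reads from stdin when the source is given as '-'; roughly
like this (the destination path here is only an example):

some_command | hadoop dfs -put - /user/alex/output.csv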


thanks,
alex


  • Varene Olivier at Mar 16, 2010 at 10:21 am
    Hello Alex,

    get writes a file to your local filesystem

    hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]

    with
    src : your file in HDFS
    localdst : the name of the local file that will receive the data from src


    To get the results to STDOUT,
    you can use cat

    hadoop dfs [-cat <src>]

    with src : your file in HDFS
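
    For example, something along these lines streams a file straight to stdout
    and into a pipe (the path is only an illustration):

    hadoop dfs -cat /user/you/results/part-00000 | head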

    Regards
    Olivier

    Alex Parvulescu wrote:
    Hello,

    Is there a reason why 'hadoop dfs -get' will not output to stdout?

    I see 'hadoop dfs -put' can handle stdin. It would seem that dfs should
    also support outputting to stdout.


    thanks,
    alex

  • Alex Parvulescu at Mar 16, 2010 at 10:32 am
    Hello Olivier,

    I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
    This happens when I try to get all parts from a directory as a single .csv
    file.

    Something like this:
    hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
    cat: Source must be a file.

    This is what the dir looks like
    hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
    Found 3 items
    drwxr-xr-x - hadoop supergroup 0 2010-03-12 16:36
    /user/hadoop-user/output/solr/_logs
    -rw-r--r-- 2 hadoop supergroup 64882566 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00000
    -rw-r--r-- 2 hadoop supergroup 51388943 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00001

    It seems -get can merge everything into one file but cannot output to
    stdout, while 'cat' can write to stdout but I have to fetch the parts one
    by one.
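
    Fetched one at a time, using the part files from the listing above, that
    would look something like:

    hadoop dfs -cat /user/hadoop-user/output/solr/part-00000 > out.csv
    hadoop dfs -cat /user/hadoop-user/output/solr/part-00001 >> out.csv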

    Or am I missing something?

    thanks,
    alex
    On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier wrote:

    Hello Alex,

    get writes a file to your local filesystem

    hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]

    with
    src : your file in HDFS
    localdst : the name of the local file that will receive the data from src


    To get the results to STDOUT,
    you can use cat

    hadoop dfs [-cat <src>]

    with src : your file in HDFS

    Regards
    Olivier

    Alex Parvulescu wrote:

    Hello,
    Is there a reason why 'hadoop dfs -get' will not output to stdout?

    I see 'hadoop dfs -put' can handle stdin. It would seem that dfs should
    also support outputting to stdout.


    thanks,
    alex


  • Alex Parvulescu at Mar 16, 2010 at 1:19 pm
    Hello,

    one minor correction.

    I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
    equivalent of '-get' and they both handle only files.

    I'd like to see an equivalent of 'getmerge' to stdout.
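
    For reference, getmerge concatenates the parts from an HDFS directory into
    a single local file, for example (the local destination is arbitrary):

    hadoop dfs -getmerge /user/hadoop-user/output/solr /tmp/solr-merged.csv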

    sorry for the confusion
    alex
    On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu wrote:

    Hello Olivier,

    I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
    This happens when I try to get all parts from a directory as a single .csv
    file.

    Something like this:
    hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
    cat: Source must be a file.

    This is what the dir looks like
    hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
    Found 3 items
    drwxr-xr-x - hadoop supergroup 0 2010-03-12 16:36
    /user/hadoop-user/output/solr/_logs
    -rw-r--r-- 2 hadoop supergroup 64882566 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00000
    -rw-r--r-- 2 hadoop supergroup 51388943 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00001

    It seems -get can merge everything into one file but cannot output to
    stdout, while 'cat' can write to stdout but I have to fetch the parts one
    by one.

    Or am I missing something?

    thanks,
    alex

    On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier wrote:

    Hello Alex,

    get writes a file to your local filesystem

    hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]

    with
    src : your file in HDFS
    localdst : the name of the local file that will receive the data from src


    To get the results to STDOUT,
    you can use cat

    hadoop dfs [-cat <src>]

    with src : your file in HDFS

    Regards
    Olivier

    Alex Parvulescu wrote:

    Hello,
    Is there a reason why 'hadoop dfs -get' will not output to stdout?

    I see 'hadoop dfs -put' can handle stdin. It would seem that dfs should
    also support outputting to stdout.


    thanks,
    alex


  • Varene Olivier at Mar 16, 2010 at 3:45 pm
    Supposing you do have your part-r-XXXX fully ordered

    you can do

    hadoop dfs -cat "output/solr/part-*" > yourLocalFile

    tada :)
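
    And since the goal here was stdout: the redirect is optional, so the merged
    stream can also be piped straight into another command, for example:

    hadoop dfs -cat "output/solr/part-*" | wc -l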

    Cheers
    Olivier


    Alex Parvulescu wrote:
    Hello,

    one minor correction.

    I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
    equivalent of '-get' and they both handle only files.

    I'd like to see an equivalent of 'getmerge' to stdout.

    sorry for the confusion
    alex

    On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu
    wrote:

    Hello Olivier,

    I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
    This happens when I try to get all parts from a directory as a
    single .csv file.

    Something like this:
    hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
    cat: Source must be a file.

    This is what the dir looks like
    hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
    Found 3 items
    drwxr-xr-x - hadoop supergroup 0 2010-03-12 16:36
    /user/hadoop-user/output/solr/_logs
    -rw-r--r-- 2 hadoop supergroup 64882566 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00000
    -rw-r--r-- 2 hadoop supergroup 51388943 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00001

    It seems -get can merge everything into one file but cannot output to
    stdout, while 'cat' can write to stdout but I have to fetch the
    parts one by one.

    Or am I missing something?

    thanks,
    alex


    On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier wrote:

    Hello Alex,

    get writes a file to your local filesystem

    hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]

    with
    src : your file in HDFS
    localdst : the name of the local file that will receive the data from src


    To get the results to STDOUT,
    you can use cat

    hadoop dfs [-cat <src>]

    with src : your file in HDFS

    Regards
    Olivier

    Alex Parvulescu wrote:

    Hello,

    Is there a reason why 'hadoop dfs -get' will not output to stdout?

    I see 'hadoop dfs -put' can handle stdin. It would seem that dfs
    should also support outputting to stdout.


    thanks,
    alex



  • Alex Parvulescu at Mar 17, 2010 at 11:37 am
    Hello Olivier,

    It works like a charm :)

    While we are on the subject, I've sent an email to
    common-user@hadoop.apache.org about HDFS that remained unanswered. I'll
    reproduce it here, as I think this is a better place for it:

    I want to achieve the 'hadoop dfs -getmerge' functionality over HTTP. The
    closest I could find is the 'Download this file' link, but that is available
    only for individual parts, not the whole directory (
    http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2F%2Fpart-00000
    )

    It seems that you can push a CSV URL to Solr 1.4, that is, a link to the
    actual CSV file. The problem is that a directory is not available for
    download as a merged file in the HDFS-over-HTTP interface, just the
    individual parts.
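
    For context, pushing a CSV by URL to Solr 1.4 looks roughly like this,
    assuming the CSV update handler is mapped at /update/csv, remote streaming
    is enabled in solrconfig.xml, and the host and CSV URL below are only
    placeholders:

    curl "http://solr-host:8983/solr/update/csv?commit=true&stream.url=http://some-host/data.csv"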

    As all the pieces are already there, it doesn't make sense to me to add an
    HTTP (Apache?) server to this mix just to serve the processed files. I
    should be able to do that with a special URL or something, maybe along the
    lines of ... /streamMergedFile?whateverPathToAFileOrDir
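
    A rough client-side workaround, assuming the part file names are known,
    would be to fetch each part through the streamFile servlet above and
    concatenate them locally:

    curl -s "http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2Fpart-00000" > merged.csv
    curl -s "http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2Fpart-00001" >> merged.csv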

    As you can see it's related to my initial question on this thread :)

    thanks for your time,
    alex
    On Tue, Mar 16, 2010 at 4:52 PM, Varene Olivier wrote:


    Supposing you do have your part-r-XXXX fully ordered

    you can do

    hadoop dfs -cat "output/solr/part-*" > yourLocalFile

    tada :)

    Cheers

    Olivier


    Alex Parvulescu wrote:
    Hello,

    one minor correction.

    I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
    equivalent of '-get' and they both handle only files.

    I'd like to see an equivalent of 'getmerge' to stdout.

    sorry for the confusion
    alex

    On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu
    <alex.parvulescu@gmail.com> wrote:

    Hello Olivier,

    I've tried 'cat'. This is the error I get: 'cat: Source must be a
    file.'
    This happens when I try to get all parts from a directory as a
    single .csv file.

    Something like this:
    hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
    cat: Source must be a file.
    This is what the dir looks like
    hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
    Found 3 items
    drwxr-xr-x - hadoop supergroup 0 2010-03-12 16:36
    /user/hadoop-user/output/solr/_logs
    -rw-r--r-- 2 hadoop supergroup 64882566 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00000
    -rw-r--r-- 2 hadoop supergroup 51388943 2010-03-12 16:36
    /user/hadoop-user/output/solr/part-00001

    It seems -get can merge everything into one file but cannot output to
    stdout, while 'cat' can write to stdout but I have to fetch the
    parts one by one.

    Or am I missing something?

    thanks,
    alex


    On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <varene@echo.fr>
    wrote:

    Hello Alex,

    get writes a file to your local filesystem

    hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]

    with
    src : your file in HDFS
    localdst : the name of the local file that will receive the data from src


    To get the results to STDOUT,
    you can use cat

    hadoop dfs [-cat <src>]

    with src : your file in HDFS

    Regards
    Olivier

    Alex Parvulescu wrote:

    Hello,

    Is there a reason why 'hadoop dfs -get' will not output to stdout?

    I see 'hadoop dfs -put' can handle stdin. It would seem that dfs
    should also support outputting to stdout.


    thanks,
    alex




Discussion Overview
group: hdfs-user
categories: hadoop
posted: Mar 16, 2010 at 9:59 am
active: Mar 17, 2010 at 11:37 am
posts: 6
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion: Alex Parvulescu (4 posts), Varene Olivier (2 posts)
