FAQ
Hi everybody,
The 10 different map-reduce jobs store their respective outputs in 10
different files. This is the snapshot:

hadoop@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -ls output5
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2003-05-16 02:16 /user/hadoop/output5/MatrixA-Row1
drwxr-xr-x - hadoop supergroup 0 2003-05-16 02:16 /user/hadoop/output5/MatrixA-Row2

Now when I try to open any of these files, I get an error message:
hadoop@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -cat output5/MatrixA-Row1
cat: Source must be a file.
hadoop@zeus:~/hadoop-0.19.1$

But if I run
hadoop@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -cat output5/MatrixA-Row1/part-00000

I get the correct output. I do not understand why I have to give this extra
"part-00000". Now when I run a map-reduce task to merge the outputs of all the
files, I give the name of the directory output5 as the input path. But I get an
error saying:

java.io.IOException: Not a file: hdfs://zeus:18004/user/hadoop/output5/MatrixA-Row1

I cannot understand how to make the framework read my files.

Alternatively, I tried to avoid the map-reduce approach for combining the files
and do it via a simple program, but I am unable to start. Can someone give me a
sample implementation or something similar?

Any help is appreciated

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 11/22/09 5:48 PM, aa225@buffalo.edu sent:
Hello,
If I write the output of the 10 tasks to 10 different files, then how do
I go about merging the output? Is there some built-in functionality or do I
have to write some code for that?

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 11/22/09 5:40 PM, Gang Luo lgpublic@yahoo.com.cn sent:
Hi. If the output path already exists, it seems you cannot execute any
task with the same output path. I think you can output the results of the
10 tasks to 10 different paths, and then do something more (by an 11th task,
for example) to merge the 10 results into 1 file.

Gang Luo
---------
Department of Computer Science
Duke University
(919)316-0993
gang.luo@duke.edu


----- Original Message -----
From: "aa225@buffalo.edu" <aa225@buffalo.edu>
To: common-user@hadoop.apache.org
Sent: 2009/11/22 5:25:55
Subject: Help in Hadoop

Hello Everybody,
I have a doubt about a map-reduce program and I would appreciate any help.
I run the program using the command bin/hadoop jar HomeWork.jar prg1 input output.
Ideally, from within prg1, I want to sequentially launch 10 map-reduce tasks.
I want to store the output of all these map-reduce tasks in some file.
Currently I have kept the input format and output format of the jobs as
TextInputFormat and TextOutputFormat respectively. Now I have the following
questions.
1. When I run more than 1 task from the same program, the output file of
all the tasks is the same. The framework does not allow the 2nd map-reduce
task to have the same output file as task 1.
2. Before the 2nd task launches I also get this error:
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
3. When the 2nd map-reduce task writes its output to the file "output",
won't the previous content of this file get overwritten?

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

  • Jason Venner at Nov 23, 2009 at 2:28 am
    set the number of reduce tasks to 1.
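
For the "simple program" route asked about in the question, here is a minimal sketch rather than a definitive implementation. Each job's output path is a directory, and every reduce task writes its own part-NNNNN file inside it, which is why the path has to end in part-00000 and why a single reducer yields a single part file. The sketch uses only plain Java file APIs, not the Hadoop FileSystem API, and it assumes the output5 directory has first been copied out of HDFS to the local disk (for example with bin/hadoop dfs -get). The names MergeParts, findPartFiles and merge are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: concatenates every part-* file found below a LOCAL
// copy of the job output directory (assumption: output was copied from HDFS).
final class MergeParts {

    // Walk the tree under root and return all regular files whose name
    // starts with "part-", sorted by path so the merge order is stable.
    static List<Path> findPartFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().startsWith("part-"))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    // Append the bytes of each part file, in sorted order, to target.
    static void merge(Path root, Path target) throws IOException {
        // List the inputs before the target is created or truncated.
        List<Path> parts = findPartFiles(root);
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MergeParts <local-output-dir> <merged-file>");
            System.exit(1);
        }
        merge(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

After compiling with javac MergeParts.java, a run such as java MergeParts output5 merged.txt would write the concatenation to merged.txt. The order is lexicographic by path, so MatrixA-Row10 would sort before MatrixA-Row2; rename the directories or change the sort if row order matters.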

    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals

Discussion Overview
group: common-user
category: hadoop
posted: Nov 23, '09 at 1:44a
active: Nov 23, '09 at 2:28a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
