FAQ
Hi, sorry for the cross-post, but I'm just trying to see if anyone else
has had this issue before.
Thanks


---------- Forwarded message ----------
From: bmdevelopment <bmdevelopment@gmail.com>
Date: Fri, Jun 25, 2010 at 10:56 AM
Subject: Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
To: mapreduce-user@hadoop.apache.org


Hello,
Thanks so much for the reply.
See inline.
On Fri, Jun 25, 2010 at 12:40 AM, Hemanth Yamijala wrote:
> Hi,
>> I've been getting the following error when trying to run a very simple
>> MapReduce job. Map finishes without problem, but the error occurs as soon
>> as it enters the Reduce phase.
>>
>> 10/06/24 18:41:00 INFO mapred.JobClient: Task Id :
>> attempt_201006241812_0001_r_000000_0, Status : FAILED
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>
>> I am running a 5-node cluster and I believe I have all my settings correct:
>>
>> * ulimit -n 32768
>> * DNS/RDNS configured properly
>> * hdfs-site.xml : http://pastebin.com/xuZ17bPM
>> * mapred-site.xml : http://pastebin.com/JraVQZcW
>>
>> The program is very simple - it just counts a unique string in a log file.
>> See here: http://pastebin.com/5uRG3SFL
>>
>> When I run it, the job fails and I get the following output:
>> http://pastebin.com/AhW6StEb
>>
>> However, it runs fine when I do *not* use substring() on the value (see the
>> map function in the code above).
>>
>> This runs fine and completes successfully:
>> String str = val.toString();
>>
>> This causes an error and fails:
>> String str = val.toString().substring(0,10);

>> Please let me know if you need any further information.
>> It would be greatly appreciated if anyone could shed some light on this problem.
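
Since the actual job only exists in the pastebin above, here is a minimal
sketch of a mapper along the lines being described, written against the old
mapred API of that era; the class name, the Text/IntWritable fields, and the
length guard are assumptions for illustration, not the original code.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LogStringCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text write = new Text();

  public void map(LongWritable key, Text val,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String str = val.toString();
    // Working variant: the whole line is the key.
    // write.set(str);
    // Failing variant for the poster: key on the first 10 characters.
    // The length guard avoids a StringIndexOutOfBoundsException on short lines.
    write.set(str.length() >= 10 ? str.substring(0, 10) : str);
    output.collect(write, ONE);
  }
}

The guard is purely defensive here: the poster reports that the map phase
completes either way, so a too-short line is not what breaks the job.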
> It catches attention that changing the code to use a substring is
> causing a difference. Assuming it is consistent and not a red herring,
Yes, this has been consistent over the last week. I was running 0.20.1 first
and then upgraded to 0.20.2, but the results have been exactly the same.
> can you look at the counters for the two jobs using the JobTracker web
> UI - things like map records, bytes etc. and see if there is a
> noticeable difference?
OK, so here is the first job, which uses write.set(value.toString()); and has
*no* errors:
http://pastebin.com/xvy0iGwL

And here is the second job, which uses
write.set(value.toString().substring(0, 10)); and fails:
http://pastebin.com/uGw6yNqv

And here is yet another, where I used a longer and therefore unique string
via write.set(value.toString().substring(0, 20)). This makes every line
unique, similar to the first job.
It still fails.
http://pastebin.com/GdQ1rp8i
> Also, are the two programs being run against
> the exact same input data?
Yes, exactly the same input: a single CSV file with 23K lines.
Using a shorter string leads to more duplicate keys and therefore more
combining/reducing, but going by the above it seems to fail whether the
substring/key is entirely unique (23000 combine output records) or mostly
the same (9 combine output records).
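
As an aside, the same counters can be pulled programmatically rather than
read off the JobTracker web UI. This is a rough sketch against the 0.20-era
mapred API, using the job ID from the log output above; the counter group and
names are the standard built-in task counters.

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class PrintShuffleCounters {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    // Job ID taken from the failed attempt shown earlier in this thread.
    RunningJob job = client.getJob(JobID.forName("job_201006241812_0001"));
    if (job == null) {
      System.err.println("Job not found (it may already have been retired).");
      return;
    }
    Counters counters = job.getCounters();
    // Built-in per-task counters live in the Task$Counter group.
    Counters.Group tasks = counters.getGroup("org.apache.hadoop.mapred.Task$Counter");
    System.out.println("map output records:     " + tasks.getCounter("MAP_OUTPUT_RECORDS"));
    System.out.println("combine output records: " + tasks.getCounter("COMBINE_OUTPUT_RECORDS"));
    System.out.println("reduce shuffle bytes:   " + tasks.getCounter("REDUCE_SHUFFLE_BYTES"));
  }
}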
> Also, since the cluster size is small, you could also look at the
> tasktracker logs on the machines where the maps have run to see if
> there are any failures when the reduce attempts start failing.
Here is the TT log from the last failed job. I do not see anything
besides the shuffle failure, but there
may be something I am overlooking or simply do not understand.
http://pastebin.com/DKFTyGXg

Thanks again!
> Thanks
> Hemanth


  • Deepak Diwakar at Jul 27, 2010 at 7:31 pm
    Hey friends,

    I got stuck setting up an HDFS cluster and am getting this error while
    running the simple wordcount example (I did this two years back and had
    no problem).

    I am currently testing hadoop-0.20.1 with 2 nodes, following the
    instructions from
    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29

    I checked the firewall settings and /etc/hosts; there is no issue there.
    The master and slave are also accessible both ways.

    The input size is also very low (~3 MB), so there should be no issue with
    ulimit (which is, by the way, 4096).

    I would be really thankful if anyone could guide me to resolve this.

    Thanks & regards,
    - Deepak Diwakar,



    On 28 June 2010 18:39, bmdevelopment wrote:
    [quoted message trimmed; it repeats the original post above verbatim]
  • He Chen at Jul 27, 2010 at 9:01 pm
    Hey Deepak Diwakar,

    Try to keep the /etc/hosts file the same across all of your cluster nodes
    and see whether the problem disappears.
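
    One generic way to check this kind of thing (not from this thread, just a
    quick diagnostic) is to print what name and address the JVM itself
    resolves on each node, since the Hadoop daemons largely rely on the same
    host resolution:

    import java.net.InetAddress;

    public class WhoAmI {
      public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        // Run this on every node; differing or unexpected names usually point
        // at inconsistent /etc/hosts entries or broken reverse DNS.
        System.out.println("hostname:  " + local.getHostName());
        System.out.println("canonical: " + local.getCanonicalHostName());
        System.out.println("address:   " + local.getHostAddress());
      }
    }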
    On Tue, Jul 27, 2010 at 2:31 PM, Deepak Diwakar wrote:
    [quoted message trimmed; it repeats the earlier posts verbatim]


    --
    Best Wishes!
    顺送商祺!

    --
    Chen He
    (402)613-9298
    PhD. student of CSE Dept.
    Research Assistant of Holland Computing Center
    University of Nebraska-Lincoln
    Lincoln NE 68588
  • C.V.Krishnakumar at Jul 27, 2010 at 9:30 pm
    Hi Deepak,

    You could refer to this too: http://markmail.org/message/mjq6gzjhst2inuab#query:MAX_FAILED_UNIQUE_FETCHES+page:1+mid:ubrwgmddmfvoadh2+state:results
    I tried those instructions and they are working for me.
    Regards,
    Krishna
    On Jul 27, 2010, at 12:31 PM, Deepak Diwakar wrote:
    [quoted message trimmed; it repeats the earlier posts verbatim]
  • C.V.Krishnakumar at Jul 27, 2010 at 9:39 pm
    Hi Deepak,

    Maybe I did not make my mail clear: I had tried the instructions in the
    blog you mentioned, and they are working for me.
    Did you change the /etc/hosts file at any point?

    Regards,
    Krishna
    On Jul 27, 2010, at 2:30 PM, C.V.Krishnakumar wrote:
    [quoted message trimmed; it repeats the earlier posts verbatim]
  • Deepak Diwakar at Jul 28, 2010 at 9:52 am
    Thanks Krishna and Chen.

    Yes, the problem was in /etc/hosts. On each node there was a unique
    identifier (like necromancer, rocker, etc.), which was the only difference
    in /etc/hosts among the nodes. Once I put the same identifier on all of
    them, it worked.
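
    For illustration, the kind of consistent layout usually meant here is an
    /etc/hosts that is identical on every node, for example (hostnames and
    addresses below are made up, not taken from this cluster; the note about
    127.0.1.1 is a commonly cited Ubuntu pitfall rather than something
    reported in this thread):

    # identical on master and slave
    127.0.0.1    localhost
    192.168.0.1  master
    192.168.0.2  slave
    # Avoid an extra "127.0.1.1 <hostname>" line: it can make a daemon
    # advertise itself as localhost, which shows up as failed shuffle fetches.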


    Thanks & regards
    - Deepak Diwakar,



    On 28 July 2010 03:09, C.V.Krishnakumar wrote:
    [quoted message trimmed; it repeats the earlier posts verbatim]

Discussion Overview
group: common-user @ hadoop.apache.org
category: hadoop
posted: Jun 28, 2010 at 1:10 pm
active: Jul 28, 2010 at 9:52 am
posts: 6
users: 4
website: hadoop.apache.org
irc: #hadoop
