Thanks for the reply Arun. We've recompiled the hadoop native binaries and
they seem to be loading fine. We are rerunning the job to see if it works
now.
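In case it helps anyone checking the same thing, here is a minimal sketch of
how one can confirm the natives actually load (assuming the Hadoop jars are
on the classpath; NativeCheck is just an illustrative name, and
isNativeCodeLoaded() should be verified against your Hadoop version):

// Prints whether the native-hadoop library was found and loaded from
// java.library.path. Hypothetical helper class; the NativeCodeLoader
// call is assumed to exist in this form in your Hadoop version.
import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
    public static void main(String[] args) {
        System.out.println("native hadoop loaded: "
                + NativeCodeLoader.isNativeCodeLoaded());
    }
}

The task logs should also show NativeCodeLoader's "Loaded the native-hadoop
library" message when it works (exact wording may differ by version).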
Regards,
-vishal.
-----Original Message-----
From: Arun C Murthy
Sent: Wednesday, May 23, 2007 11:40 PM
To: [email protected]; [email protected]
Cc: [email protected]
Subject: Re: Reduce task hangs when using nutch 0.9 with hadoop 0.12.3
Vishal,
On Wed, May 23, 2007 at 02:15:50PM +0530, Vishal Shah wrote:
Hi Arun,
Thanks for the reply. We figured out the root cause of the problem. We are
not using the hadoop native libs right now, and the Sun Java Deflater hangs
sometimes during the reduce phase. Our glibc version is 2.3.5, whereas the
hadoop native libs need 2.4, that's why they are not being used by hadoop.
Personally I've never seen Sun's Deflater hang... but ymmv.
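If the Deflater really is the suspect, one stopgap until you have working
natives is to turn off compression of the intermediate map output, so the job
doesn't exercise the zlib codec path at all. A minimal sketch; the property
name is what I remember from the hadoop-default.xml of this vintage, so
please verify it for 0.12.3:

// Disable compression of intermediate map output. The property name
// "mapred.compress.map.output" is assumed from 0.12-era defaults.
import org.apache.hadoop.mapred.JobConf;

public class NoMapCompression {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.set("mapred.compress.map.output", "false");
        System.out.println("mapred.compress.map.output = "
                + conf.get("mapred.compress.map.output"));
    }
}

The same conf.set line can go wherever you build the JobConf for the
fetch/parse jobs.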
Anyway, there is nothing in the native hadoop libs that needs glibc-2.4 - it looks
like this is just an artifact of the fact that the machine on which the
release was cut (i.e. the native libs in 0.12.3 were built) had glibc-2.4. I
have glibc-2.3.6-r3 on my machine and things work fine...
I was wondering if there is a version of the native libs that would work
with glibc 2.3.
I don't have access right-away to a box with glibc-2.3.5, but it's really
easy to build them yourself - details here:
http://wiki.apache.org/lucene-hadoop/NativeHadoop
hth,
Arun
If not, we'll have to upgrade the glibc version on all
machines to 2.4.
Regards,
-vishal.
-----Original Message-----
From: Arun C Murthy
Sent: Tuesday, May 22, 2007 4:24 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Reduce task hangs when using nutch 0.9 with hadoop 0.12.3
Vishal Shah wrote:
Hi,
We upgraded our code to the nutch 0.9 stable version along with hadoop 0.12.3,
which is the latest version of hadoop 0.12.
After the upgrade, I sometimes see task failures during the reduce phase
for parse and for fetch (when run without the parsing option).
Usually, it's just one reduce task that runs into this problem. The
jobtracker kills the task, saying "Task failed to report status for 602
seconds. Killing task".
I tried running the task using IsolationRunner, and it works fine. I
suspect there is a long computation happening during the reduce phase for
one of the keys, because of which status isn't reported to the jobtracker
in time.
If you suspect a long computation, one way is to use the 'reporter'
parameter to your mapper/reducer to provide status updates and ensure
that the TaskTracker doesn't kill the task, i.e. doesn't assume the task
has been lost.
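For instance, here is a sketch of what that looks like against the raw
(pre-generics) org.apache.hadoop.mapred interfaces; the exact signatures are
from memory of that era, so check them against 0.12.3:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Reducer that pings the framework during a long per-key computation
// so the TaskTracker never sees a silent task and kills it.
public class LongComputeReducer extends MapReduceBase implements Reducer {

    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter)
            throws IOException {
        long sum = 0;
        long seen = 0;
        while (values.hasNext()) {
            sum += ((LongWritable) values.next()).get();
            // Update status well inside the 600-second timeout window.
            if (++seen % 10000 == 0) {
                reporter.setStatus("key " + key + ": " + seen + " values");
            }
        }
        output.collect(key, new LongWritable(sum));
    }
}

Any setStatus call resets the progress timer, so the message only needs to
be cheap to build.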
hth,
Arun
I was wondering if anyone else has seen a similar problem and if there is
a fix for it.
Thanks,
-vishal.