Hi Jeff,
Thanks for the suggestion. However, I'm running a small (two-machine)
cluster with CDH2 on a folder that contains two files, one corrupt and
one not, but I still get the exception and the streaming job is
killed.
That part is expected, but I want to know a way to handle this exception
(java.util.zip.ZipException: invalid block type, or any other) using
streaming (Python).
I would really appreciate it if you could point me to some way to catch
the exception.
Thanks again
Xavier
-----Original Message-----
From: Jeff Hammerbacher
Sent: Monday, October 19, 2009 11:02 AM
To: common-user@hadoop.apache.org
Subject: Re: How to IO catch exceptions using python
Hey Xavier,
The functionality you are looking for was added to 0.19 and above:
http://issues.apache.org/jira/browse/HADOOP-3828. If you upgrade your
cluster to CDH2, you should be good to go.
Regards,
Jeff
On Mon, Oct 19, 2009 at 10:58 AM,
wrote:
Hi Everybody,
I'm working on a project where I have to read a large set of compressed
files (.gz). I'm using Python and streaming to achieve my goals. However,
I have a problem: there are corrupt compressed files that are killing my
map/reduce jobs.
My environment is the following:
Hadoop-0.18.3 (CDH1)
Do you guys have any recommendations on how to handle this case?
How can I catch that exception using Python so that my jobs don't fail?
How can I identify these files using Python and move them to a corrupt-file
folder?
I would really appreciate any recommendations.
Xavier