How about introducing a distributed coordination and locking mechanism?
ZooKeeper would be a good candidate for that kind of thing.
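
If you go the ZooKeeper route, a minimal sketch of the writer side using Apache Curator's InterProcessMutex could look like the following (Curator itself, the connect string and the lock path are my assumptions, since only ZooKeeper was mentioned; the reading job would acquire the same lock around its submission):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class FolderSwapLock {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zkhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One lock znode per HDFS folder that gets swapped.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/my-hdfs-folder");

        lock.acquire();          // blocks while the other job holds the lock
        try {
            // ... replace the folder contents (or, in the reader, submit the job) ...
        } finally {
            lock.release();
        }
        client.close();
    }
}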


On Mon, Aug 13, 2012 at 12:52 PM, David Ginzburg wrote:

Hi,

I have an HDFS folder and an M/R job that periodically updates it by
replacing the data with newly generated data.

I have a different M/R job that processes the data in the folder, either
periodically or ad hoc.

The second job, naturally, sometimes fails when the data is replaced by
newly generated data after the job plan, including the input paths, has
already been submitted.

Is there an elegant solution?

My current thought is to query the JobTracker for running jobs and go over
the input files listed in each job's XML, so that the swap blocks until the
folder is no longer an input path of any currently executing job.
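
As a rough sketch of that approach against the old mapred API (Hadoop 1.x / JobTracker), something like the code below could work; the folder path and polling interval are placeholders, and it assumes the reading jobs set their inputs through FileInputFormat so they end up under mapred.input.dir in the submitted job XML:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class InputPathCheck {
    /** True if any not-yet-completed job lists the folder among its input paths. */
    static boolean folderInUse(JobClient jobClient, Configuration conf, String folder)
            throws Exception {
        for (JobStatus status : jobClient.jobsToComplete()) {
            RunningJob job = jobClient.getJob(status.getJobID());
            if (job == null) continue;
            // job.getJobFile() points at the submitted job.xml (usually on HDFS).
            Path jobFile = new Path(job.getJobFile());
            Configuration jobXml = new Configuration(false);
            jobXml.addResource(jobFile.getFileSystem(conf).open(jobFile));
            for (String dir : jobXml.get("mapred.input.dir", "").split(",")) {
                if (dir.contains(folder)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        JobClient jobClient = new JobClient(conf);
        // Block the swap until no running job reads the folder.
        while (folderInUse(jobClient, conf, "/data/my-folder")) {
            Thread.sleep(30 * 1000L);
        }
        // ... safe to replace the folder contents ...
    }
}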


