I am curious to know whether anyone has tried running map-reduce across multiple
data centers. The use case I have in mind is one where the dataset is
geographically distributed across multiple data centers and it may not be
cost-effective to move the data to a single site (e.g. due to limited
network bandwidth between sites). How is such a scenario handled today?
As I understand it, there is a feature request filed against HDFS for it to be
distributed across data centers (e.g. for disaster recovery). For
details, please refer to the following link
Can anyone share thoughts on the pros and cons of this approach?