On our side we are using Lustre and it seems to work fine. The
configuration below should suit any shared/cluster filesystem:
core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value><directory on GlusterFS>/mapredClusterLocalDir/${user.name}</value>
</property>
hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value><directory on GlusterFS>/nameNode/</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value><directory on GlusterFS>/dataNode/</value>
</property>
mapred-site.xml:
<property>
  <name>mapred.system.dir</name>
  <value><directory on GlusterFS>/mapredSystemDir/</value>
</property>
<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value><directory on GlusterFS>/mapredTempDir/</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value><directory on GlusterFS>/mapredLocalDir/${hadoop.host}/</value>
</property>
############
mapred.local.dir has to be unique (non-shared) for every tasktracker, so
a single fixed path shared by all nodes will not work here.
What we have done is set
HADOOP_OPTS="-Dhadoop.host=$HOSTNAME" before starting each tasktracker,
so that ${hadoop.host} expands to that node's hostname.
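For example, one way to do this (just a sketch, assuming the environment
is set in conf/hadoop-env.sh on each tasktracker node; adjust the file
and base directory to your setup):

# conf/hadoop-env.sh on each tasktracker node
# Pass the node's hostname to the JVM so that ${hadoop.host} in
# mapred-site.xml resolves to a per-node directory on the shared filesystem.
export HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.host=$HOSTNAME"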
Hope this helps.
On Wed, Feb 23, 2011 at 2:22 AM, Andrew Levine wrote:
Hello all,
I am using a Eucalyptus cloud and can get things working using
directories local to individual instances. There is an instance of
GlusterFS for the cloud. Has anyone used GlusterFS as a replacement
for HDFS? Do you have sample configs? Any help anyone can give would be
great.
Does anyone have any experience with GlusterFS? Or is this a "RUN FOR
THE HILLS" type situation?
andrew
--
---
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC, Pune, India