I am trying to use 4 SATA disks per node in my Hadoop cluster. This is
a JBOD configuration; no RAID is involved. There is a single XFS
partition per disk, mounted as /local, /local2, /local3, and
/local4, each with sufficient privileges for running Hadoop jobs. HDFS
is set up across the 4 disks for single-user usage (user2), with the
following comma-separated list in hadoop.tmp.dir:
<description>A base for other temporary directories.</description>
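For reference, the full property in core-site.xml presumably looks something like the sketch below. The per-disk subdirectory name (hadoop_tmp) is an assumption on my part, not taken from my actual config; the point is one directory per physical disk, comma-separated:

```xml
<!-- core-site.xml: one temp/data directory per physical disk.
     "hadoop_tmp" is an illustrative subdirectory name. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/local/hadoop_tmp,/local2/hadoop_tmp,/local3/hadoop_tmp,/local4/hadoop_tmp</value>
  <description>A base for other temporary directories.</description>
</property>
```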
What I see is that most or all data is stored on the /local and
/local4 disks across nodes; the local2 and local3 directories on the
other two disks are not used. I have verified that those disks are
writable and have free space.
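For anyone wanting to reproduce the check, this is roughly what I ran on each node (a sketch; the mount points are the ones above, and it should be run as the Hadoop user, user2 in my case):

```shell
# Sanity check for each data disk: report free space and confirm the
# mount point is writable by the current user.
for d in /local /local2 /local3 /local4; do
  df -h "$d" 2>/dev/null | tail -1
  if touch "$d/.write_test" 2>/dev/null; then
    rm -f "$d/.write_test"
    echo "$d: writable"
  else
    echo "$d: NOT writable"
  fi
done
```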
Isn't HDFS supposed to use all disks in round-robin fashion, provided
there is free space on each? Do I need to change another config
parameter for HDFS to spread I/O across all the mount points I provide?