I have a question about HDFS. Suppose I have 3 machines in my HDFS cluster and the replication factor is 1. A large file sits in the local file system of one of those three machines. If I put that file into HDFS, will it be split into blocks and distributed across all three machines? I ask because HDFS is built on the principle that "moving computation is cheaper than moving data".
If the file is distributed across all three machines, a lot of data has to be transferred over the network. If it is NOT distributed, the compute power of the other two machines goes unused. Am I missing something here?
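
To make the question concrete, here is a minimal Python sketch (plain arithmetic, not Hadoop code) of how a file would be cut into blocks. It assumes the 128 MiB default `dfs.blocksize` of Hadoop 2.x and later; the 1000 MiB file size and the function name `split_into_blocks` are hypothetical. It only shows how many blocks there would be, not which DataNode each block lands on, which is exactly what the question is about.

```python
# Assumption: 128 MiB, the default dfs.blocksize in Hadoop 2.x and later.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file would be stored as.

    Every block is full-sized except possibly the last one, which only
    holds the leftover bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    # Hypothetical 1000 MiB file on one of the three machines.
    blocks = split_into_blocks(1000 * 1024 * 1024)
    print(f"{len(blocks)} blocks, last block is {blocks[-1] // (1024 * 1024)} MiB")
```

With a replication factor of 1, each of those blocks is stored exactly once, so the open point is whether they all stay on the machine that ran the put or get spread over the cluster.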