I am a new user of Hadoop.
I am running a MapReduce job on a relatively large file (3 GB). At first I had
a single-node Hadoop installation, and the job took approximately 200 seconds.
Then I set up a Hadoop cluster on 10 machines and ran the same
job. This time it took nearly 230 seconds.
The command I'm using to load the data into HDFS is:
hadoop dfs -put *.dat /data/
How can I check whether the file is distributed among the 10 machines? And how
can I distribute the file across the datanodes so the job runs faster?
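For reference, these are the commands I believe can inspect block placement and cluster health (this is a sketch based on the older `hadoop dfs`-style CLI my version uses; the exact tool names may differ in newer releases, where `hdfs fsck` and `hdfs dfsadmin` replace them). These need a live cluster to run:

```shell
# Show which datanodes hold the blocks of a file under /data
# (assumes the file was uploaded to /data as in my -put command above)
hadoop fsck /data -files -blocks -locations

# Report per-datanode capacity and usage across the cluster,
# which should reveal whether all 10 nodes actually hold data
hadoop dfsadmin -report
```

If the fsck output lists blocks replicated across several datanodes, the file is distributed; if everything sits on one node, the cluster may be misconfigured (e.g. the slaves file or replication factor).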