Unfortunately, you can't control which servers your blocks are stored on.
Hadoop handles block allocation for you and tries to distribute data evenly
across the cluster, subject to the constraint that the replicas of a block
reside on different machines and on different racks (assuming you've made
Hadoop rack-aware).
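
To make the constraint concrete, here is a toy sketch of that kind of
placement. It is not Hadoop's code or its actual policy; the function name
place_replicas and the rack/node names are hypothetical, and the only rules
it models are the ones described above: replicas go on distinct machines and
span more than one rack when the cluster has more than one.

```python
import random


def place_replicas(nodes_by_rack, replication=3, seed=None):
    """Toy model: pick `replication` distinct nodes, spanning two racks if possible.

    nodes_by_rack maps a rack name to a list of node names (both hypothetical).
    Returns a list of (rack, node) tuples. The caller never chooses the nodes.
    """
    rng = random.Random(seed)
    all_nodes = [(rack, node)
                 for rack, nodes in nodes_by_rack.items()
                 for node in nodes]
    if replication > len(all_nodes):
        raise ValueError("not enough nodes for the requested replication")

    # First replica: any node in the cluster.
    chosen = [rng.choice(all_nodes)]

    # Second replica: prefer a node on a different rack than the first.
    if replication >= 2:
        off_rack = [n for n in all_nodes if n[0] != chosen[0][0]]
        pool = off_rack or [n for n in all_nodes if n not in chosen]
        chosen.append(rng.choice(pool))

    # Remaining replicas: any node not used yet.
    while len(chosen) < replication:
        remaining = [n for n in all_nodes if n not in chosen]
        chosen.append(rng.choice(remaining))

    return chosen


cluster = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
print(place_replicas(cluster, replication=3, seed=0))
```

Note that the caller supplies only the cluster layout and the replication
factor. Which machines end up holding the block is decided by the placement
logic, which is why you can't pin a block to a machine of your choosing.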
Hope this clears things up.
Alex
2009/6/23 Hyunsik Choi <c0d3h4ck@gmail.com>
Hi all,
I would like to control data locality. In other words, I want to place
certain data blocks on one machine. In some problems, subsets of an
entire dataset need one another to compute an answer. Most graph problems
are good examples.
Is it possible? If not, could you give me some advice on this?
Thank you in advance.
- Hyunsik Choi -