We are experimenting with HDFS as a backend store for a Fedora Commons
<http://fedora-commons.org/> based image repository. The image files range
from 50 to 200 MB. We are testing a block placement policy that places all
blocks of an image file on a single datanode. The replica nodes will hold
all blocks too: if an image file has a replication factor of three, three
datanodes will each hold the file in its entirety. This will give us data
locality for the MapReduce jobs, where each task processes an image in its
entirety.
I know this approach has both advantages and disadvantages. I also know I
might have to build a custom node chooser, rather than using the random
node chooser, to avoid unbalancing the cluster.
I would love to hear any suggestions/comments you have.
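To make the custom node chooser idea concrete, here is a minimal sketch in plain Python. It is not the HDFS BlockPlacementPolicy API, only the selection logic. It assumes the chooser can see how many bytes each datanode already holds, and it picks the least-used nodes for the whole file instead of picking at random. The names choose_datanodes and place_file are hypothetical.

```python
import heapq


def choose_datanodes(used_bytes, replication):
    """Return the names of the `replication` least-used datanodes.

    used_bytes maps a datanode name to the bytes it already stores.
    Ties are broken by name so the result is deterministic.
    """
    if replication > len(used_bytes):
        raise ValueError("replication factor exceeds number of datanodes")
    candidates = ((used, name) for name, used in used_bytes.items())
    return [name for _, name in heapq.nsmallest(replication, candidates)]


def place_file(used_bytes, file_size, replication=3):
    """Place every block of one file on the same `replication` datanodes.

    The whole file size is charged to each chosen node, because each
    replica node holds the file in its entirety.
    """
    targets = choose_datanodes(used_bytes, replication)
    for name in targets:
        used_bytes[name] += file_size
    return targets
```

Calling place_file repeatedly with a shared usage dictionary steers new files toward the emptiest nodes, which is the balancing behaviour a random chooser does not guarantee when whole files of 50 to 200 MB land on one node.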
On Monday, 11 March 2013 22:00:05 UTC-4, Vinithra wrote:
Can you explain your need to use a different block placement policy?
In general, Cloudera doesn't recommend that people use pluggable block placement
policies. There are many corner cases that need to be taken into
consideration while writing a new policy. For example, you need to take
into account that the Balancer doesn't use the pluggable policy. This is
one of the reasons that CM doesn't expose the dfs.block.replicator.classname
Great! Thanks a lot!
On Friday, 8 March 2013 12:34:45 UTC-5, Philip Langdale wrote:
You would set this in the Namenode configuration safety-valve. This is a
place where you can paste
in the raw XML (the same XML you quoted). Then, after you restart the
namenode, it will be merged
into the generated config file.
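For illustration, the snippet pasted into the safety valve might look roughly like the following. The property name is the one discussed in this thread; the class name is a made-up placeholder, and whatever class you use has to be on the NameNode's classpath.

```xml
<property>
  <name>dfs.block.replicator.classname</name>
  <!-- Hypothetical class name; replace with your own policy implementation -->
  <value>org.example.hdfs.WholeFileBlockPlacementPolicy</value>
</property>
```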
On 8 March 2013 09:30, Mohamed Mohideen wrote:
I am using Cloudera Manager 4.5 Free edition to deploy/manage our
experimental hadoop cluster. I need to add a new configuration property to
enable the pluggable BlockPlacementPolicy feature <https://issues.apache.org/jira/browse/HDFS-385>.
I am new to Cloudera Manager, and couldn't figure out the way to add
this configuration. The admin console only has the option to modify values
of existing properties.
If Cloudera Manager does not allow adding a new configuration property,
is there a workaround to modify the configuration files directly (while
still having the other configurations managed through Cloudera Manager)?
Thanks in advance.