Hi,

I am using Cloudera Manager 4.5 Free Edition to deploy and manage our
experimental Hadoop cluster. I need to add a new configuration property to
enable the pluggable BlockPlacementPolicy feature
(https://issues.apache.org/jira/browse/HDFS-385):


<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyNEW</value>
</property>



I am new to Cloudera Manager, and couldn't figure out how to add this
configuration. The admin console only has the option to modify values of
existing properties.

If Cloudera Manager does not allow adding a new configuration property, is
there a workaround to modify the configuration files directly (while still
having the other configuration managed through Cloudera Manager)?

Thanks in advance.

- Mohamed


  • Philip Langdale at Mar 8, 2013 at 5:34 pm
    You would set this in the NameNode configuration safety valve. This is a
    place where you can paste in the raw XML (the same XML you quoted).
    Then, after you restart the NameNode, it will be merged into the
    generated config file.

    --phil
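    Concretely, the text to paste into the NameNode safety valve is the same
    raw property block from the question (BlockPlacementPolicyNEW is the
    questioner's placeholder class name, not a class that ships with Hadoop):

    ```xml
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyNEW</value>
    </property>
    ```

    After the NameNode restart, this fragment is merged into the generated
    hdfs-site.xml alongside the properties Cloudera Manager itself manages.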


  • Mohamed Mohideen at Mar 8, 2013 at 5:45 pm
    Great! Thanks a lot!

  • Vinithra Varadharajan at Mar 12, 2013 at 2:00 am
    Mohamed,

    Can you explain your need to use a different block placement policy?

    In general, Cloudera doesn't recommend that people use pluggable block
    placement policies. There are many corner cases that need to be taken
    into consideration while writing a new policy. For example, you need to
    take into account that the Balancer doesn't use the pluggable policy.
    This is one of the reasons that CM doesn't expose the
    dfs.block.replicator.classname property.

    -Vinithra

  • Mohamed Mohideen at Mar 12, 2013 at 12:35 pm
    Hello Vinithra,

    We are experimenting with HDFS as a backend store for a Fedora Commons
    (http://fedora-commons.org/) based image repository. The image file
    sizes range from 50-200MB. We are testing a block placement policy that
    will place all blocks of an image file on a single datanode (replica
    nodes will hold all blocks too, i.e. if an image file has a replication
    factor of three, we will have three datanodes that each hold the image
    file in its entirety). This will grant us data locality for MapReduce
    jobs, where each task will process an image in its entirety.
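    The co-location rule described above can be sketched independently of
    Hadoop (this is illustrative Python, not the BlockPlacementPolicy Java
    interface; a real policy must also handle rack awareness, excluded
    nodes, and failures):

    ```python
    import random

    def choose_targets(path, file_to_nodes, all_nodes, replication=3):
        """Pick datanodes for one block of `path`, pinning every block of
        the same file to the same replica set (the co-location rule)."""
        if path not in file_to_nodes:
            # First block of this file: pick `replication` distinct nodes.
            file_to_nodes[path] = random.sample(all_nodes, replication)
        # Every later block reuses the nodes chosen for the first block.
        return file_to_nodes[path]

    nodes = ["dn%d" % i for i in range(10)]
    placements = {}
    # Four blocks of one image file all land on the same three datanodes.
    targets = [choose_targets("/repo/image1.tif", placements, nodes)
               for _ in range(4)]
    assert all(t == targets[0] for t in targets)
    ```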

    I know there will be both disadvantages and advantages to this approach.
    I know I might also have to build a custom node chooser, rather than
    using the random node chooser, to avoid unbalancing the cluster.

    I would love to hear any suggestions/comments you have.

    Thanks

  • Vinithra Varadharajan at Mar 12, 2013 at 6:59 pm
    Mohamed,

    You will be better off setting the HDFS block size (dfs.blocksize) to
    something like 200MB, i.e. according to the range of your file sizes.
    The block size can also be set on the client side, so you could arguably
    change the block size per image file.

    -Vinithra
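    As a sketch, the cluster-wide setting would be an hdfs-site.xml property
    (in CM, via the same safety-valve mechanism; 209715200 bytes = 200MB is
    an assumed value chosen to match the largest images):

    ```xml
    <property>
      <name>dfs.blocksize</name>
      <!-- 200MB: at least as large as the largest image,
           so each file occupies a single block -->
      <value>209715200</value>
    </property>
    ```

    Per file, a client could instead pass the block size at write time,
    e.g. `hadoop fs -D dfs.blocksize=209715200 -put image.tif /repo/`
    (the path and file name are illustrative).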

  • Mohamed Mohideen at Mar 12, 2013 at 8:45 pm
    I thought about that earlier. Would that affect the performance of read
    operations? (i.e. will reading a file as multiple blocks, from multiple
    replicated datanodes, improve the client read speed?) I guess this would
    depend on the DFSClient implementation. I couldn't figure out how the
    default client works.

  • Vinithra Varadharajan at Mar 13, 2013 at 12:15 am
    Read performance usually improves with data locality, unless your network
    speed is faster than your disks. In any case, I'd highly recommend you
    first try modifying the block size and running some performance benchmarks,
    before you go down the route of replacing the block placement policy.
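    The arithmetic behind trying the block size first is simple (128MB is an
    assumed default block size; CDH defaults vary by version):

    ```python
    import math

    def blocks_per_file(file_size, block_size):
        """Number of HDFS blocks a file of `file_size` bytes occupies."""
        return math.ceil(file_size / block_size)

    MB = 1024 * 1024
    # With a 128MB block size, a 200MB image spans two blocks, which may
    # live on different datanodes, so a map task loses locality for one.
    assert blocks_per_file(200 * MB, 128 * MB) == 2
    # Raising dfs.blocksize to 200MB makes every 50-200MB image a single
    # block, so each replica holds the whole file and reads stay local.
    assert blocks_per_file(200 * MB, 200 * MB) == 1
    assert blocks_per_file(50 * MB, 200 * MB) == 1
    ```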

  • Mohamed Mohideen at Mar 13, 2013 at 12:38 pm
    Thanks for your suggestions! Will surely try.

    -Mohamed

Discussion Overview
group: cm-users
categories: hadoop
posted: Mar 8, '13 at 5:30p
active: Mar 13, '13 at 12:38p
posts: 9
users: 3
website: cloudera.com
irc: #hadoop
