HBase dev mailing list, July 2011
Running the DataNode inside of an HBase process seems like it could
be a good option to enable. Thoughts?

Specifically, it would reduce the number of processes on an HBase
instance. E.g., I think one of the barriers to adoption for HBase in
general is having to manage multiple processes. Are there any known
issues with doing this?

In addition to the DataNode, one could automatically specify which
servers should run ZooKeeper and start ZK inside of the HBase
process(es).


  • Ted Dunning at Jul 17, 2011 at 2:35 am

    On Sat, Jul 16, 2011 at 6:28 PM, Jason Rutherglen wrote:

    > Running the DataNode inside of an HBase process seems like this could
    > be a good option to enable?

    My gut is that this would be a maintenance headache.

    > Specifically because it would reduce the number of processes on an
    > HBase instance. Eg, I think one of the barriers to adoption for HBase
    > in general is the multiple processes management part. Are there any
    > known issues with doing this?

    Well, I think you are right about adoption. To take Mongo as a straw man,
    the new user impression is that you untar a file and run a program. Then
    you run another one on another machine. Leaving aside the fact that Mongo
    has admin issues at scale, this style of installation definitely enhances
    adoption for simple instances.

    I am not sure, however, whether this option is really available for HBase.
    HDFS is not a simple animal no matter how you package it.

    > In addition to the DataNode, one could auto-specify which servers
    > should be running Zookeeper and start ZK inside of the HBase
    > process(es).

    Internal management of ZK is already an option (and I don't recommend that
    either, for different reasons).
  • Jason Rutherglen at Jul 18, 2011 at 4:32 pm
    > My gut is that this would be a maintenance headache.

    What specifically do you think would cause a problem?

    > Internal management of ZK is already an option (and I don't recommend that
    > either, for different reasons).

    What are the reasons?
  • Ted Dunning at Jul 19, 2011 at 12:14 am

    On Mon, Jul 18, 2011 at 9:32 AM, Jason Rutherglen wrote:

    >> My gut is that this would be a maintenance headache.
    > What specifically do you think would cause a problem?

    Tracking versions, for one. Everybody has a different favorite. That is the
    nice thing about standards. There are so many to choose from.

    Besides, how do you handle people who want the snapshots and higher
    performance that you get from maprfs?

    >> Internal management of ZK is already an option (and I don't recommend that
    >> either, for different reasons).
    > What are the reasons?

    The basic issue is that it is nice to use ZK to determine which services are
    up and to avoid race conditions as services come up. If some of the
    services are actually running ZK, how do you distinguish that process
    getting hung from its simply not being up?

    Also, ZK is very reliable, and that is the primary virtue we are trying to
    capitalize on when we use it as a coordination service. Given that, how is
    it a good thing to incorporate it into software that is inevitably less
    stable? Isn't that tantamount to giving up ZK's primary virtue?
  • Jean-Daniel Cryans at Jul 20, 2011 at 10:51 pm
    Just for reference, there's this jira:
    https://issues.apache.org/jira/browse/HBASE-2811

    J-D


Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jul 17, '11 at 1:29a
active: Jul 20, '11 at 10:51p
posts: 5
users: 3
website: hbase.apache.org
