On Sat, Jul 16, 2011 at 6:28 PM, Jason Rutherglen wrote:
> Running the DataNode inside of an HBase process seems like it could
> be a good option to enable?
My gut is that this would be a maintenance headache.
> Specifically because it would reduce the number of processes on an
> HBase instance. E.g., I think one of the barriers to adoption for HBase
> in general is the part about managing multiple processes. Are there any
> known issues with doing this?
Well, I think you are right about adoption. To take Mongo as a straw man,
the new user impression is that you untar a file and run a program. Then
you run another one on another machine. Leaving aside the fact that Mongo
has admin issues at scale, this style of installation definitely enhances
the adoption for simple instances.
I am not sure, however, whether this option is really available for HBase.
HDFS is not a simple animal no matter how you package it.
> In addition to the DataNode, one could auto-specify which servers
> should be running ZooKeeper and start ZK inside of the HBase
> process(es).
Internal management of ZK is already an option (and I don't recommend that
either, for different reasons).