4 clusters? You mean 4 machines?
I don't know much about how your job works (is it multithreaded, is it
a MapReduce job, etc.) and it would be nice if you told us more about
it, so I'm going to assume you are inserting from a single thread.
If you have a single thread inserting into a single-machine HBase cluster,
then the data is stored once. If you have 4 machines and replication is
set to 3, which is the default, then 2 GB becomes 6 GB and it's
all inserted sequentially. I would expect a slowdown.
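
To make that arithmetic concrete, here is a rough sketch in Python. The
function name and the min() rule (no more replicas than there are
DataNodes) are my own assumptions for illustration, and it ignores the
write-ahead log, compactions and compression, so treat it as a
back-of-the-envelope estimate only.

```python
def physical_bytes_written(logical_bytes, replication, datanodes):
    """Estimate the bytes stored across the cluster for one logical write."""
    # Assumption: the number of copies is capped by the number of DataNodes.
    copies = min(replication, datanodes)
    return logical_bytes * copies


GB = 1024 ** 3

# 2 GB import with the default replication factor of 3
single = physical_bytes_written(2 * GB, replication=3, datanodes=1)
cluster = physical_bytes_written(2 * GB, replication=3, datanodes=4)

print(single // GB, "GB stored on a single machine")
print(cluster // GB, "GB stored on a 4-machine cluster")
```

Three times the data on its own would not explain a tenfold slowdown.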
Now, 40 minutes vs. 4 minutes is an order of magnitude slower, and that
doesn't seem right. Have you looked into where it's slow? Can you
investigate more and give us some other data points?
On Mon, Feb 28, 2011 at 6:00 AM, Cavus,M.,Fa. Post Direkt
I have a simple job. It imports 2 GB of data into HBase in 4 minutes with
Hadoop and without a cluster.
If I configure fully distributed mode, it imports 2 GB of data in 40
minutes on my 4 clusters.
Has anyone had the same problem?