Hi,



I have a simple job. It imports 2 GB of data into HBase in 4 minutes
when Hadoop runs standalone, without a cluster.

If I configure fully distributed mode, the same 2 GB import takes 40
minutes on my 4 clusters.



Has anyone had the same problem?



Regards

Musa


  • Stack at Feb 28, 2011 at 5:20 pm

    On Mon, Feb 28, 2011 at 6:00 AM, Cavus,M.,Fa. Post Direkt wrote:
    I've a simple job. It imports 2 GB of data in 4 minutes to hbase with
    hadoop and not cluster.

    If I configure full distributed mode, it imports 2 GB of data in 40
    minutes to my 4 clusters.
    So, a MapReduce job that runs in 4 minutes when everything is in
    standalone mode takes 40 minutes when distributed? That sounds a bit
    odd. Can you tell what is going on during those 40 minutes? How many
    map tasks? How many HBase regions? Is it actually doing anything
    during this time?

    St.Ack
  • Jean-Daniel Cryans at Feb 28, 2011 at 5:22 pm
    4 clusters? You mean 4 machines?

    I don't know much about how your job works (is it multithreaded, is it
    a MapReduce job, etc.) and it would be nice if you told us more about
    it, so I'm going to assume you are inserting the data in a single thread.

    If you have a single thread inserting into a one-machine HBase cluster,
    then the data is stored once. If you have 4 machines, and you set the
    replication to 3, which is the default, then 2 GB becomes 6 GB and it's
    all inserted sequentially. I would expect a slowdown.

    Now, 40 minutes vs. 4 minutes is an order of magnitude slower, and that
    doesn't seem right. Have you looked into where it's slow? Can you
    investigate more and give us some other data points?

    J-D
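The arithmetic in J-D's reply can be sketched as follows. This is a rough back-of-the-envelope calculation, not a measurement: it assumes 1 GB = 1024 MB and uses only the figures quoted in the thread (2 GB, 4 and 40 minutes, replication factor 3). The function names are made up for illustration.

```python
# Back-of-the-envelope numbers for the figures quoted in this thread.
# Assumptions: 1 GB = 1024 MB; replication factor 3 as mentioned by J-D.

def bytes_written_mb(input_gb: float, replication: int) -> float:
    """Total MB physically written when the data is stored `replication` times."""
    return input_gb * 1024 * replication

def throughput_mb_per_s(input_gb: float, minutes: float) -> float:
    """Logical import rate in MB/s for `input_gb` loaded in `minutes`."""
    return input_gb * 1024 / (minutes * 60)

standalone = throughput_mb_per_s(2, 4)    # roughly 8.5 MB/s
distributed = throughput_mb_per_s(2, 40)  # roughly 0.85 MB/s

print(f"written with replication 3: {bytes_written_mb(2, 3) / 1024:.0f} GB")
print(f"standalone:  {standalone:.2f} MB/s")
print(f"distributed: {distributed:.2f} MB/s")
print(f"slowdown:    {standalone / distributed:.0f}x")
```

Replication triples the volume written, but the observed slowdown is about tenfold, which is consistent with J-D's remark that replication alone does not seem to explain the gap.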

  • M. C. Srivas at Feb 28, 2011 at 5:27 pm
    How many reducers are you running? By default, the system gives you 1. And
    when everything is local (on 1 machine), that works very well.
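To illustrate why the reducer count matters, here is a toy model of hash partitioning. It mirrors the formula used by Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, but it is only a sketch: the row keys are invented, and the thread does not say whether the job in question uses a reduce phase at all. In a real job the count is set with `Job.setNumReduceTasks`.

```python
# Toy model of hash partitioning across reducers.
# The row keys below are hypothetical; they do not come from the thread.
from collections import Counter

def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def partition(key: str, num_reduce_tasks: int) -> int:
    """Pick a reducer index the way a hash partitioner would."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks

keys = [f"row-{i:05d}" for i in range(1000)]  # hypothetical row keys

one_reducer = Counter(partition(k, 1) for k in keys)
four_reducers = Counter(partition(k, 4) for k in keys)

print(one_reducer)                    # every key lands on reducer 0
print(sorted(four_reducers.items()))  # keys spread over reducers 0..3
```

With a single reducer, every record funnels through one task no matter how many machines are available, so adding machines does not speed up that phase.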
Discussion Overview
group: user @
categories: hbase, hadoop
posted: Feb 28, '11 at 2:00p
active: Feb 28, '11 at 5:27p
posts: 4
users: 4
website: hbase.apache.org
