Grokbase Groups HBase user June 2012
I'm struggling to understand why my deletes are taking longer than my inserts. My understanding is
that a delete is just an insertion of a tombstone. And I'm deleting the entire row.

I do a simple loop and insert the 100-byte rows (pseudocode, cleaned up here against the 0.90 client API; FAMILY and QUALIFIER stand in for my real column coordinates):

List<Row> puts = new ArrayList<Row>();
for (int i = 0; i < 50000; i++)
{
  Put p = new Put(rowkey[i]);
  p.add(FAMILY, QUALIFIER, oneHundredBytes[i]); // 100-byte value
  puts.add(p);

  if (puts.size() == 1000)
  {
    Benchmark.start();
    table.batch(puts);
    Benchmark.stop();
    puts.clear(); // don't re-send earlier puts
  }
}


The above takes about 8282ms total.

However, the deletes take more than twice as long:

List<Row> deletes = new ArrayList<Row>();
ResultScanner scanner = table.getScanner(new Scan(rowkey[0], rowkey[50000 - 1]));
for (Result r : scanner)
{
  deletes.add(new Delete(r.getRow()));

  if (deletes.size() == 1000)
  {
    Benchmark.start();
    table.batch(deletes);
    Benchmark.stop();
    deletes.clear();
  }
}
scanner.close();

The above takes 17369ms total.

I'm only benchmarking the deletion time, not the scan time. Additionally, if I batch the deletes into one big batch at the end (rather than while I'm scanning) it takes about the same amount of time. I'm deleting the entire row, so I wouldn't think it would be doing a read before the delete
(http://mail-archives.apache.org/mod_mbox/hbase-user/201206.mbox/%3CE83D30E8F408F94A96F992785FC29D82063395D6@s2k3mntaexc1.mentacapital.local%3E).

Any thoughts on why it is slower and how I can speed it up?

Thanks,
~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com


  • Ted Yu at Jun 27, 2012 at 9:16 pm
    bq. if I batch the deletes into one big one at the end (rather than while I'm scanning)

    That's what you should do.

    See also HBASE-6284, where an optimization, HRegion#doMiniBatchDelete(), is under development.
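
    Roughly like this (a sketch against the 0.90-era client API; table/rowkey setup, the usual org.apache.hadoop.hbase.client imports, and error handling are omitted, and Benchmark is your harness):

    List<Row> deletes = new ArrayList<Row>();
    ResultScanner scanner = table.getScanner(new Scan(rowkey[0], rowkey[50000 - 1]));
    for (Result r : scanner)
    {
      // Collect every tombstone first; nothing is sent yet.
      deletes.add(new Delete(r.getRow()));
    }
    scanner.close();

    Benchmark.start();
    table.batch(deletes); // one big client-side batch at the end
    Benchmark.stop();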
  • Amitanand Aiyer at Jun 27, 2012 at 10:07 pm
    There was some difference in the way locks are taken for batched deletes and puts. This was fixed in 0.89.

    I wonder if the same could be the issue here.

    Sent from my iPhone
  • Ted Yu at Jun 27, 2012 at 10:12 pm
    Amit:
    Can you point us to the JIRA or changelist in 0.89-fb?

    Thanks
  • Jeff Whiting at Jun 27, 2012 at 11:16 pm
    Looking at HBASE-6284, it seems deletes are not yet batched at the regionserver level, which would explain the slowdown. Additionally, the lock handling described in HBASE-5941 is likely contributing.

    So until those changes make it into an HBase release, I just have to live with the slower performance. Is there anything I need to do on my end?

    Just as a sanity check, I tried setting a timestamp in the Delete object, but it made no difference. I'll batch my deletes at the end as you suggested (as memory allows).
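
    For reference, the sanity check was roughly this (a sketch; the 0.90 client's Delete(byte[] row, long ts, RowLock lock) constructor carries the explicit timestamp, and null means no row lock):

    long ts = System.currentTimeMillis();
    // Same delete loop as before, but pin an explicit timestamp on each tombstone.
    deletes.add(new Delete(r.getRow(), ts, null));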

    Thanks,
    ~Jeff
    --
    Jeff Whiting
    Qualtrics Senior Software Engineer
    jeffw@qualtrics.com
  • Ted Yu at Jun 27, 2012 at 11:51 pm
    I created HBASE-6287 <https://issues.apache.org/jira/browse/HBASE-6287> for porting HBASE-5941 to trunk.

    Jeff:
    What version of HBase are you using?

    Since HBASE-5941 is an improvement, a vote may be raised for porting it to other branches.
  • Jeff Whiting at Jun 28, 2012 at 2:38 pm
    0.90.4-cdh3u3 is the version I'm running.

    ~Jeff
    --
    Jeff Whiting
    Qualtrics Senior Software Engineer
    jeffw@qualtrics.com
  • Ted Yu at Jun 27, 2012 at 10:45 pm
    The JIRA was HBASE-5941.
