Grokbase Groups HBase user July 2011

Hi,

I have a job where I need to read from one HBase table, perform aggregations,
and write back to another HBase table. For this, I am using
TableMapReduceUtil.initTableMapperJob and
TableMapReduceUtil.initTableReducerJob. In the reducer, if I use
put.setWriteToWAL(false), the job completes within seconds, but without it,
it takes approximately 30 minutes. Why is there such a huge difference in
performance? I would like to complete the same job within seconds while
keeping put.setWriteToWAL(true) to prevent data loss. So kindly let me
know what other optimizations I can do.
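
For reference, a rough sketch of the kind of job described above. The table
names "source_table" and "target_table", the column family "cf", the
qualifiers, and the sum aggregation are made-up placeholders; the
commented-out setWriteToWAL(false) line marks where the WAL would be
disabled:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;

public class AggregateJob {

  // Mapper: emits (row key, cell value) pairs read from the source table.
  static class AggMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      byte[] v = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("val"));
      if (v != null) {
        context.write(row, new LongWritable(Bytes.toLong(v)));
      }
    }
  }

  // Reducer: sums the values for a row and writes one Put to the target table.
  static class AggReducer
      extends TableReducer<ImmutableBytesWritable, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<LongWritable> values,
        Context context) throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      Put put = new Put(key.get());
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("sum"), Bytes.toBytes(sum));
      // put.setWriteToWAL(false);  // fast but unsafe: edits are lost if a region server crashes
      context.write(key, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "aggregate");
    job.setJarByClass(AggregateJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner caching helps the read side
    scan.setCacheBlocks(false);  // don't pollute the block cache from a full scan

    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        AggMapper.class, ImmutableBytesWritable.class, LongWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("target_table", AggReducer.class, job);
    job.setNumReduceTasks(4);    // more reducers spreads the write load

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}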

Thanks

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>


  • Stack at Jul 1, 2011 at 8:45 pm

    On Fri, Jul 1, 2011 at 1:07 PM, Shuja Rehman wrote:
    I have a job where I need to read from one HBase table, perform aggregations,
    and write back to another HBase table. For this, I am using
    TableMapReduceUtil.initTableMapperJob and
    TableMapReduceUtil.initTableReducerJob. In the reducer, if I use
    put.setWriteToWAL(false), the job completes within seconds, but without it,
    it takes approximately 30 minutes. Why is there such a huge difference in
    performance? I would like to complete the same job within seconds while
    keeping put.setWriteToWAL(true) to prevent data loss. So kindly let me
    know what other optimizations I can do.

    Don't disable the WAL. You are just going to shoot yourself in the foot
    if you leave it off.

    The difference in performance is that with the WAL enabled, every edit
    is written to the filesystem before anything else is done.

    Try playing with deferred sync'ing of writes. You need to set your
    table to do deferred flushes by setting the DEFERRED_LOG_FLUSH table
    attribute on it. Once set, rather than syncing every write, we'll
    sync on a period. The default is to sync every second. Here is the
    setting in hbase-default.xml:

    <property>
    <name>hbase.regionserver.optionallogflushinterval</name>
    <value>1000</value>
    <description>Sync the HLog to the HDFS after this interval if it has not
    accumulated enough entries to trigger a sync. Default 1 second. Units:
    milliseconds.
    </description>
    </property>
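
    To set the DEFERRED_LOG_FLUSH attribute itself, something like the
    following should work from the Java client API (a rough sketch assuming
    the 0.90-era HBaseAdmin/HTableDescriptor calls; "target_table" is a
    placeholder for your output table):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EnableDeferredLogFlush {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] table = Bytes.toBytes("target_table");  // placeholder table name

        // Fetch the current descriptor and turn on deferred log flushing.
        HTableDescriptor desc = admin.getTableDescriptor(table);
        desc.setDeferredLogFlush(true);

        // Schema changes require the table to be offline.
        admin.disableTable(table);
        admin.modifyTable(table, desc);
        admin.enableTable(table);
      }
    }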

    Now if you crash, instead of losing massive chunks of your job, you
    will lose at most the last second's worth of writes, but in
    compensation you should see faster writing.

    Also, what is slow? The writes or the reads? How many reducers? If
    you up the number, does that help?

    St.Ack

