HBase user mailing list, May 2016
Hi,

I am thinking of implementing regular snapshots on HBase to protect against
user mistakes, e.g. if something bad happens, go back to the previous
snapshot.
I am thinking of keeping one snapshot per week for four weeks and one
snapshot a day for 7 days, so there are always about 11 snapshots. Each
time a new snapshot is created, an old one would be deleted.

From reading the docs I get the impression that snapshots are quite light to
take and have zero ongoing performance impact, i.e. HBase will be just as
fast with 11 snapshots as with none.
Is that right?

Am I also right to believe that the extra disk usage will be very low in our
setup, where we never delete any data and only add more?

Finally, is anyone aware of a tool or helper script that implements such a
snapshot strategy, before I spend time writing my own?
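The retention scheme described above (7 daily plus 4 weekly snapshots) can be sketched as a small planner. This is a minimal sketch, not an existing tool: the `<table>-YYYYMMDD` naming scheme, the choice of Sunday as the "weekly" snapshot, and the helper names `snapshot_name`, `snapshots_to_keep`, and `plan` are all hypothetical. It only prints HBase shell commands (`snapshot` and `delete_snapshot`), which could then, for example, be piped into `hbase shell`.

```python
from datetime import date, timedelta


def snapshot_name(table: str, day: date) -> str:
    # Hypothetical naming scheme: one snapshot per day, "<table>-YYYYMMDD".
    return f"{table}-{day:%Y%m%d}"


def snapshots_to_keep(days: set, today: date, daily: int = 7, weekly: int = 4) -> set:
    """Keep the last `daily` days plus the `weekly` most recent Sunday snapshots.

    Assumption: the "weekly" snapshot is simply the daily one taken on a Sunday.
    """
    recent = {d for d in days if 0 <= (today - d).days < daily}
    sundays = sorted((d for d in days if d.weekday() == 6 and d <= today),
                     reverse=True)[:weekly]
    return recent | set(sundays)


def plan(table: str, existing: set, today: date) -> list:
    """Return HBase shell commands: take today's snapshot, then prune the rest."""
    keep = snapshots_to_keep(existing | {today}, today)
    commands = [f"snapshot '{table}', '{snapshot_name(table, today)}'"]
    for d in sorted(existing - keep):
        commands.append(f"delete_snapshot '{snapshot_name(table, d)}'")
    return commands


if __name__ == "__main__":
    today = date(2016, 5, 18)
    # Pretend a daily snapshot already exists for each of the previous 59 days.
    existing = {today - timedelta(days=i) for i in range(1, 60)}
    for line in plan("mytable", existing, today):
        print(line)
```

With a complete daily history this keeps 10 snapshots rather than 11, because the newest Sunday snapshot is also one of the 7 dailies. Once the history has been pruned, each daily run deletes exactly one old snapshot, which matches the "create one, delete one" rotation described in the question.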

Thank you,
Thibault.



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Snapshot-performance-and-helper-script-tp4080073.html
Sent from the HBase User mailing list archive at Nabble.com.


  • Vladimir Rodionov at May 18, 2016 at 6:29 pm
    Snapshots are light when you take them, but not that light when you export
    them. If you do not export them and only need to protect against user
    errors, that is fine. Otherwise, bear in mind that exporting a snapshot is a
    MapReduce job, and it materializes (copies) all your data to another
    location.
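To make the cost difference concrete, here is a minimal sketch that builds the command line for HBase's `ExportSnapshot` tool, which runs the MapReduce copy described above. The snapshot name, destination URL, and mapper count are placeholders, and the helper names `export_command` and `run_export` are hypothetical; the `-snapshot`, `-copy-to`, and `-mappers` options are the ones shown in the HBase reference guide, but check them against your HBase version. By default the sketch only prints the command (a dry run).

```python
import shlex
import subprocess

# Fully qualified class of HBase's snapshot export tool, run via the `hbase` launcher.
EXPORT_TOOL = "org.apache.hadoop.hbase.snapshot.ExportSnapshot"


def export_command(snapshot: str, copy_to: str, mappers: int = 16) -> list:
    """Build the argv for exporting one snapshot to another cluster or filesystem.

    The tool launches a MapReduce job that copies every HFile the snapshot
    references, so its cost grows with table size, unlike taking the snapshot.
    """
    return ["hbase", EXPORT_TOOL,
            "-snapshot", snapshot,
            "-copy-to", copy_to,
            "-mappers", str(mappers)]


def run_export(snapshot: str, copy_to: str, mappers: int = 16,
               dry_run: bool = True) -> list:
    cmd = export_command(snapshot, copy_to, mappers)
    if dry_run:
        print(shlex.join(cmd))  # show what would run, without starting a job
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Hypothetical snapshot name and backup-cluster URL.
    run_export("mytable-20160518", "hdfs://backup-nn:8020/hbase", mappers=8)
```

If exports are only an occasional off-cluster backup, running them less often than the local snapshot rotation keeps the copy cost bounded.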

    Another possible problem with snapshots is eventual data duplication.
    Snapshots store references to store files, and store files usually get
    compacted quite often: new files are created, while old files get archived
    and later deleted. If you have a snapshot, all the store files it refers to
    will be kept in the archive until you delete that snapshot.
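The duplication effect described above can be illustrated with a toy model. This is a minimal sketch, not HBase code: `ToyStore` and its methods are hypothetical, compaction is modeled as rewriting every live file into one, and the data is assumed to be append-only (as in the original question), so the compacted file is exactly the sum of its inputs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of one store: HFiles, compactions, and snapshot references."""
    sizes: dict = field(default_factory=dict)      # every HFile ever written -> bytes
    live: set = field(default_factory=set)         # HFiles the table currently reads
    snapshots: dict = field(default_factory=dict)  # snapshot name -> frozenset of HFiles

    def flush(self, hfile: str, size: int) -> None:
        self.sizes[hfile] = size
        self.live.add(hfile)

    def snapshot(self, name: str) -> None:
        # Taking a snapshot records references only; no bytes are copied.
        self.snapshots[name] = frozenset(self.live)

    def compact(self, new_hfile: str) -> None:
        # Rewrite all live files into one. Append-only assumption: nothing is
        # dropped, so the output size is the sum of the inputs.
        total = sum(self.sizes[f] for f in self.live)
        self.live = set()
        self.flush(new_hfile, total)

    def delete_snapshot(self, name: str) -> None:
        del self.snapshots[name]

    def live_bytes(self) -> int:
        return sum(self.sizes[f] for f in self.live)

    def archived_bytes(self) -> int:
        # Replaced HFiles stay in the archive while any snapshot references them;
        # unreferenced ones are treated as cleaned up.
        referenced = set().union(*self.snapshots.values())
        return sum(self.sizes[f] for f in referenced - self.live)


if __name__ == "__main__":
    store = ToyStore()
    store.flush("hfile-a", 100)
    store.flush("hfile-b", 100)
    store.snapshot("daily-1")   # references hfile-a and hfile-b
    store.compact("hfile-c")    # hfile-a and hfile-b are replaced by hfile-c
    print(store.live_bytes(), store.archived_bytes())  # 200 200
    store.delete_snapshot("daily-1")
    print(store.archived_bytes())                      # 0
```

Even with append-only data, the table itself holds 200 bytes after compaction while the snapshot pins another 200 bytes of pre-compaction files until it is deleted. Real HBase compactions are more selective (minor vs. major) and archive cleanup is asynchronous, so treat this only as an illustration of why disk usage may not stay low.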

    -Vlad
    On Wed, May 18, 2016 at 3:45 AM, thib wrote:
  • Huaxiang Sun at May 18, 2016 at 6:38 pm
    Hi Thibault,

    Yes, snapshots are very light to take, as taking one does not copy the
    HFiles. As for disk space, per my understanding, it may not be low: once
    compaction happens, the snapshot holds on to HFiles that would otherwise be
    cleaned up.
    The HBase Master web UI can show you how much extra disk space snapshots
    take. Please see the following links for more info. Just my 2 cents;
    experts here may correct me or provide more information.

    https://issues.apache.org/jira/secure/attachment/12802250/master-snapshot.png
    https://issues.apache.org/jira/browse/HBASE-15415

        Thanks,
        Huaxiang

    On May 18, 2016, at 3:45 AM, thib wrote:

    Hi,

    I am thinking to implement regular snapshots on HBase to protect against
    user mistakes, e.g. if something bad happens go back to the previous
    snapshot.
    I am thinking to keep something as one snapshot per week for four weeks, and
    one snapshot a day for 7 days, so always have about 11 snapshots. Then each
    time a new snapshot is created, an old one would be deleted.

    From reading the doc I get the impression that snapshots are quite light to
    take, and have zero on-going performance impact, i.e. HBase will be just as
    fast with 11 snapshots than with none.
    Is that right?

    Am I also right to believe that the extra disk usage be very low in our
    setup where we never deleted any data, just add more?

    Finally, is anyone aware of a tool / helper script to implement such a
    snapshot strategy, before I spend time writing my own?

    Thank you,
    Thibault.



    --
    View this message in context: http://apache-hbase.679495.n3.nabble.com/Snapshot-performance-and-helper-script-tp4080073.html
    Sent from the HBase User mailing list archive at Nabble.com.

Discussion Overview
group: user @
categories: hbase, hadoop
posted: May 18, '16 at 4:59p
active: May 18, '16 at 6:38p
posts: 3
users: 3
website: hbase.apache.org
