Snapshot of table
-----------------

Key: HADOOP-2496
URL: https://issues.apache.org/jira/browse/HADOOP-2496
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: Billy Pearson
Fix For: 0.16.0


Having an option to take a snapshot of a table would be very useful in production.

What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save the data in case of a software bug in Hadoop or in user code.

The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and load it there; then I would have it online at multiple data centers for load balancing and failover.

I understand that Hadoop removes the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Billy Pearson (JIRA) at Dec 28, 2007 at 9:01 am
    [ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554663 ]

    Billy Pearson commented on HADOOP-2496:
    ---------------------------------------

    Backup/snapshot should take each region as is and copy it to a folder, and the metadata for the table should also be backed up with the snapshot.
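
A minimal sketch of the backup side of this idea, not HBase code. It assumes a hypothetical layout in which each region is a directory under <data_root>/<table>/, uses the local filesystem as a stand-in for the DFS, and models the table's metadata as a plain dict saved to a made-up meta.json file. The name snapshot_table is also an assumption.

```python
import json
import shutil
from pathlib import Path


def snapshot_table(data_root: Path, backup_root: Path, catalog: dict, table: str) -> Path:
    """Copy each region of `table` as is and save its catalog rows alongside."""
    live_dir = data_root / table
    backup_dir = backup_root / table

    # Copy every region directory unchanged. copytree raises FileExistsError
    # if the backup already exists, so an older snapshot is never overwritten.
    shutil.copytree(live_dir, backup_dir)

    # Back up the table's metadata together with the region files.
    # "meta.json" is a hypothetical file name used only in this sketch.
    meta_file = backup_dir / "meta.json"
    meta_file.write_text(json.dumps(catalog[table]["regions"], indent=2))
    return backup_dir
```

Keeping the metadata inside the same folder as the region copies means the backup can be moved or exported as a single unit.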

    For a fast load of the restore we could:
    1. stop serving (disable) the table
    2. delete the current regions and metadata for the table
    3. copy the backup regions into the correct locations for HBase region serving
    4. reload the backup metadata
    5. enable the table

    On the next rescan by the master, the new metadata would be picked up and the master could start assigning the regions to region servers; this way no time is spent reloading the data.
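
The restore steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not HBase code: the local filesystem stands in for the DFS, the catalog is a plain dict, and restore_table, the directory layout, and the meta.json file name are all hypothetical.

```python
import json
import shutil
from pathlib import Path


def restore_table(data_root: Path, backup_root: Path, catalog: dict, table: str) -> None:
    """Replace a table's regions and catalog rows with a backed-up copy."""
    live_dir = data_root / table
    backup_dir = backup_root / table
    meta_file = backup_dir / "meta.json"  # hypothetical metadata file

    # Refuse to touch live data unless a complete backup is present.
    if not meta_file.is_file():
        raise FileNotFoundError(f"no backup metadata at {meta_file}")

    entry = catalog.setdefault(table, {"enabled": True, "regions": {}})

    # 1. Stop serving (disable) the table.
    entry["enabled"] = False

    # 2. Delete the current regions and metadata for the table.
    if live_dir.exists():
        shutil.rmtree(live_dir)
    entry["regions"] = {}

    # 3. Copy the backup regions into the serving location.
    shutil.copytree(backup_dir, live_dir, ignore=shutil.ignore_patterns("meta.json"))

    # 4. Reload the backed-up metadata.
    entry["regions"] = json.loads(meta_file.read_text())

    # 5. Enable the table so its regions can be assigned again.
    entry["enabled"] = True
```

The check before step 1 is an addition of this sketch: without it, a missing backup would be discovered only after the live regions had been deleted.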

  • stack (JIRA) at Dec 29, 2007 at 10:17 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554960 ]

    stack commented on HADOOP-2496:
    -------------------------------

    Here are some ideas for how this might work, Billy.

    HADOOP-1958 talks of making a table read-only. It also talks of being able to send a flush-and-compact command across a cluster/table so that all in-memory entries are persisted, followed by a compaction to tidy up the on-disk representation. Jim is currently working on HADOOP-2478, which will move everything to do with a particular table under a directory named for the table in HDFS. Hadoop has a copy-files utility that can take a source in one filesystem and a target in the same or another filesystem and run a MapReduce job to do a fast copy.

    Deploying the backup copy would run pretty much as you suggest, only I'd imagine we'd have a tool that read the backed-up table directory and, for each region found, did an insert into the catalog .META. table. (The same tool run with a different option would purge a table from the catalog.)
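
One way to picture the catalog tool described above, as a rough sketch only. The catalog is modeled as a plain dict rather than the real .META. table, the "table,region" row key is a made-up format, and register_regions and purge_table are hypothetical names for the tool's two modes.

```python
from pathlib import Path


def register_regions(table_dir: Path, catalog: dict) -> list:
    """Insert one catalog row per region directory found under `table_dir`."""
    table = table_dir.name
    added = []
    # Only directories count as regions; stray files are skipped.
    for region_dir in sorted(p for p in table_dir.iterdir() if p.is_dir()):
        row_key = f"{table},{region_dir.name}"  # hypothetical key format
        catalog[row_key] = {"table": table, "path": str(region_dir)}
        added.append(row_key)
    return added


def purge_table(table: str, catalog: dict) -> list:
    """Remove every catalog row that belongs to `table`."""
    doomed = sorted(key for key, row in catalog.items() if row["table"] == table)
    for key in doomed:
        del catalog[key]
    return doomed
```

Running the purge mode first and the register mode second would swap a table's catalog entries for those of the backed-up copy.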
  • Bryan Duxbury (JIRA) at Jan 10, 2008 at 10:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Bryan Duxbury updated HADOOP-2496:
    ----------------------------------

    Fix Version/s: 0.17.0 (was: 0.16.0)
    Priority: Minor (was: Major)

    Adjusting the priority down to Minor, as this is new functionality. Also setting Fix Version to 0.17, as we have a lot of stuff to get done before 0.16 as it is.
  • Jim Kellerman (JIRA) at Jan 17, 2008 at 7:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Jim Kellerman updated HADOOP-2496:
    ----------------------------------

    Issue Type: New Feature (was: Bug)

Discussion Overview
group: common-dev @
categories: hadoop
posted: Dec 28, '07 at 8:49a
active: Jan 17, '08 at 7:15p
posts: 5
users: 1
website: hadoop.apache.org...
irc: #hadoop
