Grokbase Groups HBase user March 2011
We have been having a lot of success using the importtsv utility to load data into HBase as described in the wiki (http://hbase.apache.org/bulk-loads.html). The one issue we have run into is that we would like to assign a specific timestamp to the records associated with an import. The current ImportTsv.java class sets the timestamp to the current time (ts = System.currentTimeMillis()). We have been using a patch that takes the timestamp from a system property (importtsv.timestamp) if that property is set, and falls back to the current time if it is not. This has been very helpful for us and gives us more control over the timestamps of imported records.
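
A minimal sketch of the fallback described above, for illustration only. It is not the ImportTsv source or the patch itself: the class name ImportTimestamp and the method resolve are hypothetical, and only the property name importtsv.timestamp and the System.currentTimeMillis() default come from the post.

```java
import java.util.Properties;

// Hypothetical helper; not the actual ImportTsv code or the submitted patch.
final class ImportTimestamp {
    // Property name as given in the post.
    static final String TIMESTAMP_PROPERTY = "importtsv.timestamp";

    private ImportTimestamp() {}

    // Returns the timestamp from the property if it is set, otherwise `now`.
    static long resolve(Properties props, long now) {
        String raw = props.getProperty(TIMESTAMP_PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            return now; // property not set: keep the current-time behaviour
        }
        // Throws NumberFormatException if the value is not a valid long.
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        long ts = resolve(System.getProperties(), System.currentTimeMillis());
        System.out.println("import timestamp = " + ts);
    }
}
```

Starting the JVM with -Dimporttsv.timestamp=1301330160000 makes resolve return that value; without the flag it returns the current time passed in by the caller.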

My question is: is this functionality useful in general? If so, I'd be happy to submit a JIRA and a patch with the appropriate changes.

Thanks

Andy


  • Stack at Mar 28, 2011 at 4:50 pm
    This would be generally useful I'd say.
    Thank you Andy,
    St.Ack

  • Jean-Daniel Cryans at Mar 28, 2011 at 4:51 pm
    I have two thoughts about it:

    1- We generally discourage users setting their own timestamps since it
    messes with the internals in some edge cases. Adding this
    functionality goes against that.
    2- Almost every interface we offer lets users set their own
    timestamps, so to be more consistent we should indeed offer it for
    importtsv.

    So I think you should open a jira and post your patch.

    J-D

  • Andy Sautins at Mar 28, 2011 at 5:14 pm
    Discouraging setting timestamps seems to make sense. In our situation we bulk import every 'x' minutes. If for some reason one of the older imports fails and has to be restarted after a later import has already run, we would like to import the older records at their appropriate timestamp, before the timestamp of the later import. It sounds like that may be one of the situations that could trigger some of the internal edge cases, correct?
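
To make the restart scenario concrete, here is a small stand-alone model of one cell's versions. It is only a sketch and not HBase code: the class CellVersions is hypothetical, and it relies on a single assumption about HBase, namely that a default read returns the version with the largest timestamp. It does not model the edge cases J-D refers to.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical model of a single cell's versions; not an HBase class.
final class CellVersions {
    // Versions ordered by timestamp, newest first.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Value with the largest timestamp, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CellVersions cell = new CellVersions();
        cell.put(2000L, "value from the later import");
        // The restarted older import writes afterwards, but with its
        // original (smaller) timestamp, so it does not shadow the later value.
        cell.put(1000L, "value from the restarted older import");
        System.out.println(cell.latest());
    }
}
```

If the restarted import had been stamped with the current time instead, its value would carry the largest timestamp and would be the one returned.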

    Also, as a separate note: since the timestamp is set in the Mapper, an import that runs more than one mapper does not get a consistent timestamp across all the records of a given load. For our use case it is helpful to be able to identify all records associated with a given import.
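
One way to picture the consistency point: fix the timestamp once where the job is set up, and have every mapper read that value instead of calling System.currentTimeMillis() itself. The sketch below is an assumption-laden illustration, not the HBASE-3705 patch: a plain Map stands in for the job configuration, and the names SharedImportTimestamp, pinTimestamp and readTimestamp are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a Map stands in for the job configuration.
final class SharedImportTimestamp {
    static final String KEY = "importtsv.timestamp";

    private SharedImportTimestamp() {}

    // Driver side: pin the timestamp once, keeping any value already supplied.
    static void pinTimestamp(Map<String, String> conf, long now) {
        conf.putIfAbsent(KEY, Long.toString(now));
    }

    // Mapper side: read the pinned value instead of the local clock.
    static long readTimestamp(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null) {
            throw new IllegalStateException(KEY + " was not pinned by the driver");
        }
        return Long.parseLong(raw);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> conf = new HashMap<>();
        pinTimestamp(conf, System.currentTimeMillis());

        long firstMapper = readTimestamp(conf);
        Thread.sleep(5); // a second mapper starting slightly later
        long secondMapper = readTimestamp(conf);

        System.out.println(firstMapper == secondMapper); // prints "true"
    }
}
```

Because every mapper reads the same pinned value, all records from one load share a timestamp and can be identified by it.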

    I went ahead and added a JIRA ( HBASE-3705 ) and uploaded the basic patch. I'll update the documentation as well.

    Thanks

    Andy

