Grokbase Groups HBase user March 2013
We have a requirement to support data matching while loading deltas to
HBase.
I see there is a utility to support bulk loading.
http://hbase.apache.org/book/arch.bulk.load.html

But is there any way to support daily delta loading?
Is there any open-source MDM software that can be integrated with HBase?

Does HBase have any data-matching functionality?

-Jignesh


  • Ted Yu at Mar 21, 2013 at 4:08 pm
    Does MDM mean Mobile Device Management?
    Can you elaborate on what data-matching functionality you need?

    Thanks
  • Andrew Purtell at Mar 21, 2013 at 4:20 pm
    I think you may need to provide a bit more information about your
    use case. Could you define 'delta' and 'data matching' a bit more?

    In a sense, every bulk load is a delta: updates for insert into a
    larger table, representing a set of changes as a batch.

    We could consider the existing HBase mechanisms for handling
    multiversioning to be a simple "data matching functionality" via
    simple existence testing by coordinate, although I know that is not
    what you mean (but I don't know what you mean precisely).

    * coordinate: { row, column family, qualifier, timestamp }
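To make "existence testing by coordinate" concrete, here is a minimal, hypothetical in-memory model (plain Python, not the HBase client API) of multiversioned cells addressed by { row, column family, qualifier, timestamp }. The class and method names are illustrative assumptions; it also treats a delta as nothing more than a batch of cell writes, as described above.

```python
from collections import defaultdict


class ToyVersionedTable:
    """In-memory illustration of HBase-style multiversioned cells.

    Not the HBase client API: a cell is addressed by the coordinate
    (row, column family, qualifier, timestamp), and each
    (row, family, qualifier) keeps a map of timestamp -> value.
    """

    def __init__(self):
        self._cells = defaultdict(dict)

    def put(self, row, family, qualifier, timestamp, value):
        # Re-writing the same full coordinate replaces that one version;
        # a new timestamp adds another version beside the existing ones.
        self._cells[(row, family, qualifier)][timestamp] = value

    def exists(self, row, family, qualifier, timestamp=None):
        # Existence test by coordinate: with no timestamp, any version counts.
        versions = self._cells.get((row, family, qualifier), {})
        return bool(versions) if timestamp is None else timestamp in versions

    def get_latest(self, row, family, qualifier):
        # The newest version (highest timestamp) is what a plain read returns.
        versions = self._cells.get((row, family, qualifier), {})
        return versions[max(versions)] if versions else None

    def apply_delta(self, cells):
        # A "delta" is simply a batch of cell writes; applying it in coordinate
        # order loosely mirrors a bulk load, whose files are written key-sorted.
        for row, family, qualifier, timestamp, value in sorted(cells, key=lambda c: c[:4]):
            self.put(row, family, qualifier, timestamp, value)
```

For example, applying a delta that writes two versions of `("user1", "d", "name")` at timestamps 1 and 2 leaves both versions present, and `get_latest` returns the value written at timestamp 2. Note that this only matches on exact coordinates; it knows nothing about what the stored values mean.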
  • Jignesh Patel at Mar 21, 2013 at 8:24 pm
    Delta:
    We are trying to keep two different databases in sync. In real time, we
    insert data into both databases (in totally different formats).
    At night, we run a batch job that cross-checks them: if db2 (which is
    actually HBase) is missing a row or two, we insert it.
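The nightly cross-check described above is essentially a set difference on row keys. A minimal sketch, assuming both sides can be streamed as sorted, duplicate-free keys (an HBase scan returns rows in row-key order; the source database's keys would first have to be mapped into the same hypothetical row-key format and sorted). For ASCII keys, Python's string ordering matches HBase's byte-wise row ordering. A merge-style walk avoids holding either key set in memory:

```python
from typing import Iterable, Iterator


def missing_keys(source_keys: Iterable[str], hbase_keys: Iterable[str]) -> Iterator[str]:
    """Yield keys present in source_keys but absent from hbase_keys.

    Both inputs must be sorted ascending and free of duplicates.
    """
    hbase_iter = iter(hbase_keys)
    current = next(hbase_iter, None)
    for key in source_keys:
        # Advance the HBase side past any keys smaller than the current source key.
        while current is not None and current < key:
            current = next(hbase_iter, None)
        if current != key:
            # No matching row in HBase: this row needs to be inserted.
            yield key
```

The yielded keys are the rows the batch job would then read from the source database, convert, and write into HBase.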


    Data Matching:
    We need to do user verification: when a new user is inserted, we check
    their demographics and, based on that, conclude whether the user already
    exists.
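One simple way to turn "check the demographics" into something HBase can look up is to derive a deterministic match key from normalized demographic fields. The sketch below assumes hypothetical fields (first name, last name, birth date, postal code); the key could then serve as a row key in a lookup table, so that "does this user exist?" becomes a single-row existence check. It only handles exact matches after normalization; fuzzy or probabilistic matching, as MDM tools typically do, is out of scope here.

```python
import unicodedata
from datetime import date


def match_key(first_name: str, last_name: str, birth_date: date, postal_code: str) -> str:
    """Build a deterministic match key from hypothetical demographic fields."""

    def norm(s: str) -> str:
        # Strip accents, case, and non-alphanumeric characters so that
        # "José O'Neil" and "jose oneil" normalize to the same text.
        decomposed = unicodedata.normalize("NFKD", s)
        ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
        return "".join(ch for ch in ascii_only.lower() if ch.isalnum())

    # Last name first, so keys for the same family sort next to each other.
    return "|".join([norm(last_name), norm(first_name), birth_date.isoformat(), norm(postal_code)])
```

For example, `match_key("José", "O'Neil", date(1980, 5, 17), "02139")` and `match_key("jose", "ONEIL", date(1980, 5, 17), " 02139 ")` both produce `"oneil|jose|1980-05-17|02139"`.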

    -Jignesh

  • Andrew Purtell at Mar 24, 2013 at 12:44 pm
    So at a minimum you'd need to extend HBase to understand the semantics of
    the user records, that is, what equality means in this case. This could be
    done by writing a coprocessor - code deployed server-side and injected into
    query or store processing, in effect a combination of stored procedures and
    triggers. The coprocessor framework also provides plumbing for custom RPC
    endpoints, so if existing HBase operations are not expressive enough you
    can add your own.

    --
    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein
    (via Tom White)
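The trigger-like role of a coprocessor can be illustrated with a toy model. The sketch below is plain Python and only models the idea: a real HBase implementation would be a Java RegionObserver coprocessor (for example, a prePut hook), whose exact method signatures vary across HBase versions. `ToyRegion`, `reject_duplicate_user`, and the `match_key` column are hypothetical names chosen for illustration; the equality rule is an assumption standing in for whatever "same user" means in your data.

```python
from typing import Callable, Dict, List


class DuplicateUserError(Exception):
    """Raised by a hook to veto a write that matches an existing user."""


class ToyRegion:
    """In-memory stand-in for a region that runs pre-put hooks.

    Only a model of the trigger-like role a RegionObserver coprocessor plays;
    it is not the HBase API.
    """

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.pre_put_hooks: List[Callable[["ToyRegion", str, dict], None]] = []

    def put(self, row_key: str, values: dict) -> None:
        # Every hook sees the write before it is stored and may raise to veto it.
        for hook in self.pre_put_hooks:
            hook(self, row_key, values)
        self.rows.setdefault(row_key, {}).update(values)


def reject_duplicate_user(region: ToyRegion, row_key: str, values: dict) -> None:
    # Hypothetical equality rule: same "match_key" under a different row key.
    # The linear scan is for illustration; a real check would use an index lookup.
    key = values.get("match_key")
    for existing_key, existing in region.rows.items():
        if existing_key != row_key and key is not None and existing.get("match_key") == key:
            raise DuplicateUserError(f"{row_key} matches existing user {existing_key}")
```

Registering `reject_duplicate_user` in `pre_put_hooks` makes every write pass through the equality rule, which is the "understand the semantics of the user records" step pushed to the server side; updates to the same row are still allowed.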

Discussion Overview
group: user
categories: hbase, hadoop
posted: Mar 21, '13 at 4:04p
active: Mar 24, '13 at 12:44p
posts: 5
users: 3
website: hbase.apache.org