Grokbase Groups Pig user July 2011
I have data in an HBase table stored in the following format:

rowkey group_id:1 group_id:2 ... group_id:n
2fcab50712467eab4004583eb8fb7f89 1 0 1
085125e8f7cdc99fd91dbd7280373c5b 0 1 0
dd53e23487da03fd02396306d248cda0 2 1 0

where the column family group_id contains one column per data set, and each
cell holds the number of times that the row's hash appears in that data set.

I need to reformat the data and obtain the output in the following format:

hash group_id
2fcab50712467eab4004583eb8fb7f89 1
dd53e23487da03fd02396306d248cda0 1
dd53e23487da03fd02396306d248cda0 1
085125e8f7cdc99fd91dbd7280373c5b 2
dd53e23487da03fd02396306d248cda0 2
...
2fcab50712467eab4004583eb8fb7f89 n

Any ideas on how to achieve this? I'm really at a loss here.


  • Dmitriy Ryaboy at Jul 25, 2011 at 7:51 pm
    Sounds like you need a UDF that takes the map (hash) that Pig returns when
    you load a column family and returns a bag of column names, with each
    column name repeated as many times as that column's value indicates. You
    would then flatten the result of this UDF.

    D
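
The logic of the suggested UDF can be sketched in plain Python, independent of Pig. This is only an illustration under assumptions: the function name expand_counts is hypothetical, the column family is assumed to arrive as a map from column name (the group id) to a count, and n = 3 stands in for the last column of the sample data. A real Pig UDF would wrap the same loop and return a bag, which the script would then FLATTEN.

```python
def expand_counts(rowkey, columns):
    """Return one (rowkey, group_id) pair per occurrence counted in `columns`.

    `columns` maps a column name (the group id) to the number of times the
    hash occurs in that group. Counts may be ints or numeric strings.
    """
    pairs = []
    for group_id, count in columns.items():
        # Repeat the column name `count` times; a count of 0 adds nothing.
        pairs.extend([(rowkey, group_id)] * int(count))
    return pairs


# Sample rows from the question, with "3" standing in for column n (assumption).
rows = {
    "2fcab50712467eab4004583eb8fb7f89": {"1": 1, "2": 0, "3": 1},
    "085125e8f7cdc99fd91dbd7280373c5b": {"1": 0, "2": 1, "3": 0},
    "dd53e23487da03fd02396306d248cda0": {"1": 2, "2": 1, "3": 0},
}

# Equivalent of flattening: concatenate the per-row bags into one relation.
flattened = [pair for key, cols in rows.items() for pair in expand_counts(key, cols)]

# The question lists output grouped by group id, so sort on that first.
flattened.sort(key=lambda pair: (pair[1], pair[0]))

for rowkey, group_id in flattened:
    print(rowkey, group_id)
```

The sort is not part of the UDF itself. Flattening alone yields the pairs row by row, so a separate ordering step would be needed if the output has to be grouped by group id as in the question.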
    On Mon, Jul 25, 2011 at 11:01 AM, Juan Martin Pampliega wrote:

    I have data in an HBase table stored in the following format:
    [...]
    Any ideas on how to achieve this? I'm really at a loss here.
