Grokbase Groups Hive user June 2011
FAQ
If I have table timestamps:

hive> desc timestamps;

OK
ts bigint


hive> select ts from timestamps order by ts
OK

1
2
3
4
5
6
7
8
9
10
30
32
34
36
38
40
42
44
46
48
50
70
74
78
100
105
110
115

and I want to make groups of the values where splits between groups
occur where two time-consecutive entries have a difference greater
than 10.

Eg, above, the splits would be such that the numbers would be grouped
into these ranges:

0-10
30-50
70-78
100-115

because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

I'd like the query to result in the following:

hive> select ...

0 7
0 9
0 6
0 3
0 10
0 1
0 4
0 5
0 8
0 2
30 34
30 44
30 40
30 38
30 36
30 32
30 46
30 42
30 48
30 50
30 30
70 74
70 70
70 78
100 100
100 105
100 110
100 115

What is the most efficient hive query that will do this ? Thanks,

pk

Search Discussions

  • Tim Spence at Jun 11, 2011 at 6:18 am
    Praveen,
    This would be best accomplished with a UDF because Hive does not support
    cursors.
    Best of luck,
    Tim



    On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar wrote:

    If I have table timestamps:

    hive> desc timestamps;

    OK
    ts bigint


    hive> select ts from timestamps order by ts
    OK

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    30
    32
    34
    36
    38
    40
    42
    44
    46
    48
    50
    70
    74
    78
    100
    105
    110
    115

    and I want to make groups of the values where splits between groups
    occur where two time-consecutive entries have a difference greater
    than 10.

    Eg, above, the splits would be such that the numbers would be grouped
    into these ranges:

    0-10
    30-50
    70-78
    100-115

    because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

    I'd like the query to result in the following:

    hive> select ...

    0 7
    0 9
    0 6
    0 3
    0 10
    0 1
    0 4
    0 5
    0 8
    0 2
    30 34
    30 44
    30 40
    30 38
    30 36
    30 32
    30 46
    30 42
    30 48
    30 50
    30 30
    70 74
    70 70
    70 78
    100 100
    100 105
    100 110
    100 115

    What is the most efficient hive query that will do this ? Thanks,

    pk
  • Praveen at Jun 11, 2011 at 7:33 pm
    Do you mean that my UDF would store the timestamp of the current row in a static field in the UDF's implementation, and when processing the next row, use that field to get the previous row's value ?

    Can anyone comment on whether that's safe, re: I'm not familiar with Hive internals ?

    Thanks,

    pk

    Sent from my iPhone
    On Jun 10, 2011, at 11:18 PM, Tim Spence wrote:

    Praveen,
    This would be best accomplished with a UDF because Hive does not support cursors.
    Best of luck,
    Tim




    On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar wrote:
    If I have table timestamps:

    hive> desc timestamps;

    OK
    ts bigint


    hive> select ts from timestamps order by ts
    OK

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    30
    32
    34
    36
    38
    40
    42
    44
    46
    48
    50
    70
    74
    78
    100
    105
    110
    115

    and I want to make groups of the values where splits between groups
    occur where two time-consecutive entries have a difference greater
    than 10.

    Eg, above, the splits would be such that the numbers would be grouped
    into these ranges:

    0-10
    30-50
    70-78
    100-115

    because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

    I'd like the query to result in the following:

    hive> select ...

    0 7
    0 9
    0 6
    0 3
    0 10
    0 1
    0 4
    0 5
    0 8
    0 2
    30 34
    30 44
    30 40
    30 38
    30 36
    30 32
    30 46
    30 42
    30 48
    30 50
    30 30
    70 74
    70 70
    70 78
    100 100
    100 105
    100 110
    100 115

    What is the most efficient hive query that will do this ? Thanks,

    pk
  • Tim Spence at Jun 13, 2011 at 7:09 pm
    Praveen,
    My apologies--I meant to suggest a streaming function because a UDF would
    not be able to hold state either. Look at the documentation for TRANSFORM (
    http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform ). Your
    transformation script can be used to compare timestamps from successive
    lines of your data.
    Tim


    On Sat, Jun 11, 2011 at 12:32 PM, Praveen wrote:

    Do you mean that my UDF would store the timestamp of the current row in a
    static field in the UDF's implementation, and when processing the next row,
    use that field to get the previous row's value ?

    Can anyone comment on whether that's safe, re: I'm not familiar with Hive
    internals ?

    Thanks,

    pk

    Sent from my iPhone

    On Jun 10, 2011, at 11:18 PM, Tim Spence wrote:

    Praveen,
    This would be best accomplished with a UDF because Hive does not support
    cursors.
    Best of luck,
    Tim




    On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar < <pk1u.uu@gmail.com>
    pk1u.uu@gmail.com> wrote:
    If I have table timestamps:

    hive> desc timestamps;

    OK
    ts bigint


    hive> select ts from timestamps order by ts
    OK

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    30
    32
    34
    36
    38
    40
    42
    44
    46
    48
    50
    70
    74
    78
    100
    105
    110
    115

    and I want to make groups of the values where splits between groups
    occur where two time-consecutive entries have a difference greater
    than 10.

    Eg, above, the splits would be such that the numbers would be grouped
    into these ranges:

    0-10
    30-50
    70-78
    100-115

    because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

    I'd like the query to result in the following:

    hive> select ...

    0 7
    0 9
    0 6
    0 3
    0 10
    0 1
    0 4
    0 5
    0 8
    0 2
    30 34
    30 44
    30 40
    30 38
    30 36
    30 32
    30 46
    30 42
    30 48
    30 50
    30 30
    70 74
    70 70
    70 78
    100 100
    100 105
    100 110
    100 115

    What is the most efficient hive query that will do this ? Thanks,

    pk
  • Praveen Kumar at Jun 20, 2011 at 9:06 pm
    Gotcha, thanks !

    pk

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedJun 11, '11 at 5:30a
activeJun 20, '11 at 9:06p
posts5
users3
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase