FAQ
If I have table timestamps:

hive> desc timestamps;

OK
ts bigint

hive> select ts from timestamps order by ts
OK

1
2
3
4
5
6
7
8
9
10
30
32
34
36
38
40
42
44
46
48
50
70
74
78
100
105
110
115

and I want to make groups of the values where splits between groups
occur where two time-consecutive entries have a difference greater
than 10.

Eg, above, the splits would be such that the numbers would be grouped
into these ranges:

0-10
30-50
70-78
100-115

because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

I'd like the query to result in the following:

hive> select ...

0 7
0 9
0 6
0 3
0 10
0 1
0 4
0 5
0 8
0 2
30 34
30 44
30 40
30 38
30 36
30 32
30 46
30 42
30 48
30 50
30 30
70 74
70 70
70 78
100 100
100 105
100 110
100 115

What is the most efficient hive query that will do this ? Thanks,

pk

## Search Discussions

•  at Jun 11, 2011 at 6:18 am ⇧
Praveen,
This would be best accomplished with a UDF because Hive does not support
cursors.
Best of luck,
Tim

On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar wrote:

If I have table timestamps:

hive> desc timestamps;

OK
ts bigint

hive> select ts from timestamps order by ts
OK

1
2
3
4
5
6
7
8
9
10
30
32
34
36
38
40
42
44
46
48
50
70
74
78
100
105
110
115

and I want to make groups of the values where splits between groups
occur where two time-consecutive entries have a difference greater
than 10.

Eg, above, the splits would be such that the numbers would be grouped
into these ranges:

0-10
30-50
70-78
100-115

because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

I'd like the query to result in the following:

hive> select ...

0 7
0 9
0 6
0 3
0 10
0 1
0 4
0 5
0 8
0 2
30 34
30 44
30 40
30 38
30 36
30 32
30 46
30 42
30 48
30 50
30 30
70 74
70 70
70 78
100 100
100 105
100 110
100 115

What is the most efficient hive query that will do this ? Thanks,

pk
•  at Jun 11, 2011 at 7:33 pm ⇧
Do you mean that my UDF would store the timestamp of the current row in a static field in the UDF's implementation, and when processing the next row, use that field to get the previous row's value ?

Can anyone comment on whether that's safe, re: I'm not familiar with Hive internals ?

Thanks,

pk

Sent from my iPhone
On Jun 10, 2011, at 11:18 PM, Tim Spence wrote:

Praveen,
This would be best accomplished with a UDF because Hive does not support cursors.
Best of luck,
Tim

On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar wrote:
If I have table timestamps:

hive> desc timestamps;

OK
ts bigint

hive> select ts from timestamps order by ts
OK

1
2
3
4
5
6
7
8
9
10
30
32
34
36
38
40
42
44
46
48
50
70
74
78
100
105
110
115

and I want to make groups of the values where splits between groups
occur where two time-consecutive entries have a difference greater
than 10.

Eg, above, the splits would be such that the numbers would be grouped
into these ranges:

0-10
30-50
70-78
100-115

because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

I'd like the query to result in the following:

hive> select ...

0 7
0 9
0 6
0 3
0 10
0 1
0 4
0 5
0 8
0 2
30 34
30 44
30 40
30 38
30 36
30 32
30 46
30 42
30 48
30 50
30 30
70 74
70 70
70 78
100 100
100 105
100 110
100 115

What is the most efficient hive query that will do this ? Thanks,

pk
•  at Jun 13, 2011 at 7:09 pm ⇧
Praveen,
My apologies--I meant to suggest a streaming function because a UDF would
not be able to hold state either. Look at the documentation for TRANSFORM (
transformation script can be used to compare timestamps from successive
Tim

On Sat, Jun 11, 2011 at 12:32 PM, Praveen wrote:

Do you mean that my UDF would store the timestamp of the current row in a
static field in the UDF's implementation, and when processing the next row,
use that field to get the previous row's value ?

Can anyone comment on whether that's safe, re: I'm not familiar with Hive
internals ?

Thanks,

pk

Sent from my iPhone

On Jun 10, 2011, at 11:18 PM, Tim Spence wrote:

Praveen,
This would be best accomplished with a UDF because Hive does not support
cursors.
Best of luck,
Tim

On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar < <pk1u.uu@gmail.com>
pk1u.uu@gmail.com> wrote:
If I have table timestamps:

hive> desc timestamps;

OK
ts bigint

hive> select ts from timestamps order by ts
OK

1
2
3
4
5
6
7
8
9
10
30
32
34
36
38
40
42
44
46
48
50
70
74
78
100
105
110
115

and I want to make groups of the values where splits between groups
occur where two time-consecutive entries have a difference greater
than 10.

Eg, above, the splits would be such that the numbers would be grouped
into these ranges:

0-10
30-50
70-78
100-115

because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.

I'd like the query to result in the following:

hive> select ...

0 7
0 9
0 6
0 3
0 10
0 1
0 4
0 5
0 8
0 2
30 34
30 44
30 40
30 38
30 36
30 32
30 46
30 42
30 48
30 50
30 30
70 74
70 70
70 78
100 100
100 105
100 110
100 115

What is the most efficient hive query that will do this ? Thanks,

pk
•  at Jun 20, 2011 at 9:06 pm ⇧
Gotcha, thanks !

pk

## Related Discussions

Discussion Overview
 group user categories hive, hadoop posted Jun 11, '11 at 5:30a active Jun 20, '11 at 9:06p posts 5 users 3 website hive.apache.org

### 3 users in discussion

Content

People

Support

Translate

site design / logo © 2021 Grokbase