Grokbase Groups Hive user May 2011
We've noticed that our Hive jobs appear to be getting slower and slower
every day, even though the data set isn't really growing by much.
Here are some recent run times, showing the date of each run and the
duration of the job in minutes:

2010/12/31 -> 19.2166666666667
2011/01/31 -> 24.55
2011/02/28 -> 44.6166666666667
2011/03/31 -> 49.9833333333333
2011/04/30 -> 55.3833333333333
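
To put a number on the slowdown, here is a small Python sketch that compares each run to the first one. It is only arithmetic on the times listed above and is not part of our Hive jobs; the name slowdown_report is made up for illustration.

```python
# Run times in minutes, taken from the list above.
run_times = [
    ("2010/12/31", 19.2166666666667),
    ("2011/01/31", 24.55),
    ("2011/02/28", 44.6166666666667),
    ("2011/03/31", 49.9833333333333),
    ("2011/04/30", 55.3833333333333),
]


def slowdown_report(times):
    """Return (date, minutes, ratio to the first run) for each run."""
    baseline = times[0][1]
    return [(date, minutes, minutes / baseline) for date, minutes in times]


report = slowdown_report(run_times)
for date, minutes, ratio in report:
    print(f"{date}: {minutes:6.2f} min ({ratio:.2f}x the first run)")
```

On these numbers the latest run takes close to three times as long as the first one, and the largest single jump is between the January and February runs.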

The only thing that stands out is that we're not deleting older
partitions, so there are probably about two years' worth of partitions in
the system.
The jobs only use the partition for the current month, but I'm not sure
whether the other partitions can somehow slow things down even though
they aren't being used.

Any advice or suggestions are welcome.


Posted by Mayuran Yogarajah, May 2, 2011 at 8:09 pm


