Grokbase Groups Hive user March 2009
It takes forever; you want to bulk load them, or have an aggregator
pull them all out, append them, and overwrite them all back in.
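The aggregate-then-reload idea above can be sketched roughly as follows. This is a minimal shell sketch with made-up local paths and a hypothetical `raw_logs` table, not the poster's actual setup: concatenate many small rotated logs into one file, then load that single file instead of hundreds of tiny ones.

```shell
# Sketch (assumed paths): gather the small rotated logs into one big file.
src=$(mktemp -d)
out=$(mktemp)
printf 'line a\n' > "$src/log.1"
printf 'line b\n' > "$src/log.2"
cat "$src"/log.* >> "$out"   # append all the small logs into one file
# The single aggregated file would then be loaded with something like:
#   LOAD DATA LOCAL INPATH '<aggregated file>' OVERWRITE INTO TABLE raw_logs;
```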

Josh F.
On Mar 22, 2009, at 4:24 PM, Suhail Doshi wrote:


Do you know if Hive may have problems going through *lots* of log
files (each 1 MB large)? I remember reading that Hadoop sometimes
has problems dealing with lots of small files, since each one is
much smaller than the default block size it reads.


On Sun, Mar 22, 2009 at 4:16 PM, Zheng Shao wrote:
For now, please append the unix timestamp to the end of the file name.
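The renaming step Zheng suggests can be sketched as below. This is a hedged sketch with assumed paths and a hypothetical `raw_logs` table: stage each rotated log under a name suffixed with the current Unix timestamp, so a re-rotated log.1 can never collide with a file already loaded into the table.

```shell
# Sketch (assumed paths): copy each rotated log to a timestamped name.
src=$(mktemp -d)
stage=$(mktemp -d)
printf 'x\n' > "$src/log.1"
printf 'y\n' > "$src/log.2"
ts=$(date +%s)
for f in "$src"/log.*; do
  cp "$f" "$stage/$(basename "$f").$ts"   # e.g. log.1 -> log.1.1237768560
done
# Each staged file would then be loaded with something like:
#   LOAD DATA LOCAL INPATH '<staged file>' INTO TABLE raw_logs;
```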


On Sun, Mar 22, 2009 at 12:35 PM, Suhail Doshi wrote:
Hi there,

I was reading some of the documentation and I came across this
statement: "Note that if the target table (or partition) already has
a file whose name collides with any of the filenames contained in
filepath - then the existing file will be replaced with the new file."

I have rotating data logs that start at log.1, go to log.512, and
wrap around back to log.1. Does this mean that when I try to LOAD
DATA log.1 again, it's going to overwrite the previous one?

In normal MySQL, this data is just constantly appended regardless of
the file name, but given that the file is likely being copied into
HDFS, this is probably different. If what I'm describing is happening,
what is the solution for rotating log files?




Discussion Overview
group: user @
categories: hive, hadoop
posted: Mar 22, '09 at 7:36p
active: Mar 22, '09 at 11:46p
