Grokbase Groups Hive user March 2009
Hi there,

I was reading some of the documentation and I came across this statement:
"Note that if the target table (or partition) already has a file whose name
collides with any of the filenames contained in *filepath* - then the
existing file will be replaced with the new file."

I have rotating data logs that start at log.1, go to log.512, and then wrap
around back to log.1. Does this mean that when I try to LOAD DATA log.1
again, it's going to overwrite the other one?

In normal MySQL, this data is just constantly appended regardless of the
file name, but since the file is presumably being loaded into HDFS, this
is probably different. If what I think is happening is right, what is the
solution for rotating log files?

Thanks,
Suhail


  • Zheng Shao at Mar 22, 2009 at 11:16 pm
    For now, please append the unix timestamp to the end of the file name.

    Zheng
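To illustrate the timestamp suggestion above, here is a minimal sketch in Python, assuming the rotated logs sit on a local disk and are copied to a staging directory before being loaded. The helper names `timestamped_name` and `stage_log` and the staging directory are hypothetical and not part of Hive.

```python
import shutil
import time
from pathlib import Path


def timestamped_name(filename, ts=None):
    """Return filename with a unix timestamp appended (log.1 -> log.1.<ts>)."""
    if ts is None:
        ts = int(time.time())  # seconds since the epoch
    return f"{filename}.{int(ts)}"


def stage_log(src, staging_dir, ts=None):
    """Copy a rotated log into staging_dir under a timestamped name.

    Hypothetical helper: the copy is what would later be handed to LOAD DATA,
    so the original rotating file is left untouched.
    """
    src = Path(src)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    dest = staging_dir / timestamped_name(src.name, ts)
    shutil.copy2(src, dest)
    return dest
```

Because each wrap-around of log.1 would be staged under a different suffix, a later load should no longer collide with the file name of an earlier one. Two copies of the same file staged within the same second would still share a name, which a rotating logger is unlikely to produce.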
  • Suhail Doshi at Mar 22, 2009 at 11:25 pm
    Zheng,

    Do you know if Hive may have problems going through *lots* of log files
    (each 1 MB large)? I remember reading about how Hadoop sometimes has
    problems dealing with lots of small files due to the default block size it
    reads.

    Suhail

    --
    http://mixpanel.com
    Blog: http://blog.mixpanel.com
  • Josh Ferguson at Mar 22, 2009 at 11:29 pm
    It takes forever. You want to bulk load them, or have an aggregator
    pull them all out, append them together, and overwrite them all back in.

    Josh F.
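The aggregation idea above can be sketched as follows, assuming the small log files are on a local disk and are newline-delimited text. The function name `aggregate_logs`, the `log.*` pattern, and the output path are illustrative assumptions, not an existing Hive or Hadoop tool.

```python
from pathlib import Path


def aggregate_logs(log_dir, out_path, pattern="log.*"):
    """Concatenate every file matching pattern in log_dir into out_path.

    Returns the number of source files merged. Files are read whole, which
    is reasonable for the ~1 MB logs discussed in this thread.
    """
    out_path = Path(out_path)
    sources = sorted(
        p for p in Path(log_dir).glob(pattern)
        if p.is_file() and p.resolve() != out_path.resolve()
    )
    with open(out_path, "wb") as out:
        for src in sources:
            data = src.read_bytes()
            out.write(data)
            # Keep records from adjacent files on separate lines.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return len(sources)
```

The files are merged in lexicographic name order, which does not follow the numeric rotation order; that only matters if row order within the combined file is significant. The single large file can then be bulk loaded in place of hundreds of small ones.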
  • Suhail Doshi at Mar 22, 2009 at 11:46 pm
    Ah, I guess that's why in the tutorial they use a staging external table
    pointing to a location in HDFS and then INSERT into another table, to avoid
    the mess of thousands of small files.

    Suhail
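As a rough illustration of the staging pattern described above, the following Python sketch builds the two HiveQL statements such a workflow might use. The table names, the single `line STRING` column, the `dt` partition column, and the HDFS location are all hypothetical; the tutorial's actual schema is not shown in this thread.

```python
def staging_statements(staging_table, target_table, hdfs_location, partition_value):
    """Return HiveQL for a staging external table and the INSERT that drains it.

    All identifiers are illustrative assumptions: one STRING column named
    `line`, and a target table partitioned by a column named `dt`.
    """
    create = (
        f"CREATE EXTERNAL TABLE {staging_table} (line STRING) "
        f"LOCATION '{hdfs_location}'"
    )
    insert = (
        f"INSERT OVERWRITE TABLE {target_table} "
        f"PARTITION (dt='{partition_value}') "
        f"SELECT line FROM {staging_table}"
    )
    return [create, insert]


if __name__ == "__main__":
    for stmt in staging_statements(
        "logs_staging", "logs", "/user/hive/staging/logs", "2009-03-22"
    ):
        print(stmt + ";")
```

The small files stay under the external table's location, and the INSERT rewrites their rows into the target table, so queries against the target do not have to open every small file individually.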

posted Mar 22, '09 at 7:36p
active Mar 22, '09 at 11:46p