I have some files with mixed character encodings from all over the world: UTF-8,
Latin-1, Latin-9, and about 10 others. These are international files of raw IM
logs. Is there a way to load these files as is into Hadoop? Is it smart
enough to interpret the files correctly as is? My file sizes are in the
petabytes and I want to write some Hive queries to find patterns. Please bear
with me, as I am a newbie.

I know I can set the character set at the server level, but I want to make
sure there is no other setting that I am missing. For example, in MySQL I
can set the character set at the database level.

Thanks so much!

  • Zheng Shao at Sep 27, 2009 at 12:25 am
    Hi Tom,

    Currently, Hive/Hadoop treats data as UTF-8.

    If your encoding is different, you can most likely still process the data
    with Hive without any problems, as long as Hive/Hadoop does not have to do
    any UTF-8 decoding.

    What is the row format of your data? Are the fields separated by TAB or
    something else? As long as the encoding does not use the separator byte for
    something else (for example, as the second or third byte of a multi-byte
    character), it should be fine.

    Zheng
    (In reply to tom kersnick's message of Fri, Sep 25, 2009 at 2:58 PM.)

    --
    Yours,
    Zheng
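
To make the separator point in the reply concrete, here is a minimal Python sketch (not Hive code, and not part of the original thread). It encodes one sample row in several encodings, splits it on the raw TAB byte 0x09, and checks whether the original fields come back unchanged. It also checks whether the raw bytes would survive a UTF-8 decode. The sample fields, the list of encodings, and the helper names split_is_safe and decodes_as_utf8 are assumptions chosen for illustration.

```python
# Illustrative sketch only: sample data, encodings, and helper names are
# assumptions, not anything Hive or Hadoop provides.

SEP = b"\t"  # raw separator byte (0x09), as a byte-level splitter would see it

FIELDS = ["tom", "café", "señor"]
ENCODINGS = ["utf-8", "latin-1", "iso8859-15", "utf-16-le"]


def split_is_safe(fields, encoding):
    """Return True if splitting the encoded row on the raw TAB byte
    yields exactly the encoded fields that were written."""
    row = "\t".join(fields).encode(encoding)
    expected = [f.encode(encoding) for f in fields]
    return row.split(SEP) == expected


def decodes_as_utf8(raw):
    """Return True if the bytes are valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for enc in ENCODINGS:
        row = "\t".join(FIELDS).encode(enc)
        print(enc, "split safe:", split_is_safe(FIELDS, enc),
              "| valid UTF-8:", decodes_as_utf8(row))
```

With this sample row, UTF-8, Latin-1, and Latin-9 all split cleanly on the TAB byte, because 0x09 never appears inside another character in those encodings. UTF-16 does not, because TAB itself is written as two bytes there and 0x09 can also occur inside other characters. The Latin-1 and Latin-9 rows split fine but are not valid UTF-8, which is the case where any step that has to decode the text as UTF-8 would run into trouble.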

Discussion Overview
group: user @
categories: hive, hadoop
posted: Sep 25, '09 at 9:58p
active: Sep 27, '09 at 12:25a
posts: 2
users: 2
website: hive.apache.org

2 users in discussion

Zheng Shao: 1 post Tom kersnick: 1 post
