FAQ
Hi Tom,

Currently Hive/Hadoop recognizes data as UTF-8.

If your encoding is different, most likely you can still process the data
using Hive without any problems, as long as Hive/Hadoop does not have to do
UTF-8 decoding.
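
For example (a rough sketch; the table and column names here are just
placeholders, not anything from your setup):

  -- Byte-safe: Hive just moves and compares whole byte strings,
  -- so latin1/latin9 bytes pass through untouched.
  SELECT user_id, COUNT(*) FROM im_logs GROUP BY user_id;

  -- Not byte-safe: string functions like lower() decode the bytes
  -- as UTF-8 and can mangle other encodings.
  SELECT lower(message) FROM im_logs;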

What is the row format of your data? Fields separated by TAB or something?
As long as the encoding never uses the separator byte for anything else (for
example, as the second or third byte of a multi-byte character), it should
be fine.
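
Something like this should work (a rough sketch, assuming TAB-separated
fields; the table name, columns, and path are placeholders):

  CREATE EXTERNAL TABLE im_logs (
    ts STRING,
    user_id STRING,
    message STRING
  )
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE
  LOCATION '/data/im_logs';

TAB (0x09) is safe in both of your cases: latin1/latin9 are single-byte
encodings, and UTF-8 continuation bytes are all in the 0x80-0xBF range, so
0x09 can never appear in the middle of a character.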

Zheng
On Fri, Sep 25, 2009 at 2:58 PM, tom kersnick wrote:

I have some files with mixed characters from all over the world: UTF-8,
latin1, latin9, and about 10 others. These are international files of raw IM
logs. Is there a way to load these files as-is into Hadoop? Is it smart
enough to interpret the files correctly as-is? My files are petabytes in
size, and I want to write some Hive queries to find patterns. Please bear
with me, as I am a newbie.

I know I can set the character set at the server level, but I want to make
sure there is no other setting that I am missing. For example, in MySQL I
can set the language at the DB level.

Thanks so much!

--
Yours,
Zheng
