I have files in a mix of character encodings from all over the world: UTF-8,
Latin-1, Latin-9, and about ten others. They are international files of raw IM
logs. Is there a way to load these files as-is into Hadoop? Is it smart
enough to interpret each file correctly? The data runs to petabytes, and
I want to write some Hive queries over it to find patterns. Please bear with me, as
I am a newbie.
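
To give an idea of what I am after, here is a rough sketch of the kind of
table and query I have in mind (the table name, columns, and HDFS path are
all made up for illustration):

    -- Hypothetical external table over the raw IM logs;
    -- 'im_logs', its columns, and the path are placeholders.
    CREATE EXTERNAL TABLE im_logs (
      ts      STRING,
      sender  STRING,
      message STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/im_logs/';

    -- Example pattern search: who mentions a given keyword most often.
    SELECT sender, COUNT(*) AS msg_count
    FROM im_logs
    WHERE message LIKE '%hello%'
    GROUP BY sender
    ORDER BY msg_count DESC
    LIMIT 10;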
I know I can set the character set at the server level, but I want to make
sure there is no other setting I am missing. For example, in MySQL I can
set the character set at the database level.
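Roughly like this (the database name is just an example):

    -- MySQL: pick a default character set for an entire database.
    -- 'chat_logs' is a made-up name for illustration.
    CREATE DATABASE chat_logs
      CHARACTER SET utf8mb4
      COLLATE utf8mb4_unicode_ci;

Is there an equivalent per-table or per-file encoding setting in Hive that
I should know about?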
Thanks so much!