I have run into a character encoding problem. I use Hadoop to store log data, and my logs are not encoded in UTF-8; in China, for example, they are often GBK. If I use PigStorage() to process the data, it is treated as UTF-8. When my own program then processes that data as UTF-8 it still runs, but the result will be
Can Pig's LOAD and STORE work like Hadoop itself and leave the original charset unchanged, storing the data exactly as it was? Can anyone help, or explain why UTF-8 is the default?
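
To illustrate what I mean, here is a minimal sketch in plain Python (not Pig code). The sample string and all variable names are made up for the example. It only shows the general effect I am worried about: GBK bytes decoded as UTF-8 and written back out are no longer the same bytes, while bytes passed through untouched can still be decoded as GBK afterwards.

```python
# Hypothetical sample log text, encoded as GBK the way my logs are.
original = "中文日志"
gbk_bytes = original.encode("gbk")

# Treating the GBK bytes as UTF-8: invalid sequences are replaced
# with U+FFFD, so information is lost at this step.
as_utf8 = gbk_bytes.decode("utf-8", errors="replace")
round_tripped = as_utf8.encode("utf-8")

# Passing the raw bytes through without decoding keeps them intact,
# so they can still be decoded as GBK later.
passthrough = bytes(gbk_bytes)
restored = passthrough.decode("gbk")

print(round_tripped == gbk_bytes)  # the decoded-and-re-encoded copy differs
print(restored == original)        # the untouched copy is still readable
```

What I would like is the second behavior: the bytes go in and come out unchanged, and my own program decides how to decode them.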