Grokbase Groups Pig dev August 2008
Hello!
I have run into a problem with character sets. I use Hadoop to store log data, and my log data is not encoded in UTF-8; in China, for example, it is encoded in GBK. If I use PigStorage() to process my data, the data is treated as UTF-8. When I then run my own program on that UTF-8 data, it still runs, but the results are wrong.
Can Pig's LOAD and STORE behave like Hadoop, leaving the original charset unchanged and storing the data exactly as it was? Can anyone help me? Or tell me why UTF-8 is the default?
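
The failure described above can be reproduced outside Pig. The following is a minimal sketch, assuming the GBK bytes are decoded as UTF-8 somewhere along the way; it does not call Pig, and the sample text is made up for illustration.

```python
# Sample text (hypothetical log content), encoded as GBK the way the logs are.
text = "日志"
gbk_bytes = text.encode("gbk")

# Treat the GBK bytes as UTF-8. Byte sequences that are not valid UTF-8
# are replaced with U+FFFD, so the original characters are lost.
as_utf8 = gbk_bytes.decode("utf-8", errors="replace")

# Writing the damaged text back out does not restore the original bytes.
round_trip = as_utf8.encode("utf-8")

print(as_utf8 == text)                  # False: the text was corrupted
print(round_trip == gbk_bytes)          # False: the stored bytes changed
print(gbk_bytes.decode("gbk") == text)  # True: the right charset is lossless
```

The program runs without raising an error, which matches the symptom in the question: nothing fails, but the output is wrong.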


  • Olga Natkovich at Aug 26, 2008 at 3:20 pm
    PigStorage is written to work with UTF-8 data. You will need to write
    your own load/store function to get different semantics.

    Olga
    -----Original Message-----
    From: paradisehit
    Sent: Tuesday, August 26, 2008 1:52 AM
    To: pig-user@incubator.apache.org; pig-dev@incubator.apache.org
    Subject: Why the default LOAD and STORE use UTF-8? Why not use byte?

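
The "different semantics" mentioned in the reply can be sketched as a pair of functions that take the charset as a parameter instead of assuming UTF-8. This is only an illustration in Python: `load_records` and `store_records` are hypothetical names, not Pig's load/store interface, and a real Pig function would be written in Java against Pig's own API. Tab-delimited fields and newline-terminated records are also assumptions.

```python
def load_records(raw: bytes, charset: str = "gbk", delimiter: str = "\t"):
    """Decode raw bytes with an explicit charset and split them into fields."""
    lines = raw.decode(charset).split("\n")
    # Skip the empty string left after the final newline.
    return [line.split(delimiter) for line in lines if line]


def store_records(records, charset: str = "gbk", delimiter: str = "\t") -> bytes:
    """Join fields and encode them with the same charset used for loading."""
    text = "".join(delimiter.join(fields) + "\n" for fields in records)
    return text.encode(charset)


# Hypothetical GBK-encoded log data: two records with two fields each.
raw = "张三\t登录\n李四\t退出\n".encode("gbk")
records = load_records(raw)

# Loading and storing with the same charset reproduces the original bytes.
assert store_records(records) == raw
```

Using the same charset on both sides is what keeps the stored bytes identical to the input, which is the behaviour the original question asks for.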

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 26, '08 at 8:52a
active: Aug 26, '08 at 3:20p
posts: 2
users: 2
website: pig.apache.org

