Grokbase Groups Hive dev January 2013

[Hive-dev] unicode character as delimiter

Ho Kenneth - kennho
Jan 10, 2013 at 2:32 am
Hi all,

I have an input file that uses a Unicode character, þ (thorn), as its delimiter.

For example:

col1þcol2þcol3

þ is encoded in UTF-8 as the two bytes 0xC3 0xBE (c3be).
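Editor's note: the byte values quoted above can be checked with a short Python sketch. It is only an illustration, not part of the original post; the helper name `hex_bytes` is made up for this example. It also shows that the same character is a single byte under Latin-1, so the byte sequence depends on how the input file was encoded.

```python
# Inspect how the delimiter character is stored under two common encodings.
delimiter = "\u00fe"  # þ, LATIN SMALL LETTER THORN


def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


utf8_form = hex_bytes(delimiter, "utf-8")      # two bytes in UTF-8
latin1_form = hex_bytes(delimiter, "latin-1")  # one byte in Latin-1

print(f"code point: U+{ord(delimiter):04X}")
print(f"UTF-8:      {utf8_form}")
print(f"Latin-1:    {latin1_form}")
```

Running `hexdump -C` (or a similar tool) on the first line of the real input file would confirm which of the two forms the file actually contains.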

I have tried the following, with no luck:
create table test(col1 string, col2 string, col3 string) row format delimited fields terminated by '\c3be';

I'd appreciate your help! Thanks in advance.

--ken



***************************************************************************
The information contained in this communication is confidential, is
intended only for the use of the recipient named above, and may be legally
privileged.

If the reader of this message is not the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.

If you have received this communication in error, please resend this
communication to the sender and delete the original message or any copy
of it from your computer system.

Thank You.
****************************************************************************

3 responses

  • Dean Wampler at Jan 10, 2013 at 2:23 pm
    You have to use the octal representation, e.g., ^A is \001.

    --
    *Dean Wampler, Ph.D.*
    thinkbiganalytics.com
    +1-312-339-1330
  • Ho Kenneth - kennho at Jan 10, 2013 at 4:08 pm
    Thanks for the quick response.

    I tried '\376', but it is still not working :(


  • Ho Kenneth - kennho at Jan 11, 2013 at 12:53 am
    I'd appreciate it if someone could help out with this issue. Tons of thanks! :)

    I have tried many different combinations but am still not able to get it to
    work.

    Q: How do we parse the delimiter "þ"?


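Editor's note: the octal suggestion and the '\376' attempt in the replies can be compared with a small Python sketch. The function name `octal_escapes` is hypothetical and only serves this illustration. It prints the backslash-octal form of each byte, and suggests that '\376' corresponds to the single Latin-1 byte 0xFE, whereas a UTF-8 file stores þ as two bytes. Whether the Hive version discussed here accepts a multi-byte or high-byte delimiter is not established by this thread, so treat this as a diagnostic aid rather than a fix.

```python
def octal_escapes(text: str, encoding: str) -> str:
    """Render each encoded byte of `text` as a backslash-octal escape."""
    return "".join(f"\\{b:03o}" for b in text.encode(encoding))


ctrl_a = octal_escapes("\x01", "ascii")            # the ^A example from the reply
thorn_utf8 = octal_escapes("\u00fe", "utf-8")      # þ stored as two UTF-8 bytes
thorn_latin1 = octal_escapes("\u00fe", "latin-1")  # þ stored as one Latin-1 byte

print("^A       :", ctrl_a)
print("þ UTF-8  :", thorn_utf8)
print("þ Latin-1:", thorn_latin1)
```

If the input file really is UTF-8, as the original post states, a single-byte escape such as '\376' would not match the two-byte sequence on disk, which is one plausible reason the attempt did not work.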
