Hi there gophers,

I'm trying to remove a certain character pattern from a string.

The actual pattern is "(?i)\xa7[0-9a-fk-or]", but it doesn't work at all. I
always get: error parsing regexp: invalid UTF-8: `�[0-9a-fk-or]`

That "\xa7" is the "§" character <http://shapecatcher.com/unicode/info/167>,
called the Section sign. It seems regexp doesn't translate my character.

I tried using:

- "(?i)\u00a7[0-9a-fk-or]", with \u00a7 instead of \xa7
- "(?i)\uc2a7[0-9a-fk-or]", '§' == '\u00a7'
- `(?i)\\u00a7[0-9a-fk-or]`; using \u00a7 there fails (invalid escape
  sequence: `\u`)
- And a bunch more, hoping something would magically work, but none of them
  did.


Here's the code:
http://play.golang.org/p/prSh39bIng

Thanks,
Carlos


  • Paul Hankin at Dec 9, 2012 at 1:36 pm

    \x inserts raw bytes into your string, whereas you want the UTF-8
    encoding of the character. See http://golang.org/ref/spec#String_literals

    Using \u00a7 instead of \xa7 works. See:
    http://play.golang.org/p/Sq_C6qAUyq
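
A minimal sketch of the difference between the two escapes, not taken from the linked playground snippet. The helper name `compiles` is hypothetical and exists only for this illustration; everything else is from the standard library.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// compiles reports whether regexp.Compile accepts the pattern.
// (Hypothetical helper, used only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	raw := "\xa7"   // a single byte 0xA7: not valid UTF-8 on its own
	enc := "\u00a7" // bytes 0xC2 0xA7: the UTF-8 encoding of U+00A7 (§)

	fmt.Printf("% x valid=%v\n", raw, utf8.ValidString(raw)) // a7 valid=false
	fmt.Printf("% x valid=%v\n", enc, utf8.ValidString(enc)) // c2 a7 valid=true

	// The regexp parser rejects patterns that are not valid UTF-8.
	fmt.Println(compiles("(?i)\xa7[0-9a-fk-or]"))   // false
	fmt.Println(compiles("(?i)\u00a7[0-9a-fk-or]")) // true
}
```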

    --
    Paul

  • Carlos Cobo at Dec 9, 2012 at 1:38 pm
    Yeah, I figured that out when I tried converting the Section sign to []byte
    and to []rune.

    The solution you suggest doesn't remove the characters, but at least it
    doesn't complain about invalid UTF-8.

    Copy-pasted from the last 2 lines of your code snippet's output:

    Before: "\xa7e--------- \xa7fHelp:"...
    After: "\xa7e--------- \xa7fHelp:"...

    It should be:

    Before: "\xa7e--------- \xa7fHelp:"...
    After: "--------- Help:"...


  • Matt Harden at Dec 9, 2012 at 4:26 pm
    Your test string is also invalid UTF-8. You should use \u00a7 instead of
    \xa7 in all strings.
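
A small sketch of why the input matters as much as the pattern: a pattern written with \u00a7 only matches the two-byte UTF-8 form of the character, so a test string holding the raw byte 0xA7 comes back unchanged. The names `colorCode` and `strip` are hypothetical, chosen for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// colorCode is the pattern from the thread, written with \u00a7 so that the
// pattern itself is valid UTF-8. (Hypothetical name.)
var colorCode = regexp.MustCompile("(?i)\u00a7[0-9a-fk-or]")

// strip removes every match of colorCode from s.
func strip(s string) string {
	return colorCode.ReplaceAllString(s, "")
}

func main() {
	// Raw 0xA7 bytes are not the UTF-8 encoding of U+00A7, so nothing matches.
	fmt.Printf("%q\n", strip("\xa7e--------- \xa7fHelp:")) // "\xa7e--------- \xa7fHelp:"

	// With a valid UTF-8 test string the codes are removed.
	fmt.Printf("%q\n", strip("\u00a7e--------- \u00a7fHelp:")) // "--------- Help:"
}
```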
  • Carlos Cobo at Dec 9, 2012 at 4:56 pm
    I'll repeat it a second time: I tried both "\xa7" and "\u00a7". Neither
    works.

    It seems the service producing these messages doesn't give a **** about
    UTF-8, so I'll have to do 2 passes: first to correct the bytes, then to
    remove them.
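
The two passes could look roughly like the sketch below. It rests on an assumption the thread does not confirm: that the service sends ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. The names `latin1ToUTF8` and `stripColorCodes` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var colorCode = regexp.MustCompile("(?i)\u00a7[0-9a-fk-or]")

// latin1ToUTF8 is pass 1: it reinterprets each byte as the code point of the
// same value. ASSUMPTION: the input is ISO-8859-1; input that is already
// UTF-8 would be mangled by this step.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// stripColorCodes is pass 2: with the text now valid UTF-8, the \u00a7
// pattern can match and remove the codes.
func stripColorCodes(b []byte) string {
	return colorCode.ReplaceAllString(latin1ToUTF8(b), "")
}

func main() {
	msg := []byte("\xa7e--------- \xa7fHelp:")
	fmt.Printf("%q\n", stripColorCodes(msg)) // "--------- Help:"
}
```

Converting first keeps the pattern readable and leaves the rest of the program working with valid UTF-8 strings.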

  • Peter at Dec 9, 2012 at 5:23 pm
    Read up on string literals: http://golang.org/ref/spec#String_literals

    There's some subtlety involved, but once you understand it you'll see it's
    quite consistent.

    Have a look at http://play.golang.org/p/_rrMfmDKZh to make sure you can
    tell what's going on.

    Hope this helps.
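
A short sketch of the subtlety, not the content of the linked playground snippet: the same characters mean different things in an interpreted literal, in a raw literal, and to the regexp parser. The helper `matches` is a hypothetical name for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches compiles pattern and reports whether it matches s.
// (Hypothetical helper; it panics on an invalid pattern.)
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(len("\xa7"))   // 1: interpreted literal, one raw byte
	fmt.Println(len("\u00a7")) // 2: interpreted literal, UTF-8 encoding of U+00A7
	fmt.Println(len(`\xa7`))   // 4: raw literal, the characters \ x a 7

	// In a raw literal the backslash reaches the regexp parser, whose own \xa7
	// escape denotes the code point U+00A7 rather than a byte.
	fmt.Println(matches(`(?i)\xa7[0-9a-fk-or]`, "\u00a7e")) // true
	fmt.Println(matches(`(?i)\xa7[0-9a-fk-or]`, "\xa7e"))   // false
}
```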

Discussion Overview
group: golang-nuts
category: go
posted: Dec 9, '12 at 12:51p
active: Dec 9, '12 at 5:23p
posts: 6
users: 5
website: golang.org
