Parsing the Unicode data file, I'm left with hex values for characters; for
example, "0061" is lowercase "a".

The problem is that I need to transform "0061" to "a" in Go so that I can
perform some case and accent folding on characters.

fmt.Println("\u0061") behaves as expected, so I assumed I might get away with:

h := "0061"
fmt.Println(`\u` + h)

But this prints the literal text \u0061 rather than "a".


  • Scott Lawrence at Jan 11, 2013 at 6:11 pm

    That's because "\u0061" is transformed to "a" (it's lowercase) at compile-time
    by the parser - it's not a feature of Println.
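To illustrate the compile-time point, here is a minimal sketch. The function name literalForms is made up for this example. It compares an interpreted string literal, where the compiler resolves the escape, with a raw string literal joined to the digits at run time, where nothing is resolved.

```go
package main

import "fmt"

// literalForms is a hypothetical helper used only for illustration.
// It returns the escape written as an interpreted literal and as a
// raw literal concatenated with the hex digits.
func literalForms() (interpreted, raw string) {
	interpreted = "\u0061" // escape resolved by the compiler: the single byte "a"
	raw = `\u` + "0061"    // backslash, 'u', and four digits: six bytes, no escape processing
	return interpreted, raw
}

func main() {
	interpreted, raw := literalForms()
	fmt.Println(len(interpreted), interpreted) // 1 a
	fmt.Println(len(raw), raw)                 // 6 \u0061
}
```

The second value is an ordinary six-byte string by the time Println sees it, so no function that merely prints it can turn it into "a".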

    hex.DecodeString() from encoding/hex should accomplish most of what you want:

    // DecodeString turns each pair of hex digits into one byte (0x65, 0x66).
    str, err := hex.DecodeString("6566")
    if err != nil {
        panic(err)
    }
    fmt.Printf("%s\n", string(str)) // ef
    --
    Scott Lawrence

    go version go1.0.3
    Linux baidar 3.6.11-1-ARCH #1 SMP PREEMPT Tue Dec 18 08:57:15 CET 2012 x86_64 GNU/Linux

    --
  • Minux at Jan 11, 2013 at 6:12 pm

    The conversion of "\uUUUU" happens at compile time, not at run time.

    There are several ways to achieve this:
    http://play.golang.org/p/UFWSSfBrvA
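One possible run-time approach is sketched below. It is not necessarily what the playground link contains. It assumes the input is a hexadecimal code point such as "0061"; the helper name hexToRune is hypothetical. It uses strconv.ParseUint to read the number and a rune conversion to get the character.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// hexToRune is a hypothetical helper (not from the thread): it parses a
// hexadecimal code point such as "0061" and returns it as a rune.
func hexToRune(h string) (rune, error) {
	n, err := strconv.ParseUint(h, 16, 32)
	if err != nil {
		return 0, err
	}
	// Reject values beyond the last valid Unicode code point.
	if n > unicode.MaxRune {
		return 0, fmt.Errorf("code point %X is out of range", n)
	}
	return rune(n), nil
}

func main() {
	for _, h := range []string{"0061", "00E9", "1F600"} {
		r, err := hexToRune(h)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// string(r) yields the UTF-8 encoding of the code point.
		fmt.Printf("%s -> %s\n", h, string(r))
	}
}
```

Because the value is parsed as a number, this works for code points of any length that the data file uses, not only four-digit ones.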

    --
  • Martin Gallagher at Jan 11, 2013 at 6:28 pm
    hex.DecodeString("6566") worked for lower ranges, but minux's example worked
    for all cases without any issue.

    Cheers!
  • Kevin Gillette at Jan 13, 2013 at 9:00 am
    If by "lower ranges" you mean ASCII, note that if the hex input represents
    UTF-8 encoded bytes, you can call hex.Decode with a destination []byte and
    then convert that to a string (or use the bytes and unicode packages to work
    with the data without making an extra copy in memory). If hex.Decode can
    accept the entire input without any preprocessing on your part, using hex
    instead of strconv will also likely be faster.
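A minimal sketch of that suggestion follows, under the assumption that the input really is hex-encoded UTF-8 (for example "c3a9" for U+00E9). The helper name decodeUTF8Hex is hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeUTF8Hex is a hypothetical helper: it treats src as hex-encoded
// UTF-8 bytes and decodes it into a freshly allocated destination slice.
func decodeUTF8Hex(src []byte) (string, error) {
	dst := make([]byte, hex.DecodedLen(len(src)))
	n, err := hex.Decode(dst, src)
	if err != nil {
		return "", err
	}
	return string(dst[:n]), nil
}

func main() {
	// "c3a9" is the UTF-8 encoding of U+00E9.
	s, err := decodeUTF8Hex([]byte("c3a9"))
	if err != nil {
		panic(err)
	}
	fmt.Println(s)

	// A code point written as "0061" is not UTF-8: it decodes to two bytes.
	b, err := decodeUTF8Hex([]byte("0061"))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(b)) // 2
}
```

Note that a code point written as "0061" is not hex-encoded UTF-8: it decodes to the two bytes 0x00 and 0x61, so this route fits only when the file stores encoded bytes rather than code point numbers.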
  • Roger peppe at Jan 13, 2013 at 5:25 pm
    just for completeness (you wouldn't want to actually do it
    this way, but it's closest to the original poster's solution):

    func hexToString(h string) (string, error) {
        return strconv.Unquote(`"\u` + h + `"`)
    }
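One caveat with the Unquote approach: the \u escape takes exactly four hex digits, so longer code points need the eight-digit \U form. A hedged variant is sketched below; the name hexToStringWide and the zero-padding step are assumptions for illustration, not part of the original suggestion.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hexToStringWide is a hypothetical variant of hexToString: it left-pads
// the hex digits to eight characters and uses the \U escape, so code
// points above U+FFFF are handled as well.
func hexToStringWide(h string) (string, error) {
	if len(h) == 0 || len(h) > 8 {
		return "", fmt.Errorf("expected 1 to 8 hex digits, got %q", h)
	}
	padded := strings.Repeat("0", 8-len(h)) + h
	// Unquote applies the same escape rules as a Go string literal.
	return strconv.Unquote(`"\U` + padded + `"`)
}

func main() {
	for _, h := range []string{"0061", "1F600"} {
		s, err := hexToStringWide(h)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", h, s)
	}
}
```

As with the original, this is mostly a curiosity; parsing the digits as a number is the more direct route.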
  • Martin Gallagher at Jan 13, 2013 at 8:58 pm
    Performance isn't an issue, but I'm a bit of a premature optimiser
    anyway... so knowing the fastest solution would let me sleep easy!

    Regarding what I'm trying to achieve - it's automatic creation of full
    Unicode support for the Sphinx full text search engine, basically mapping
    and normalising characters and creating charset tables for supplied Unicode
    block ranges.

    I've just pushed the code if anyone's interested (sorry, it's in a rough
    state at the moment):
    https://github.com/Mutatio/sphinx-character-map/blob/master/characterMap.go

    CJK is still pending (but support is easy enough to add); it's the
    normalisation of accented Latin / Greek / other scripts that I want to
    master first. All other command-line features are also missing; it's purely
    at the prototyping stage.

    Cheers,
    - Martin
  • Hraban Luyat at Jan 13, 2013 at 3:36 pm
    You might find use for:

    code.google.com/p/go/src/pkg/exp/norm

    Maybe (hopefully) it helps.

    Greetings,

    Hraban


Discussion Overview
group: golang-nuts
categories: go
posted: Jan 11, '13 at 6:04p
active: Jan 13, '13 at 8:58p
posts: 8
users: 6
website: golang.org
