Hello everyone!
I'm having trouble decoding data that contains invalid bytes. From encoding.go:
type Encoding interface {
         // NewDecoder returns a transformer that converts to UTF-8.
         //
         // Transforming source bytes that are not of that encoding will not
         // result in an error per se. Each byte that cannot be transcoded will
         // be represented in the output by the UTF-8 encoding of '\uFFFD', the
         // replacement rune.
         NewDecoder() transform.Transformer
...

So I expect that the decoder will not stop on the first error and will
instead replace each invalid byte with U+FFFD.
Can I do something to force this behavior?
My code is very simple:
// file is an io.Reader holding the Shift JIS input.
t := japanese.ShiftJIS
reader := transform.NewReader(file, t.NewDecoder())
data, err := ioutil.ReadAll(reader)
if err != nil {
         fmt.Println(err)
}

Resulting in:
*japanese: invalid Shift JIS encoding*

I tried a workaround, but I think it's ugly and potentially buggy:
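// ForceDecode decodes src with t, skipping one byte after each decoding
// error. It returns the decoded bytes and the number of bytes skipped.
func ForceDecode(src []byte, t transform.Transformer) (result []byte, skipped int) {
         src0, src1 := 0, len(src)
         // Use < rather than != so the loop still ends if an error is
         // reported after the last byte has been consumed.
         for src0 < src1 {
                  buf, n, err := transform.Bytes(t, src[src0:src1])
                  src0 += n
                  result = append(result, buf...)
                  if err != nil {
                           src0++
                           skipped++
                  }
         }
         return
}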

Any advice would be appreciated.
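
The workaround above drops invalid bytes silently, whereas the NewDecoder comment describes emitting U+FFFD for each one. A minimal sketch of that variant follows, using only the standard library. Note the assumptions: decodeFunc is a hypothetical stand-in for a call such as transform.Bytes(decoder, src), and asciiOnly is a toy decoder invented purely for the demonstration. Neither is part of golang.org/x/text.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeFunc is a hypothetical stand-in for a call such as
// transform.Bytes(decoder, src). It returns the output produced so far,
// the number of source bytes consumed, and a non-nil error if decoding
// stopped before the end of src.
type decodeFunc func(src []byte) (dst []byte, n int, err error)

// forceDecode calls decode repeatedly. After each error it emits U+FFFD
// and skips exactly one source byte before resuming.
func forceDecode(src []byte, decode decodeFunc) (result []byte, skipped int) {
	for len(src) > 0 {
		buf, n, err := decode(src)
		result = append(result, buf...)
		if err == nil || n >= len(src) {
			break // finished, or no byte left to skip
		}
		result = append(result, "\uFFFD"...)
		src = src[n+1:]
		skipped++
	}
	return result, skipped
}

// asciiOnly is a toy decoder used only for this demonstration: it passes
// through bytes below 0x80 and stops with an error at the first other byte.
func asciiOnly(src []byte) ([]byte, int, error) {
	for i, b := range src {
		if b >= 0x80 {
			return src[:i], i, errors.New("asciiOnly: invalid byte")
		}
	}
	return src, len(src), nil
}

func main() {
	out, skipped := forceDecode([]byte("ab\xffcd\x80"), asciiOnly)
	fmt.Printf("%q skipped=%d\n", out, skipped)
}
```

Skipping one byte per error is only one possible policy. For a multi-byte encoding such as Shift JIS, the right number of bytes to skip is a judgment call, and restarting the decoder after every error can be slow on heavily corrupted input.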

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Nigel Tao at Dec 7, 2015 at 2:48 am
    I remember, a year or three ago, discussing whether or not Encoding
    transformers should return an error early, use a substitute character,
    or be configurable between the two. I can't remember the details,
    though. Marcel, do you?

    Maybe we thought that people could write their own ForceDecode
    function if they wanted to, although I'd make it a function that
    returned a Transformer. Perhaps such a beast should live in
    golang.org/x/text/transform.

    In any case, it seems like a bug that the NewDecoder docs don't match
    the implementation. One or the other should change.

  • Mpvl at Dec 19, 2015 at 2:40 pm
    Sorry for the late reply. Just found this email among the noise.

    I indeed recently found the same discrepancy between the decoder's doc and
    implementation. Moreover, decoders do not always return an error on invalid
    input. There seems to be some pattern/system, but I'm not sure what it is.

    I recently changed the Encoders to return errors. There is no single method
    of replacement that is generally applicable so there is no way around this.
    There are now two different decorators for handling errors. See CL
    <https://go-review.googlesource.com/#/c/17701/1/encoding/encoding.go>

    We could do a similar thing for Decoding. However, it is a bit more tricky
    to do for Decoders (how many bytes should be gobbled per error). Also,
    unlike with encoders, decoders should not return an error by default. This
    makes the "decorator" technique used for encodings less suitable.

    I think it would be fine for Decoders to simply never return an error,
    as the documentation suggests (one can always scan for U+FFFD), but it
    would be good to know whether people could use the errors, and why
    different errors were handled differently in the first place.

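
The suggestion in the thread that "one can always scan for U+FFFD" can be illustrated with a short standard-library sketch. The helper name replacementOffsets is hypothetical, and the sketch assumes the decoder's output is valid UTF-8. One caveat: a U+FFFD that was legitimately present in the source cannot be told apart from one the decoder substituted.

```go
package main

import (
	"fmt"
	"strings"
)

// replacement is the UTF-8 encoding of U+FFFD, the replacement rune.
const replacement = "\uFFFD"

// replacementOffsets returns the byte offset of every U+FFFD found in
// decoded text. An empty result means no replacement rune was found.
func replacementOffsets(decoded string) []int {
	var offsets []int
	for start := 0; ; {
		i := strings.Index(decoded[start:], replacement)
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, start+i)
		start += i + len(replacement)
	}
}

func main() {
	decoded := "ab\uFFFDcd\uFFFD"
	offsets := replacementOffsets(decoded)
	if len(offsets) > 0 {
		fmt.Println("replacement runes at byte offsets:", offsets)
	}
}
```

A caller that only needs a yes/no answer could use strings.Contains(decoded, "\uFFFD") instead of collecting offsets.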

Discussion Overview
group: golang-nuts
categories: go
posted: Dec 2, '15 at 3:21p
active: Dec 19, '15 at 2:40p
posts: 3
users: 3
website: golang.org

3 users in discussion

Rin Tohsaka: 1 post, Nigel Tao: 1 post, Mpvl: 1 post
