I'm having some trouble decoding data that contains invalid bytes. From encoding.go:
type Encoding interface {
	// NewDecoder returns a transformer that converts to UTF-8.
	//
	// Transforming source bytes that are not of that encoding will not
	// result in an error per se. Each byte that cannot be transcoded will
	// be represented in the output by the UTF-8 encoding of '\uFFFD', the
	// replacement rune.
	NewDecoder() transform.Transformer
	...
So I expected the decoder not to stop at the first error, but simply to replace
each invalid byte with U+FFFD.
Is there a way to force this behavior?
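To make the contract in the quoted doc comment concrete, here is a toy, stdlib-only sketch of the behavior being expected. It does not use golang.org/x/text at all: `decodeASCIIReplacing` is a hypothetical decoder for 7-bit ASCII (not Shift JIS) that never fails and writes the UTF-8 encoding of U+FFFD (the bytes EF BF BD) for every byte it cannot transcode.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeASCIIReplacing is a hypothetical toy decoder for 7-bit ASCII that
// follows the contract quoted from encoding.go: it never returns an error and
// emits the UTF-8 encoding of U+FFFD for every byte it cannot transcode.
func decodeASCIIReplacing(src []byte) []byte {
	dst := make([]byte, 0, len(src))
	for _, b := range src {
		if b < utf8.RuneSelf {
			// Bytes 0x00-0x7F are valid ASCII and map to themselves in UTF-8.
			dst = append(dst, b)
			continue
		}
		// Any byte >= 0x80 is not ASCII: substitute the replacement rune.
		dst = utf8.AppendRune(dst, utf8.RuneError)
	}
	return dst
}

func main() {
	out := decodeASCIIReplacing([]byte("a\x80b"))
	fmt.Printf("% X\n", out) // prints: 61 EF BF BD 62
}
```

A real Shift JIS decoder has to track multi-byte sequences, so this only illustrates the output shape (one U+FFFD per undecodable byte, no error), not how the japanese package is implemented.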
My code is very simple:
t := japanese.ShiftJIS
reader := transform.NewReader(file, t.NewDecoder())
data, err := ioutil.ReadAll(reader)
if err != nil {
	fmt.Println(err.Error())
}
This prints:
*japanese: invalid Shift JIS encoding*
I tried a workaround, but it's ugly and, as far as I can tell, possibly
buggy:
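func ForceDecode(src []byte, t transform.Transformer) (result []byte, skipped int) {
	src0, src1 := 0, len(src)
	// Use < rather than != so the loop cannot run past the end of src.
	for src0 < src1 {
		// transform.Bytes resets t and decodes as much of the remaining input
		// as it can; n is the number of source bytes it consumed.
		buf, n, err := transform.Bytes(t, src[src0:src1])
		src0 += n
		result = append(result, buf...)
		if err != nil {
			// Drop the byte that could not be decoded and resume after it.
			src0++
			skipped++
		}
	}
	return
}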
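A variant of the ForceDecode workaround above that emits U+FFFD for each skipped byte (matching the documented behavior instead of silently dropping bytes) could look like the sketch below. To keep it runnable with only the standard library, it is written against a hypothetical function type `bytesFunc` that has the same shape as `transform.Bytes` (output, bytes consumed, error), and it is exercised with a hypothetical strict ASCII decoder `strictASCII` rather than the real Shift JIS decoder. It assumes, as `transform.Bytes` does, that a nil error means all input was consumed.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// bytesFunc is a hypothetical stand-in with the same shape as
// transform.Bytes(t, src): decoded output, source bytes consumed, and an
// error if decoding stopped early.
type bytesFunc func(src []byte) (out []byte, n int, err error)

var errInvalid = errors.New("toy: invalid byte")

// strictASCII is a hypothetical strict decoder: it stops at the first byte
// >= 0x80 and reports how many bytes it consumed before that point.
func strictASCII(src []byte) ([]byte, int, error) {
	for i, b := range src {
		if b >= utf8.RuneSelf {
			return append([]byte(nil), src[:i]...), i, errInvalid
		}
	}
	return append([]byte(nil), src...), len(src), nil
}

// forceDecodeReplacing decodes src with decode; whenever decoding fails it
// writes U+FFFD for the offending byte and resumes one byte later.
func forceDecodeReplacing(src []byte, decode bytesFunc) (result []byte, replaced int) {
	for len(src) > 0 {
		out, n, err := decode(src)
		result = append(result, out...)
		src = src[n:]
		if err == nil || len(src) == 0 {
			// Either everything was decoded, or there is nothing left to skip.
			break
		}
		// Replace exactly one undecodable byte, then retry on the rest.
		result = utf8.AppendRune(result, utf8.RuneError)
		src = src[1:]
		replaced++
	}
	return result, replaced
}

func main() {
	out, replaced := forceDecodeReplacing([]byte("a\x80\x81b"), strictASCII)
	fmt.Printf("% X (replaced %d)\n", out, replaced) // prints: 61 EF BF BD EF BF BD 62 (replaced 2)
}
```

With golang.org/x/text, the decode argument would presumably be a closure such as `func(b []byte) ([]byte, int, error) { return transform.Bytes(t, b) }`. Skipping a single byte per error is a simple resynchronization choice; for a multi-byte encoding like Shift JIS it may split what the decoder considers one invalid sequence into several replacement runes.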
Any advice would be appreciated.
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.