Grokbase

Dan Kogai (dan...@dan.co.jp)

Profile | Posts (598)

User Information

Display Name: Dan Kogai
Partial Email Address: dan...@dan.co.jp
Posts: 598 total (598 in Perl 5 Porters)

5 Most Recent

1) Dan Kogai [Encode] 2.39 released!
Perl 5 Porters
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Folks,

I just released Encode 2.39.

The biggest difference is that decode('utf8', $malformed, sub{ ... }) now works like the rest of the UCM-based encodings. Since encode('utf8', $utf8) always works, I overlooked the need for a fallback. But decode('utf8', $bytes) may fail if $bytes is malformed, so a fallback callback is useful in such cases.

cf: http://rt.cpan.org/Ticket/Display.html?id=51204
http://kawa.at.webry.info/200911/article_12.html (Japanese)
    
kawanet++
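As a rough illustration of the new behaviour, here is a minimal sketch (not part of the release) that decodes bytes containing one malformed sequence with a coderef CHECK. It assumes Encode 2.39 or later; the sample bytes and the `\xNN` substitution format are arbitrary choices for the example. Later Encode documentation says a decode fallback may receive a list of ordinals, so the callback formats every argument it gets.

```perl
use strict;
use warnings;
use Encode 2.39 qw(decode);    # coderef fallback for 'utf8' decode needs 2.39+

# "caf\xC3\xA9" is valid UTF-8 for "café"; a lone 0x80 continuation
# byte is malformed, so decoding it needs a fallback.
my $bytes = "caf\xC3\xA9 \x80 ok";

# With a coderef as CHECK, decode() calls it with the ordinal value(s)
# of the malformed byte(s) and splices its return value (a character
# string) into the result instead of dying or inserting U+FFFD.
my $text = decode('utf8', $bytes, sub {
    return join '', map { sprintf '\\x%02X', $_ } @_;
});

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

Without the third argument, the default behaviour substitutes U+FFFD for the bad byte, so the callback is only needed when you want your own marker or logging.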

=head1 Availability

svn co http://svn.coderepos.org/share/lang/perl/Encode/trunk
git clone git://github.com/dankogai/p5-encode.git
http://www.dan.co.jp/~dankogai/cpan/Encode-2.39.tar.gz
and CPAN near you.

=head1 CPAN Index

              User: DANKOGAI (Dan Kogai)
Distribution file: Encode-2.39.tar.gz
   Number of files: 203
        *.pm files: 26
            README: Encode-2.39/README
          META.yml: Encode-2.39/META.yml
       YAML-Parser: YAML::XS 0.32
META-driven index: no
Timestamp of file: Thu Nov 26 09:31:02 2009 UTC
  Time of this run: Thu Nov 26 09:32:39 2009 UTC

=head1 Changes

$Revision: 2.39 $ $Date: 2009/11/26 09:23:59 $
! Encode.xs t/fallback.t
  $utf8 = decode('utf8', $malformed, sub{ ... }) # now works!
http://rt.cpan.org/Ticket/Display.html?id=51204
! t/CJKT.t t/guess.t t/perlio.t
  $ENV{'PERL_CORE'} tricks removed since they are no longer necessary.
  Message-Id: <20091116161513.GA25556@bestpractical.com>

=head1 AUTHOR

Dan the Encode Maintainer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAksOUxMACgkQErJia/WXtBuxTACfber33XNQpFhws6TEQiie4rMW
NvYAnjDNhKoz2sAJRv6HOwaKKjnwa1j5
=OO5I
-----END PGP SIGNATURE-----
2) Dan Kogai Re: [Encode] 2.38 released!
Perl 5 Porters
Jesse,

Thanks, applied in my repo. Since it does not affect CPAN, the VERSION++ is deferred until other fixes land.

Dan the Encode Maintainer


On 17 Nov 2009, at 01:15, Jesse Vincent wrote:

>
> Dan,
>
> Thanks for the new Encode! I've just committed it to blead.
>
> The attached patch is the diff necessary to get Encode in blead to pass
> its tests.
>
> Best,
> Jesse
> <encode_2.38_core_diff.patch>
3) Dan Kogai [Encode] 2.38 released!
Perl 5 Porters
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Folks,

I just released Encode 2.38.

=head1 Availability

svn co http://svn.coderepos.org/share/lang/perl/Encode/trunk
git clone git://github.com/dankogai/p5-encode.git
http://www.dan.co.jp/~dankogai/cpan/Encode-2.38.tar.gz
and CPAN near you.

=head1 CPAN Index

              User: DANKOGAI (Dan Kogai)
Distribution file: Encode-2.38.tar.gz
   Number of files: 203
        *.pm files: 26
            README: Encode-2.38/README
          META.yml: Encode-2.38/META.yml
       YAML-Parser: YAML::XS 0.32
META-driven index: no
Timestamp of file: Mon Nov 16 14:34:43 2009 UTC
  Time of this run: Mon Nov 16 14:36:21 2009 UTC

=head1 Changes

$Revision: 2.38 $ $Date: 2009/11/16 14:08:13 $
! Encode.xs
  Addressed: Encode memory corruption [perl #70528]
  Message-Id: <alpine.LFD.2.00.0911152328070.9483@ein.m-l.org>
! t/Unicode.t Unicode/Unicode.xs
  Patched: #51263: set magic is not applied when modifying encode arguments
http://rt.cpan.org/Ticket/Display.html?id=51263
! Encode.xs
  Patched: #51204: Callback CHECK not supported for UTF-8 decoder/encoder
http://rt.cpan.org/Ticket/Display.html?id=51204
! Byte/Byte.pm CN/CN.pm Changes JP/JP.pm KR/KR.pm TW/TW.pm
  Unicode/Unicode.pm bin/enc2xs lib/Encode/Supported.pod
  Fix URLs
http://rt.cpan.org/Ticket/Display.html?id=49776
! t/CJKT.t t/guess.t t/perlio.t t/piconv.t
  $PERL_CORE trick is now off for perl 5.11 or better.
  Message-Id: <b77c1dce0909070245s59b294bq8a8a8166e7342793@mail.gmail.com>
  Message-Id: <E7FADA6C-D5A7-4ECA-BE4C-85911A97677E@dan.co.jp>
  Message-Id: <20090907154908.GS60303@plum.flirble.org>
  Message-Id: <20090907161509.GN8057@iabyn.com>


=head1 AUTHOR

Dan the Encode Maintainer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAksBY8QACgkQErJia/WXtBuC3wCfViFlTaXapN6stekP3Qzb+MaV
9gIAn1A1BbY2o/U9E5JguB7r1Ntr3Fp9
=NwC6
-----END PGP SIGNATURE-----
4) Dan Kogai Re: Encode memory corruption [perl #70528]
Perl 5 Porters
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

George,

I think I have fixed it now.

On 16 Nov 2009, at 14:46, George Greer wrote:
> It might not crash under perl 5.8.9, but making it crash is finicky anyway since the script doesn't exercise memory much afterward. Valgrind says 5.8.9 still causes the errant write:
>
> ==30569== Command: /home/perl/work/cpan_maint-5.8/perl/bin/perl5.8.9 -MEncode -le print\ encode\ "ascii",\ "\ a\\x{b6}\\x{2022}a"x8,\ sub{"\\x{2022}"}
> ==30569==
> ==30569== Invalid write of size 1
> ==30569==    at 0x629D113: do_encode (encengine.c:119)
> ==30569==    by 0x62970B3: encode_method (Encode.xs:128)
> ==30569==    by 0x629920D: XS_Encode__XS_encode (Encode.xs:621)
> ==30569==    by 0x479D0F: Perl_pp_entersub (pp_hot.c:2862)
> ==30569==    by 0x444896: Perl_runops_debug (dump.c:1639)
> ==30569==    by 0x465582: S_run_body (perl.c:2453)
> ==30569==    by 0x464E77: perl_run (perl.c:2368)
> ==30569==    by 0x421CA8: main (perlmain.c:109)
> ==30569== Address 0x61bb4c0 is 0 bytes after a block of size 48 alloc'd
> ==30569==    at 0x4C2524D: realloc (vg_replace_malloc.c:476)
> ==30569==    by 0x4451F1: Perl_safesysrealloc (util.c:177)
> ==30569==    by 0x47D440: Perl_sv_grow (sv.c:1440)
> ==30569==    by 0x48445A: Perl_sv_catpvn_flags (sv.c:3915)
> ==30569==    by 0x484752: Perl_sv_catsv_flags (sv.c:3975)
> ==30569==    by 0x6296D7A: encode_method (Encode.xs:204)
> ==30569==    by 0x629920D: XS_Encode__XS_encode (Encode.xs:621)
> ==30569==    by 0x479D0F: Perl_pp_entersub (pp_hot.c:2862)
> ==30569==    by 0x444896: Perl_runops_debug (dump.c:1639)
> ==30569==    by 0x465582: S_run_body (perl.c:2453)
> ==30569==    by 0x464E77: perl_run (perl.c:2368)
> ==30569==    by 0x421CA8: main (perlmain.c:109)
>
> perl 5.10.0 crashed a lot more than blead 5.11.1 during my test case reduction, but the valgrind still showed the write being there even when blead didn't crash.

Would you try the patch below?  That fixed the problem on my OS X.

====
% perl -MEncode -le 'print encode "ascii", " a\x{b6}\x{2022}a"x8, sub{ "\x{2022}" }'
Segmentation fault
% perl -Mblib -MEncode -le 'print encode "ascii", " a\x{b6}\x{2022}a"x8, sub{ "\x{2022}" }'
a••a a••a a••a a••a a••a a••a a••a a••a
====

The patch applies SvUTF8_off to the fallback substitution string when encoding. I also did a little optimization, but that is not what fixes the problem.

I will VERSION++ after your report.  Thank you in advance for testing.

Dan the Maintainer Thereof

===================================================================
RCS file: Encode.xs,v
retrieving revision 2.16
diff -u -r2.16 Encode.xs
--- Encode.xs 2009/09/06 14:32:21 2.16
+++ Encode.xs 2009/11/16 08:17:11
@@ -68,7 +68,7 @@
{
     dSP;
     int argc;
-    SV *temp, *retval;
+    SV *retval = newSVpv("",0);
     ENTER;
     SAVETMPS;
     PUSHMARK(sp);
@@ -79,13 +79,10 @@
     if (argc != 1){
  croak("fallback sub must return scalar!");
     }
-    temp = newSVsv(POPs);
+    sv_catsv(retval, POPs);
     PUTBACK;
     FREETMPS;
     LEAVE;
-    retval = newSVpv("",0);
-    sv_catsv(retval, temp);
-    SvREFCNT_dec(temp);
     return retval;
}

@@ -199,6 +196,7 @@
   : newSVpvf(check & ENCODE_PERLQQ ? "\\x{%04"UVxf"}" :
                  check & ENCODE_HTMLCREF ? "&#%" UVuf ";" :
                  "&#x%" UVxf ";", (UV)ch);
+     SvUTF8_off(subchar); /* make sure no decoded string gets in */
             sdone += slen + clen;
             ddone += dlen + SvCUR(subchar);
             sv_catsv(dst, subchar);

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAksBDBsACgkQErJia/WXtBuXBgCdEvbSBofXhu+DlP6qm6mo6ZJW
HUwAnjIAj+daYPByCbCd0ST28PDoSpkA
=84SB
-----END PGP SIGNATURE-----
5) Dan Kogai Re: Encode memory corruption [perl #70528]
Perl 5 Porters
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

George,

Thank you for your report.

On 16 Nov 2009, at 13:38, George Greer wrote:
> - - - 8< - - - 8< - - -
> use Encode qw[encode];
> encode("ISO-8859-1", "\x{b6} \x{b6} \x{b6} \x{b6} \x{b6} \x{b6} \x{2022}wwwww \x{2022}rrrrr uuu   qqqqqqqqq \x{2022}yyyyyyy xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \x{b6} \x{b6} \x{b6} \x{b6} \x{b6} \x{b6}", sub { "\x{2022}" });

It tries to return a decoded string ("\x{2022}") from the fallback during encoding, so the usage is wrong to begin with.
That being said, I successfully reproduced your case with the one-liner below.

perl -MEncode -le 'print encode "ascii", " a\x{b6}\x{2022}a"x8, sub{ "\x{2022}" }'

I also found that this does not happen in Perl 5.8.9, so it has something to do with how Perl 5.10 allocates memory.

At any rate, ext/Encode/Encode.xs must be the file to look at.
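For comparison, a minimal sketch of a fallback that fits what encode() expects: the callback receives the ordinal of each character the target encoding cannot represent and returns octets (here ASCII-only hexadecimal character references, a format chosen just for this example) rather than a decoded string such as "\x{2022}". The input string reuses the one-liner's pattern; everything else is illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $str = " a\x{b6}\x{2022}a";

# For encode(), the coderef gets the ordinal of each unmappable
# character and must return octets; an ASCII-only character
# reference is always representable in the target encoding.
my $ascii = encode('ascii', $str, sub { sprintf '&#x%X;', shift });

print "$ascii\n";    # prints " a&#xB6;&#x2022;a"
```

Because the callback never hands a UTF-8-flagged string back to the encoder, the result stays a plain byte string.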

Dan the Maintainer Thereof
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAksA5BYACgkQErJia/WXtBvMiwCdEJ6PbaD8XgC0vXCtL903wu3q
qMUAn1DuSBbgwol6qE5hHyYOxYd6jEGo
=4xqQ
-----END PGP SIGNATURE-----
