1) Tux

In XS, I have the following snippet:

--8<---
    SV *tmp = newSVpv (csv->buffer, csv->used);
    dSP;
    require_IO_Handle;
    PUSHMARK (sp);
    EXTEND (sp, 2);
    PUSHs ((dst));
    PUSHs (tmp);
    PUTBACK;
    result = call_sv (m_print, G_SCALAR | G_METHOD);
    SPAGAIN;
    if (result) {
        result = POPi;
        unless (result)
            (void)SetDiag (csv, 2200);
        }
    PUTBACK;
    SvREFCNT_dec (tmp);
-->8---

To prevent double encoding, is it possible to `see' (from XS) that the
stream is opened with something like

    open my $dst, ">:utf8", "file";
or
    open my $dst, ">:encoding(utf8)", "file";
or
    open my $dst, ">", "file";
    binmode $dst, ":encoding(utf8)";

?

--8<---
use strict;
use warnings;

use Data::Peek;
use Encode qw( encode decode );

my $euro  = "\x{20ac}";
my $deuro = encode ("utf8", $euro);

DPeek $euro;
DPeek $deuro;

{   open my $fh, ">", "test-plain.out";
    print $fh "$euro\n";
    }
{   open my $fh, ">:encoding(utf8)", "test-utf8.out";
    print $fh "$euro\n";
    }
{   open my $fh, ">", "test-encode.out";
    print $fh "$deuro\n";
    }
{   open my $fh, ">:encoding(utf8)", "test-utf8enc.out";
    print $fh "$deuro\n";
    }
-->8---

PV("\342\202\254"\0) [UTF8 "\x{20ac}"]
PV("\342\202\254"\0)

Source : "test-encode.out".
00000000  E2 82 AC 0A            ....
Source : "test-plain.out".
00000000  E2 82 AC 0A            ....
Source : "test-utf8enc.out".
00000000  C3 A2 C2 82 C2 AC 0A   .......
Source : "test-utf8.out".
00000000  E2 82 AC 0A            ....

So, 3 out of 4 are fine (as expected), but if a module wants to do the
right thing, it probably needs to know if the right thing is already
being done to prevent the bad double-encoding.

-- 
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, OpenSuSE 10.3, 11.0, and 11.1, AIX 5.2 and 5.3.
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
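
The C3 A2 C2 82 C2 AC bytes in test-utf8enc.out can be reproduced without any file I/O. A minimal sketch, assuming only the Encode module that the test script already loads; the helper name hex_bytes is hypothetical and exists purely for display:

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical display helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join " ", map { sprintf ("%02X", ord $_) } split //, $bytes;
}

my $euro  = "\x{20ac}";                # one character, U+20AC
my $once  = encode ("UTF-8", $euro);   # three bytes
# Encoding again treats each of those bytes as a Latin-1 character,
# which is what an :encoding(utf8) layer does to already-encoded data.
my $twice = encode ("UTF-8", $once);

print hex_bytes ($once),  "\n";        # E2 82 AC
print hex_bytes ($twice), "\n";        # C3 A2 C2 82 C2 AC
```

Each byte above 0x7F in the first encoding becomes two bytes in the second, which is why three bytes turn into six.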

2) Slaven Rezic

"H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
> To prevent double encoding, is it possible to `see' (from XS) that the
> stream is opened with something like
>
>     open my $dst, ">:utf8", "file";
> or
>     open my $dst, ">:encoding(utf8)", "file";
> or
>     open my $dst, ">", "file";
>     binmode $dst, ":encoding(utf8)";
>
> ?

Check the output of PerlIO::get_layers($dst).
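
A minimal Perl-level sketch of that suggestion, assuming in-memory filehandles as stand-ins for real files. The helper name layers_of is hypothetical; PerlIO::get_layers and its output option are part of core perl. The exact layer names printed can differ per platform and build, so no output is shown:

```perl
use strict;
use warnings;

# Hypothetical helper: return the output-side layer names of a handle.
sub layers_of {
    my ($fh) = @_;
    return PerlIO::get_layers ($fh, output => 1);
}

my ($buf_raw, $buf_utf8, $buf_enc) = ("", "", "");
open my $raw,  ">",                \$buf_raw  or die "open: $!";
open my $utf8, ">:utf8",           \$buf_utf8 or die "open: $!";
open my $enc,  ">:encoding(utf8)", \$buf_enc  or die "open: $!";

# Print the layer stack for each way of opening the handle.
for ([ raw => $raw ], [ utf8 => $utf8 ], [ encoding => $enc ]) {
    my ($label, $fh) = @$_;
    printf "%-8s %s\n", $label, join " ", layers_of ($fh);
}
```

In this default (non-details) form the utf8 flag shows up as a pseudo-layer name, which makes a simple grep over the list sufficient from Perl space.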

3) Ben Morrow

Quoth h.m.brand@xs4all.nl ("H.Merijn Brand"):
> In XS, I have the following snippet:
> --8<---
>     SV *tmp = newSVpv (csv->buffer, csv->used);
>     dSP;
>     require_IO_Handle;
>     PUSHMARK (sp);
>     EXTEND (sp, 2);
>     PUSHs ((dst));
>     PUSHs (tmp);
>     PUTBACK;
>     result = call_sv (m_print, G_SCALAR | G_METHOD);
>     SPAGAIN;
>     if (result) {
>         result = POPi;
>         unless (result)
>             (void)SetDiag (csv, 2200);
>         }
>     PUTBACK;
>     SvREFCNT_dec (tmp);
> -->8---
>
> To prevent double encoding, is it possible to `see' (from XS) that the
> stream is opened with something like
>
>     open my $dst, ">:utf8", "file";
> or
>     open my $dst, ">:encoding(utf8)", "file";
> or
>     open my $dst, ">", "file";
>     binmode $dst, ":encoding(utf8)";

From Perl you call PerlIO::get_layers. This is implemented in
universal.c as a wrapper around PerlIO_get_layers, but that doesn't
appear to be in the API. Perhaps it should be?

You could also look at its implementation in perlio.c, which is very
simple and AFAICT only uses documented bits of PerlIO.

Ben

4) Tux

On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
wrote:

> "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
>
> > To prevent double encoding, is it possible to `see' (from XS) that the
> > stream is opened with something like
> >
> >     open my $dst, ">:utf8", "file";
> > or
> >     open my $dst, ">:encoding(utf8)", "file";
> > or
> >     open my $dst, ">", "file";
> >     binmode $dst, ":encoding(utf8)";
> >
> > ?
>
> Check the output of PerlIO::get_layers($dst).

Right, so is this a bug:

--8<---
use strict;
use warnings;

use Data::Peek;

my $out = "";
open my $fh, ">:utf8", \$out;

DDumper [ PerlIO::get_layers ($fh, details => 1) ];
-->8---

=>

$VAR1 = [
    'scalar',
    undef,
    6328832
    ];

Where is my "utf8" layer?

5) Tux

On Tue, 9 Feb 2010 10:21:42 +0100, "H.Merijn Brand"
<h.m.brand@xs4all.nl> wrote:

> On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
> wrote:
>
> > "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
> >
> > > To prevent double encoding, is it possible to `see' (from XS) that the
> > > stream is opened with something like
> > >
> > >     open my $dst, ">:utf8", "file";
> > > or
> > >     open my $dst, ">:encoding(utf8)", "file";
> > > or
> > >     open my $dst, ">", "file";
> > >     binmode $dst, ":encoding(utf8)";
> > >
> > > ?
> >
> > Check the output of PerlIO::get_layers($dst).
>
> Right, so is this a bug:
> --8<---
> use strict;
> use warnings;
>
> use Data::Peek;
>
> my $out = "";
> open my $fh, ">:utf8", \$out;
>
> DDumper [ PerlIO::get_layers ($fh, details => 1) ];
> -->8---

FWIW: This looks more sane:

    open my $fh, ">:encoding(utf8)", \$out;

=>

$VAR1 = [
    'scalar',
    undef,
    6296064,
    'encoding',
    'utf8',
    4231680
    ];

> =>
>
> $VAR1 = [
>     'scalar',
>     undef,
>     6328832
>     ];
>
> Where is my "utf8" layer?

6) Eirik Berg Hanssen

On Tue, Feb 9, 2010 at 10:21 AM, H.Merijn Brand <h.m.brand@xs4all.nl> wrote:
> On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
> wrote:
>
>> "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
>>
>> Check the output of PerlIO::get_layers($dst).
>
> Right, so is this a bug:
> --8<---
> use strict;
> use warnings;
>
> use Data::Peek;
>
> my $out = "";
> open my $fh, ">:utf8", \$out;
>
> DDumper [ PerlIO::get_layers ($fh, details => 1) ];
> -->8---
>
> =>
>
> $VAR1 = [
>    'scalar',
>    undef,
>    6328832
>    ];
>
> Where is my "utf8" layer?

  Try PerlIO::get_layers ($fh) - or, for clarity, PerlIO::get_layers
($fh, details => 0 ).

  Alternatively, you could note that 6328832 seems to be the (a?) flag
signifying utf8.  Since you seem to have read parts of it, let me
quote the pertinent section of the POD:

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parenthesis after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers: to get all of these returned
separately use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to be three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

B<You may open your eyes now.>


Eirik
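
The name/arguments/flags triples described in that POD section can be walked from Perl like this. A rough sketch, not taken from any module: the helper name fh_has_encoding is hypothetical, and the value 0x00008000 for PERLIO_F_UTF8 is an assumption based on perl's perliol.h, since the constant is not exported to Perl space:

```perl
use strict;
use warnings;

# Assumption: PERLIO_F_UTF8 is 0x00008000, its value in perl's perliol.h.
# The constant is not visible from Perl space, so it is hard-coded here.
use constant PERLIO_F_UTF8 => 0x00008000;

# Hypothetical helper: true if any output layer of $fh is :encoding(...)
# or carries the utf8 flag.
sub fh_has_encoding {
    my ($fh) = @_;
    my @details = PerlIO::get_layers ($fh, output => 1, details => 1);
    # Consume the list three elements at a time: name, arguments, flags.
    while (my ($name, $args, $flags) = splice @details, 0, 3) {
        return 1 if defined $name  && $name eq "encoding";
        return 1 if defined $flags && ($flags & PERLIO_F_UTF8);
    }
    return 0;
}

my ($buf_utf8, $buf_enc) = ("", "");
open my $utf8, ">:utf8",           \$buf_utf8 or die "open: $!";
open my $enc,  ">:encoding(utf8)", \$buf_enc  or die "open: $!";

printf "utf8 flag: %d, encoding layer: %d\n",
    fh_has_encoding ($utf8), fh_has_encoding ($enc);

# The two 'scalar' layer flag values quoted in this thread, in hex,
# with the state of the utf8 bit.
printf "%#x %s\n", $_, (($_ & PERLIO_F_UTF8) ? "utf8" : "no utf8")
    for 6328832, 6296064;
```

Under that assumption, 6328832 has the utf8 bit set and 6296064 does not, which matches the ">:utf8" and ">:encoding(utf8)" results posted earlier: in the second case the encoding sits in its own layer rather than as a flag on the scalar layer.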

7) Tux

On Tue, 9 Feb 2010 10:57:57 +0100, Eirik Berg Hanssen
<ebhanssen@cpan.org> wrote:

> On Tue, Feb 9, 2010 at 10:21 AM, H.Merijn Brand <h.m.brand@xs4all.nl> wrote:
> > On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
> > wrote:
> >
> >> "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
> >>
> >> Check the output of PerlIO::get_layers($dst).
> >
> > Right, so is this a bug:
> > --8<---
> > use strict;
> > use warnings;
> >
> > use Data::Peek;
> >
> > my $out = "";
> > open my $fh, ">:utf8", \$out;
> >
> > DDumper [ PerlIO::get_layers ($fh, details => 1) ];
> > -->8---
> >
> > =>
> >
> > $VAR1 = [
> >    'scalar',
> >    undef,
> >    6328832
> >    ];
> >
> > Where is my "utf8" layer?
>
>   Try PerlIO::get_layers ($fh) - or, for clarity, PerlIO::get_layers
> ($fh, details => 0 ).
>
>   Alternatively, you could note that 6328832 seems to be the (a?) flag
> signifying utf8.  Since you seem to have read parts of it, let me
> quote the pertinent section of the POD:
>
> B<Implementation details follow, please close your eyes.>
>
> The arguments to layers are by default returned in parenthesis after
> the name of the layer, and certain layers (like C<utf8>) are not real
> layers but instead flags on real layers: to get all of these returned
> separately use the optional C<details> argument:
>
>    my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);
>
> The result will be up to be three times the number of layers:
> the first element will be a name, the second element the arguments
> (unspecified arguments will be C<undef>), the third element the flags,
> the fourth element a name again, and so forth.
>
> B<You may open your eyes now.>

Thanks for all the feedback,

This code now prevents double encoding in Text::CSV_XS:

#ifdef USE_PERLIO
    if (csv->io_has_encoding == 0) {
        GV *gv = NULL;
        IO *io;

        csv->io_has_encoding = -1; /* Check only once! */

        /* code stolen from universal.c */
        if (isGV (dst))
            gv = (GV *)dst;
        else
        if (SvROK (dst) && isGV (SvRV (dst)))
            gv = (GV *)SvRV (dst);
        else
        if (SvPOKp (dst))
            gv = gv_fetchsv (dst, 0, SVt_PVIO);

        if (gv && (io = GvIO (gv))) {
            AV *av = PerlIO_get_layers (aTHX_ IoOFP (io));
            I32 i;
            I32 last = av_len (av);

            for (i = last; i >= 0; i -= 3) {
                SV **namep = av_fetch (av, i - 2, FALSE);
                SV **flgsp = av_fetch (av, i,     FALSE);

                if (( SvIOK (*flgsp) && SvIVX (*flgsp) & PERLIO_F_UTF8 ) ||
                    ( SvPOK (*namep) && (
                        memEQ (SvPV_nolen (*namep), "utf8",     4) ||
                        memEQ (SvPV_nolen (*namep), "encoding", 8))
                      ))
                    csv->io_has_encoding = 1;
                }
            }
        }
    if (csv->utf8 && csv->io_has_encoding == 1)
        SvUTF8_on (tmp);
#endif