Can I `see' what encoding my stream is?

1) Tux
In XS, I have the following snippet:
--8<---
SV *tmp = newSVpv (csv->buffer, csv->used);
dSP;
require_IO_Handle;
PUSHMARK (sp);
EXTEND (sp, 2);
PUSHs ((dst));
PUSHs (tmp);
PUTBACK;
result = call_sv (m_print, G_SCALAR | G_METHOD);
SPAGAIN;
if (result) {
    result = POPi;
    unless (result)
        (void)SetDiag (csv, 2200);
    }
PUTBACK;
SvREFCNT_dec (tmp);
-->8---

To prevent double encoding, is it possible to `see' (from XS) that the
stream is opened with something like

open my $dst, ">:utf8", "file";
or
open my $dst, ">:encoding(utf8)", "file";
or
open my $dst, ">", "file";
binmode $dst, ":encoding(utf8)";

?
--8<---
use strict;
use warnings;

use Data::Peek;
use Encode qw( encode decode );

my $euro = "\x{20ac}";
my $deuro = encode ("utf8", $euro);
DPeek $euro;
DPeek $deuro;

{   open my $fh, ">", "test-plain.out";
    print $fh "$euro\n";
    }

{   open my $fh, ">:encoding(utf8)", "test-utf8.out";
    print $fh "$euro\n";
    }

{   open my $fh, ">", "test-encode.out";
    print $fh "$deuro\n";
    }

{   open my $fh, ">:encoding(utf8)", "test-utf8enc.out";
    print $fh "$deuro\n";
    }
-->8---

PV("\342\202\254"\0) [UTF8 "\x{20ac}"]
PV("\342\202\254"\0)

Source : "test-encode.out".

00000000  E2 82 AC 0A                                         ....

Source : "test-plain.out".

00000000  E2 82 AC 0A                                         ....

Source : "test-utf8enc.out".

00000000  C3 A2 C2 82 C2 AC 0A                                .......

Source : "test-utf8.out".

00000000  E2 82 AC 0A                                         ....


So, 3 out of 4 are fine (as expected), but if a module wants to do the
right thing, it probably needs to know whether the right thing is
already being done, to prevent the bad double-encoding.
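
For illustration, the bad bytes in test-utf8enc.out can be reproduced
outside perl. This is only a sketch: latin1_to_utf8 is a made-up helper
name, and it assumes the encoding layer treats each byte of the already
encoded string as one character in the range U+0000..U+00FF and encodes
it again.

```c
#include <stddef.h>

/* Encode every input byte as if it were a Latin-1 character
 * (U+0000..U+00FF). This models an encoding layer being handed a
 * string that was already UTF-8 encoded.
 * out must have room for 2 * len bytes; returns the bytes written. */
size_t latin1_to_utf8 (const unsigned char *in, size_t len,
                       unsigned char *out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                  /* ASCII passes through  */
        } else {
            out[n++] = 0xC0 | (in[i] >> 6);    /* lead byte, top 2 bits */
            out[n++] = 0x80 | (in[i] & 0x3F);  /* continuation, low 6   */
        }
    }
    return n;
}
```

Feeding it E2 82 AC 0A (the encoded euro sign plus newline) gives
C3 A2 C2 82 C2 AC 0A, the same seven bytes as in the dump above.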

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, OpenSuSE 10.3, 11.0, and 11.1, AIX 5.2 and 5.3.
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
2) Slaven Rezic
"H.Merijn Brand" <h.m.brand@xs4all.nl> writes:

>
> To prevent double encoding, is it possible to `see' (from XS) that the
> stream is opened with something like
>
>  open my $dst, ">:utf8", "file";
> or
>  open my $dst, ">:encoding(utf8)", "file";
> or
>  open my $dst, ">", "file";
>  binmode $dst, ":encoding(utf8)";
>
> ?

Check the output of PerlIO::get_layers($dst).

--
Slaven Rezic - slaven <at> rezic <dot> de

    tkruler - Perl/Tk program for measuring screen distances
http://ptktools.sourceforge.net/#tkruler
3) Ben Morrow
Quoth h.m.brand@xs4all.nl ("H.Merijn Brand"):
> In XS, I have the following snippet:
> --8<---
> SV *tmp = newSVpv (csv->buffer, csv->used);
>  dSP;
>  require_IO_Handle;
>  PUSHMARK (sp);
>  EXTEND (sp, 2);
>  PUSHs ((dst));
>  PUSHs (tmp);
>  PUTBACK;
>  result = call_sv (m_print, G_SCALAR | G_METHOD);
>  SPAGAIN;
>  if (result) {
>      result = POPi;
>      unless (result)
>   (void)SetDiag (csv, 2200);
>      }
>  PUTBACK;
>  SvREFCNT_dec (tmp);
> -->8---
>
> To prevent double encoding, is it possible to `see' (from XS) that the
> stream is opened with something like
>
>  open my $dst, ">:utf8", "file";
> or
>  open my $dst, ">:encoding(utf8)", "file";
> or
>  open my $dst, ">", "file";
>  binmode $dst, ":encoding(utf8)";

From Perl you call PerlIO::get_layers. This is implemented in
universal.c as a wrapper around PerlIO_get_layers, but that doesn't
appear to be in the API. Perhaps it should be?

You could also look at its implementation in perlio.c, which is very
simple and AFAICT only uses documented bits of PerlIO.

Ben
4) Tux
On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
wrote:

> "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
>
> >
> > To prevent double encoding, is it possible to `see' (from XS) that the
> > stream is opened with something like
> >
> >  open my $dst, ">:utf8", "file";
> > or
> > open my $dst, ">:encoding(utf8)", "file";
> > or
> >  open my $dst, ">", "file";
> >  binmode $dst, ":encoding(utf8)";
> >
> > ?
>
> Check the output of PerlIO::get_layers($dst).

Right, so is this a bug:
--8<---
use strict;
use warnings;

use Data::Peek;

my $out = "";
open my $fh, ">:utf8", \$out;

DDumper [ PerlIO::get_layers ($fh, details => 1) ];
-->8---

=>

$VAR1 = [
    'scalar',
    undef,
    6328832
    ];

Where is my "utf8" layer?

5) Tux
On Tue, 9 Feb 2010 10:21:42 +0100, "H.Merijn Brand"
<h.m.brand@xs4all.nl> wrote:

> On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
> wrote:
>
> > "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
> >
> > >
> > > To prevent double encoding, is it possible to `see' (from XS) that the
> > > stream is opened with something like
> > >
> > >  open my $dst, ">:utf8", "file";
> > > or
> > > open my $dst, ">:encoding(utf8)", "file";
> > > or
> > >  open my $dst, ">", "file";
> > >  binmode $dst, ":encoding(utf8)";
> > >
> > > ?
> >
> > Check the output of PerlIO::get_layers($dst).
>
> Right, so is this a bug:
> --8<---
> use strict;
> use warnings;
>
> use Data::Peek;
>
> my $out = "";
> open my $fh, ">:utf8", \$out;
>
> DDumper [ PerlIO::get_layers ($fh, details => 1) ];
> -->8---

FWIW: This looks more sane:

open my $fh, ">:encoding(utf8)", \$out;

=>

$VAR1 = [
    'scalar',
    undef,
    6296064,
    'encoding',
    'utf8',
    4231680
];


> =>
>
> $VAR1 = [
>     'scalar',
>     undef,
>     6328832
>     ];
>
> Where is my "utf8" layer?
>


6) Eirik Berg Hanssen
On Tue, Feb 9, 2010 at 10:21 AM, H.Merijn Brand <h.m.brand@xs4all.nl> wrote:
> On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
> wrote:
>
>> "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
>>
>> Check the output of PerlIO::get_layers($dst).
>
> Right, so is this a bug:
> --8<---
> use strict;
> use warnings;
>
> use Data::Peek;
>
> my $out = "";
> open my $fh, ">:utf8", \$out;
>
> DDumper [ PerlIO::get_layers ($fh, details => 1) ];
> -->8---
>
> =>
>
> $VAR1 = [
>     'scalar',
>     undef,
>     6328832
>     ];
>
> Where is my "utf8" layer?

  Try PerlIO::get_layers ($fh) - or, for clarity, PerlIO::get_layers
($fh, details => 0 ).

  Alternatively, you could note that 6328832 seems to be the (a?) flag
signifying utf8.  Since you seem to have read parts of it, let me
quote the pertinent section of the POD:

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parenthesis after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers: to get all of these returned
separately use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to be three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

B<You may open your eyes now.>
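
To make the "flag on a real layer" point concrete, here is a small
sketch that tests the utf8 bit in the three flag values posted in this
thread. The constant is an assumption: it is meant to mirror
PERLIO_F_UTF8 from perl's perliol.h (0x00008000 as far as I know), so
check it against the headers of the perl you build with. The names
SKETCH_PERLIO_F_UTF8 and layer_flags_utf8 are made up for the example.

```c
#include <stdbool.h>

/* Assumed to equal PERLIO_F_UTF8 in perliol.h; verify locally. */
#define SKETCH_PERLIO_F_UTF8 0x00008000L

/* True when the flags word of a layer has the utf8 bit set. */
bool layer_flags_utf8 (long flags)
{
    return (flags & SKETCH_PERLIO_F_UTF8) != 0;
}
```

Under that assumption, 6328832 (0x609200, the scalar layer opened with
">:utf8") has the bit set, 6296064 (0x601200, the scalar layer below
":encoding(utf8)") does not, and 4231680 (0x409200, the encoding layer
itself) does. So the "missing" utf8 layer is there, as a bit in the
third element of the triplet.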


Eirik
7) Tux
On Tue, 9 Feb 2010 10:57:57 +0100, Eirik Berg Hanssen
<ebhanssen@cpan.org> wrote:

> On Tue, Feb 9, 2010 at 10:21 AM, H.Merijn Brand <h.m.brand@xs4all.nl> wrote:
> > On Mon, 08 Feb 2010 23:40:32 +0100, Slaven Rezic <slaven@rezic.de>
> > wrote:
> >
> >> "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
> >>
> >> Check the output of PerlIO::get_layers($dst).
> >
> > Right, so is this a bug:
> > --8<---
> > use strict;
> > use warnings;
> >
> > use Data::Peek;
> >
> > my $out = "";
> > open my $fh, ">:utf8", \$out;
> >
> > DDumper [ PerlIO::get_layers ($fh, details => 1) ];
> > -->8---
> >
> > =>
> >
> > $VAR1 = [
> >    'scalar',
> >    undef,
> >    6328832
> >    ];
> >
> > Where is my "utf8" layer?
>
> Try PerlIO::get_layers ($fh) – or, for clarity, PerlIO::get_layers
> ($fh, details => 0 ).
>
> Alternatively, you could note that 6328832 seems to be the (a?) flag
> signifying utf8. Since you seem to have read parts of it, let me
> quote the pertinent section of the POD:
>
> B<Implementation details follow, please close your eyes.>
>
> The arguments to layers are by default returned in parenthesis after
> the name of the layer, and certain layers (like C<utf8>) are not real
> layers but instead flags on real layers: to get all of these returned
> separately use the optional C<details> argument:
>
> my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);
>
> The result will be up to be three times the number of layers:
> the first element will be a name, the second element the arguments
> (unspecified arguments will be C<undef>), the third element the flags,
> the fourth element a name again, and so forth.
>
> B<You may open your eyes now.>

Thanks for all the feedback. This code now prevents double encoding in
Text::CSV_XS:

#ifdef USE_PERLIO
if (csv->io_has_encoding == 0) {
    GV *gv = NULL;
    IO *io;

    csv->io_has_encoding = -1; /* Check only once! */

    /* code stolen from universal.c */
    if (isGV (dst))
        gv = (GV *)dst;
    else if (SvROK (dst) && isGV (SvRV (dst)))
        gv = (GV *)SvRV (dst);
    else if (SvPOKp (dst))
        gv = gv_fetchsv (dst, 0, SVt_PVIO);

    if (gv && (io = GvIO (gv))) {
        /* layers come back as (name, args, flags) triplets */
        AV *av = PerlIO_get_layers (aTHX_ IoOFP (io));
        I32 i;
        I32 last = av_len (av);
        for (i = last; i >= 2; i -= 3) {
            SV **namep = av_fetch (av, i - 2, FALSE);
            SV **flgsp = av_fetch (av, i,     FALSE);
            /* strnEQ stops at NUL, so short names are not over-read */
            if ((flgsp && SvIOK (*flgsp) &&
                    (SvIVX (*flgsp) & PERLIO_F_UTF8)) ||
                (namep && SvPOK (*namep) && (
                    strnEQ (SvPV_nolen (*namep), "utf8",     4) ||
                    strnEQ (SvPV_nolen (*namep), "encoding", 8))))
                csv->io_has_encoding = 1;
            }
        SvREFCNT_dec (av); /* universal.c releases this AV as well */
        }
    }
if (csv->utf8 && csv->io_has_encoding == 1)
    SvUTF8_on (tmp);
#endif
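
The control flow above (0 = not checked yet, -1 = checked and nothing
found, 1 = handle encodes) can be tried without perl headers. The
following is a stand-alone model, not the Text::CSV_XS code: struct
layer_info, SKETCH_F_UTF8 and check_encoding_once are hypothetical
names, and the flag value is again assumed to match PERLIO_F_UTF8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one (name, args, flags) triplet. */
struct layer_info {
    const char *name;   /* e.g. "scalar", "encoding"          */
    const char *args;   /* NULL when the layer takes no args  */
    long        flags;
};

#define SKETCH_F_UTF8 0x00008000L   /* assumed PERLIO_F_UTF8 value */

/* Tri-state cache: 0 = unknown, -1 = checked, no encoding, 1 = encodes.
 * The layer list is inspected only on the first call for a cache. */
int check_encoding_once (int *cache, const struct layer_info *layers,
                         size_t n)
{
    size_t i;

    if (*cache != 0)
        return *cache;          /* already decided, skip the scan */

    *cache = -1;
    for (i = 0; i < n; i++) {
        if ((layers[i].flags & SKETCH_F_UTF8) ||
            strcmp (layers[i].name, "encoding") == 0) {
            *cache = 1;
            break;
        }
    }
    return *cache;
}
```

One consequence of checking only once: a binmode on the handle after
the first print is not noticed, since the cached answer is reused.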
