
On Fri, Nov 26, 2010 at 01:47:55AM +0100, Chris 'Bingos' Williams wrote:
In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/719245bdb37d6b755905afe0a676055044850522?hp=d4456f896c3eddd615abf6048839581863ea6ca3>

- Log -----------------------------------------------------------------
commit 719245bdb37d6b755905afe0a676055044850522
Author: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
Date: Fri Nov 26 00:46:37 2010 +0000

Update MIME-Base64 to CPAN version 3.12

[DELTA]

2010-10-25 Gisle Aas <gisle@ActiveState.com>

Release 3.12

Don't change SvUTF8 flag on the strings encoded [RT#60105]

Documentation tweaks

This causes a test failure in Encode (I think, for all configurations):

./perl -MTestInit cpan/Encode/t/Encoder.t

...

ok 260 - decode
Wide character in subroutine entry at cpan/Encode/t/Encoder.t line 25.
# Looks like you planned 516 tests but ran 260.
# Looks like your test exited with 255 just after 260.


No, I don't know why. The collateral work of trying to maintain a
coherent distribution of modules...


The entire test is this:

#
# $Id: Encoder.t,v 2.0 2004/05/16 20:55:17 dankogai Exp $
#

BEGIN {
    require Config; import Config;
    if ($Config{'extensions'} !~ /\bEncode\b/) {
        print "1..0 # Skip: Encode was not built\n";
        exit 0;
    }
    $| = 1;
}

use strict;
#use Test::More 'no_plan';
use Test::More tests => 516;
use Encode::Encoder qw(encoder);
use MIME::Base64;
package Encode::Base64;
use base 'Encode::Encoding';
__PACKAGE__->Define('base64');
use MIME::Base64;
sub encode{
    my ($obj, $data) = @_;
    return encode_base64($data);
}
sub decode{
    my ($obj, $data) = @_;
    return decode_base64($data);
}

package main;

my $e = encoder("foo", "ascii");
ok ($e->data("bar"));
is ($e->data, "bar");
ok ($e->encoding("latin1"));
is ($e->encoding, "iso-8859-1");

my $data = '';
for my $i (0..255){
    no warnings;
    $data .= chr($i);
    my $base64 = encode_base64($data);
    is(encoder($data)->base64, $base64, "encode");
    is(encoder($base64)->bytes('base64'), $data, "decode");
}

1;
__END__


Line 25 is

return encode_base64($data);

Nicholas Clark


  • Nicholas Clark at Nov 26, 2010 at 10:58 am

    On Fri, Nov 26, 2010 at 09:13:29AM +0000, Nicholas Clark wrote:
    On Fri, Nov 26, 2010 at 01:47:55AM +0100, Chris 'Bingos' Williams wrote:

    Update MIME-Base64 to CPAN version 3.12

    [DELTA]

    2010-10-25 Gisle Aas <gisle@ActiveState.com>

    Release 3.12

    Don't change SvUTF8 flag on the strings encoded [RT#60105]

    Documentation tweaks
    This causes a test failure in Encode. (I think, for all configurations):

    ./perl -MTestInit cpan/Encode/t/Encoder.t

    ...

    ok 260 - decode
    Wide character in subroutine entry at cpan/Encode/t/Encoder.t line 25.
    # Looks like you planned 516 tests but ran 260.
    # Looks like your test exited with 255 just after 260.


    No, I don't know why. The collateral work of trying to maintain a
    coherent distribution of modules...

    OK, the cause is heuristics in Encode::Encoder:

    sub new {
        my ( $class, $data, $encname ) = @_;
        unless ($encname) {
            $encname = Encode::is_utf8($data) ? 'utf8' : '';
        }
        else {
            my $obj = find_encoding($encname)
              or croak __PACKAGE__, ": unknown encoding: $encname";
            $encname = $obj->name;
        }
        my $self = {
            data     => $data,
            encoding => $encname,
        };
        bless $self => $class;
    }


    Looking at the documentation, Encode::is_utf8() is true if SvUTF8() is true.
    So if I take the *same sequence of ords* and change only the internal
    representation, the result of Encode::is_utf8() changes.

    However, if $encname is set to utf8, then Encode::Encoder assumes that the
    sequence of ords is a valid UTF-8 sequence. Which, well, it isn't, because
    that's not what Encode::is_utf8() checks. So, my minimal test case, using a
    no-op encoding:

    use strict;
    use warnings;

    {
        package Encode::noop;

        use parent 'Encode::Encoding';
        __PACKAGE__->Define('noop');

        sub encode{
            my ($obj, $data) = @_;
            return $data;
        }

        sub decode{
            my ($obj, $data) = @_;
            return $data;
        }
    }

    use Encode::Encoder qw(encoder);
    use Devel::Peek;

    my $a = chr 163;
    my $b = $a . chr 256;
    chop $b;

    for my $in ($a, $b) {
        Dump($in);
        my $e = encoder($in);
        printf "Encoding is '%s'\n", $e->encoding;
        my $out = $e->noop();
        Dump($out . '');
    }

    __END__


    You can see that the heuristic means that my no-operation "encoder" actually
    mangles anything that isn't ASCII, if it happens to have become upgraded
    at some point.

    $ ./perl -Ilib encoder.pl
    SV = PV(0x84517d4) at 0x84bb2a4
      REFCNT = 2
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x846d5cc "\243"\0
      CUR = 1
      LEN = 12
    Encoding is ''
    SV = PV(0x84cb00c) at 0x8528844
      REFCNT = 1
      FLAGS = (PADTMP,POK,pPOK)
      PV = 0x84f6a04 "\243"\0
      CUR = 1
      LEN = 12
    SV = PV(0x84517ec) at 0x848b574
      REFCNT = 2
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0x8471ca4 "\302\243"\0 [UTF8 "\x{a3}"]
      CUR = 2
      LEN = 12
    Encoding is 'utf8'
    SV = PV(0x84cb00c) at 0x8528844
      REFCNT = 1
      FLAGS = (PADTMP,POK,pPOK,UTF8)
      PV = 0x84f6a04 "\357\277\275"\0 [UTF8 "\x{fffd}"]
      CUR = 3
      LEN = 12


    Nicholas Clark
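
The point about Encode::is_utf8() can be reproduced without Encode::Encoder. The following is a minimal sketch, not code from the thread: it uses the core utf8::upgrade() function to change only the internal representation of a one-character string, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode ();

# Two strings holding the same single character, chr(163).
my $plain    = chr 163;      # stored internally as one byte
my $upgraded = chr 163;
utf8::upgrade($upgraded);    # same character, now stored internally as UTF-8

# String comparison looks at characters, not at the internal storage.
my $same_chars = ( $plain eq $upgraded ) ? 1 : 0;

# Encode::is_utf8() reports only the internal flag.
my $plain_flag    = Encode::is_utf8($plain)    ? 1 : 0;
my $upgraded_flag = Encode::is_utf8($upgraded) ? 1 : 0;

printf "same characters:    %d\n", $same_chars;
printf "is_utf8 (plain):    %d\n", $plain_flag;
printf "is_utf8 (upgraded): %d\n", $upgraded_flag;
```

Because the two strings compare equal while the flag differs, a default encoding chosen from the flag alone can treat equal inputs differently, which is what the test case above exposes.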
  • Gisle Aas at Nov 26, 2010 at 11:08 pm

    On Nov 26, 2010, at 11:58 , Nicholas Clark wrote:

    OK, the cause is heuristics in Encode::Encoder:

    sub new {
    my ( $class, $data, $encname ) = @_;
    unless ($encname) {
    $encname = Encode::is_utf8($data) ? 'utf8' : '';
    }
    else {
    my $obj = find_encoding($encname)
    or croak __PACKAGE__, ": unknown encoding: $encname";
    $encname = $obj->name;
    }
    my $self = {
    data => $data,
    encoding => $encname,
    };
    bless $self => $class;
    }


    Looking at the documentation, Encode::is_utf8() is true if SvUTF8() is true.
    So, if I take the *same sequence of ords* and change the internal
    representation, that changes.

    However, if $encname is set to utf8, then Encode::Encoder assumes that that
    sequence of ords is a valid UTF-8 sequence. Which, well, it isn't. Because
    that's not what Encode::is_utf8() checks. So, my minimal test case, of a
    no-op

    Based on your test case, I fixed the Encoder bug in
    <https://github.com/gisle/p5-encode/commit/36578d3bb1deb6d7e546ce9cf0d454ee68b74257>.

    --Gisle

  • Gisle Aas at Nov 26, 2010 at 9:54 pm

    On Nov 26, 2010, at 10:13 , Nicholas Clark wrote:
    On Fri, Nov 26, 2010 at 01:47:55AM +0100, Chris 'Bingos' Williams wrote:
    In perl.git, the branch blead has been updated

    <http://perl5.git.perl.org/perl.git/commitdiff/719245bdb37d6b755905afe0a676055044850522?hp=d4456f896c3eddd615abf6048839581863ea6ca3>

    - Log -----------------------------------------------------------------
    commit 719245bdb37d6b755905afe0a676055044850522
    Author: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
    Date: Fri Nov 26 00:46:37 2010 +0000

    Update MIME-Base64 to CPAN version 3.12

    [DELTA]

    2010-10-25 Gisle Aas <gisle@ActiveState.com>

    Release 3.12

    Don't change SvUTF8 flag on the strings encoded [RT#60105]

    Documentation tweaks
    This causes a test failure in Encode. (I think, for all configurations):

    MIME-Base64-3.12 is buggy. In encode_base64() it ends up setting the UTF8 flag on its argument where it should not. I'll upload 3.13 asap.

    --Gisle
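
A check along the following lines would catch the kind of regression described here. It is a sketch under assumptions: the helper name flag_unchanged is made up for illustration, it is not part of MIME::Base64 or of its test suite, and it only compares the UTF8 flag of the argument before and after encode_base64().

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: returns 1 if encode_base64() leaves the UTF8 flag
# of its argument as it found it, 0 otherwise.
sub flag_unchanged {
    my ($string) = @_;
    my $before  = utf8::is_utf8($string) ? 1 : 0;
    my $encoded = encode_base64($string);    # result unused; only the side effect matters
    my $after   = utf8::is_utf8($string) ? 1 : 0;
    return $before == $after ? 1 : 0;
}

# A byte string covering every value 0..255, built without the UTF8 flag.
my $bytes    = join '', map { chr } 0 .. 255;
my $bytes_ok = flag_unchanged($bytes);

print "byte string flag preserved: $bytes_ok\n";
```

The helper works on its own copy of the string, so the caller's data is not affected by whatever the installed MIME::Base64 does to the flag.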
  • Chris 'BinGOs' Williams at Nov 26, 2010 at 10:39 pm

    On Fri, Nov 26, 2010 at 10:54:04PM +0100, Gisle Aas wrote:

    MIME-Base64-3.12 is buggy. In encode_base64() it ends up setting the UTF8 flag on its argument where it should not. I'll upload 3.13 asap.

    --Gisle

    Thanks, I'm integrating it into blead at the moment and doing a build/test.

    --
    Chris Williams
    aka BinGOs
    PGP ID 0x4658671F
    http://www.gumbynet.org.uk
    ==========================

Discussion Overview
group: perl5-porters
categories: perl
posted: Nov 26, '10 at 9:13a
active: Nov 26, '10 at 11:08p
posts: 5
users: 3
website: perl.org
