Hello,

Can someone help me understand what could cause this:

warn "\$content : ".(utf8::is_utf8($content) ? "utf8" : "not utf8");
warn "\$ticketdata[0]->[0] : ".(utf8::is_utf8($ticketdata[0]->[0]) ? "utf8" : "not utf8");
warn "content4=$content";
if ($ticketdata[0]->[0] ne $content) {
  warn "content5=$content";
  #
  warn "content6=$content stored=".$ticketdata[0]->[0];
  warn "content7=$content";
}

In apache2's error.log:

[Wed Jun 12 16:35:56 2013] [warn] [12504]ERR: 32: Warning in Perl code: $content : not utf8 at /var/www/sites/recia/rtgi3/rtgilib.pm line 382, <GEN46> line 13.
[Wed Jun 12 16:35:56 2013] [warn] [12504]ERR: 32: Warning in Perl code: $ticketdata[0]->[0] : utf8 at /var/www/sites/recia/rtgi3/rtgilib.pm line 383, <GEN46> line 13.
[Wed Jun 12 16:29:13 2013] [warn] [10974]ERR: 32: Warning in Perl code: content4=h\xc3\xa9 at /var/www/sites/recia/rtgi3/rtgilib.pm line 381, <GEN47> line 13.
[Wed Jun 12 16:29:13 2013] [warn] [10974]ERR: 32: Warning in Perl code: content5=h\xc3\xa9 at /var/www/sites/recia/rtgi3/rtgilib.pm line 383, <GEN47> line 13.
[Wed Jun 12 16:29:13 2013] [warn] [10974]ERR: 32: Warning in Perl code: content6=h\xc3\x83\xc2\xa9 stored=h\xc3\xa9 at /var/www/sites/recia/rtgi3/rtgilib.pm line 385, <GEN47> line 13.
[Wed Jun 12 16:29:13 2013] [warn] [10974]ERR: 32: Warning in Perl code: content7=h\xc3\xa9 at /var/www/sites/recia/rtgi3/rtgilib.pm line 386, <GEN47> line 13.

As you can see, the $content variable changes from one line to the next?!?
$ticketdata[0]->[0] contains "hé" coming from a DB (configured as UTF-8) and the test should not fail.
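Note that the log prints h\xc3\xa9 both for content4 and for stored, even though the two values compare unequal, so warn's output alone cannot tell you what Perl actually holds. A small diagnostic sketch: the helper describe is hypothetical (not part of Embperl), and the two literals are assumed stand-ins for $content and the DB value. It reports the utf8 flag and the code points Perl sees:

```perl
use strict;
use warnings;

# Hypothetical helper: report the utf8 flag and the code points Perl sees.
# Unlike warn's output, this does not depend on how the log handle encodes text.
sub describe {
    my ($s) = @_;
    my $flag = utf8::is_utf8($s) ? 'utf8' : 'bytes';
    return "$flag: " . join(' ', map { sprintf('U+%04X', ord $_) } split(//, $s));
}

my $raw = "h\xc3\xa9";    # like $content: the UTF-8 bytes of "hé", flag off
my $db  = "h\x{e9}";      # like $ticketdata[0]->[0]: the characters "hé"
utf8::upgrade($db);       # ...with the utf8 flag on, as a DB driver may return it

print describe($raw), "\n";    # bytes: U+0068 U+00C3 U+00A9
print describe($db),  "\n";    # utf8: U+0068 U+00E9
```

Three code points versus two is exactly why the `ne` test fails.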

I guess the problem comes from the fact that on the same line I have one utf-8 variable and one non-utf8 one.

$content comes from $fdat{content} (not marked as utf8, even though the page encoding is declared and recognized as UTF-8).

What can I do to force Embperl to always set the utf8 flag on $fdat{...}?

If you know a way of telling Apache/Embperl that no encoding other than UTF-8 exists in the world, I'll take it. And it's not a problem if I'm incompatible with anything.

Thanks for your help,

(using libembperl-perl 2.5.0~rc3-1 on Debian/wheezy with apache2-mpm-prefork 2.2.22-13)

---------------------------------------------------------------------
To unsubscribe, e-mail: embperl-unsubscribe@perl.apache.org
For additional commands, e-mail: embperl-help@perl.apache.org


  • Dirk Melchers at Jun 13, 2013 at 8:35 am
    Hello Jean-Christophe,

    On 12.06.2013 at 16:44, Jean-Christophe Boggio wrote:
    > Hello,
    >
    > Can someone help me understand what could cause this:
    > [...]
    > I guess the problem comes from the fact that on the same line I have one utf-8 variable and one non-utf8 one.
    >
    > $content comes from $fdat{content} (not marked as utf8, even though the page encoding is declared and recognized as UTF-8).
    >
    > What can I do to force Embperl to always set the utf8 flag on $fdat{...}?
    >
    > If you know a way of telling Apache/Embperl that no encoding other than UTF-8 exists in the world, I'll take it. And it's not a problem if I'm incompatible with anything.


    I guess your guess is right: having one utf8-flagged variable in a statement upgrades the other strings in it to utf8 as well, and Perl uses ISO-8859-1 for that conversion!
    So the UTF-8 bytes in your string get encoded a second time in the result. The same thing happens when you use FreezeThaw or Data::Dumper, which is bad for serializing and storing something in a database :-(
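    The upgrade described above can be reproduced without Embperl or a database. In this sketch the two literals are assumed stand-ins for $fdat{content} (raw UTF-8 bytes, flag off) and $ticketdata[0]->[0] (a decoded, utf8-flagged string):

```perl
use strict;
use warnings;

# Stand-in for $fdat{content}: the UTF-8 bytes of "hé", utf8 flag off.
my $content = "h\xc3\xa9";

# Stand-in for $ticketdata[0]->[0]: the characters "hé", utf8 flag on.
my $stored = "h\x{e9}";
utf8::upgrade($stored);

# 'ne' compares characters: 3 characters (h, \xc3, \xa9) vs 2 (h, \x{e9}).
my $differs = ($stored ne $content) ? 1 : 0;

# Concatenation upgrades a copy of $content, reading each byte as one
# ISO-8859-1 character, so the result contains "hÃ©" rather than "hé".
my $joined = "content6=$content stored=" . $stored;

printf "differs=%d content=%d chars joined=%d chars\n",
    $differs, length($content), length($joined);
printf "content still flagged? %s\n", utf8::is_utf8($content) ? 'yes' : 'no';
```

    $content itself is left alone; only the concatenated copy is upgraded, which matches content7 in the log looking normal again.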

    Embperl decides for itself whether the %fdat parameters are utf8 or not. I don't know how it does so (maybe Gerald could say something about that), but we have had a lot of "funny" issues with this in the past. Our website uses several encodings (neither UTF-8 nor ISO-8859-1), so we ran into this trouble. We implemented our own "thaw" method, which tries to thaw the data and, if that fails, converts the data to utf8 and thaws it again...

    A solution for you could be: use "$content=decode('UTF-8',$content)" to decode your variable (which also sets the utf8 flag), or walk over %fdat and do this for all keys that are not already utf8-flagged. After that, all your variables should be utf8-flagged and everything works as expected.
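    A minimal sketch of the second suggestion, assuming every form value arrives as UTF-8 bytes. %fdat here is an ordinary hash with hypothetical sample data standing in for the one Embperl fills in; in a page the loop would run once, before any other code reads %fdat. Since the utf8 flag describes Perl's internal representation rather than whether a value was decoded, the is_utf8 check is a heuristic, not a guarantee.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Ordinary hash standing in for Embperl's %fdat (hypothetical sample data).
our %fdat = (
    content => "h\xc3\xa9",      # raw UTF-8 bytes for "hé", flag off
    name    => "Ren\xc3\xa9",    # raw UTF-8 bytes for "René", flag off
);

# Already a character string with the flag on: the loop skips it.
$fdat{title} = "caf\x{e9}";
utf8::upgrade($fdat{title});

# Decode every defined value that is not yet utf8-flagged.
for my $key (keys %fdat) {
    next unless defined $fdat{$key};
    next if utf8::is_utf8($fdat{$key});
    $fdat{$key} = decode('UTF-8', $fdat{$key});
}

print "$_: ", length($fdat{$_}), " chars\n" for sort keys %fdat;
```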

    One small additional comment: using variables that are not utf8-flagged but contain UTF-8 bytes (like your $content variable) breaks a lot of Perl operations: lc, uc, cmp, le, gt, length, sort, ...
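    A short sketch of what goes wrong with length and uc on the undecoded value, using the same sample string. It assumes no `use locale` and no `unicode_strings` feature (for example, no `use v5.12` or later in the file), since those change how uc treats strings without the utf8 flag.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw   = "h\xc3\xa9";               # undecoded: UTF-8 bytes, flag off
my $chars = decode('UTF-8', $raw);     # decoded: "hé", flag on

# length counts bytes for $raw but characters for $chars.
printf "length: raw=%d chars=%d\n", length($raw), length($chars);

# Without the unicode_strings feature, uc on a non-flagged string only
# changes ASCII letters, so the two bytes of "é" pass through unchanged.
my $raw_uc   = uc $raw;      # "H\xc3\xa9"
my $chars_uc = uc $chars;    # "H\x{c9}", i.e. "HÉ"

printf "uc changed the e-acute? raw=%s chars=%s\n",
    ($raw_uc eq "H\xc3\xa9" ? 'no' : 'yes'),
    ($chars_uc eq "H\x{c9}" ? 'yes' : 'no');
```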


    With best regards,

    Dirk Melchers
    /// IT/Software-Development ///

    NUREG GmbH ///
    Dorfäckerstraße 31 | 90427 Nürnberg | Germany
    Tel. +49-911-32002-256 | Fax +49-911-32002-299
    Mobil +49-172-9354670 | www.nureg.de
    Nürnberg HRB 22653 | USt.ID DE 814 685 653
    Geschäftsführer: Michael Schmidt, Stefan Boas


  • Gerald Richter at Jul 3, 2013 at 3:48 pm
    Hi,

    Sorry for the late reply.

    The Perl utf8 flag does NOT say whether your data is UTF-8 or not. It tells us something about the internal representation of your data inside Perl. So UTF-8 data can have the utf8 flag set, but it need not; in both cases everything is alright.
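    This point can be checked directly: utf8::upgrade changes only how Perl stores a string, not which characters it contains, so two copies of "hé" with different flags still compare equal. A minimal sketch:

```perl
use strict;
use warnings;

my $latin1 = "h\x{e9}";   # "hé", stored one byte per character (flag off)
my $wide   = $latin1;
utf8::upgrade($wide);     # same characters, now stored internally as UTF-8 (flag on)

printf "flags: %s %s\n",
    (utf8::is_utf8($latin1) ? 'on' : 'off'), (utf8::is_utf8($wide) ? 'on' : 'off');
printf "equal: %s, lengths: %d %d\n",
    ($latin1 eq $wide ? 'yes' : 'no'), length($latin1), length($wide);
```

    This prints "flags: off on" and "equal: yes, lengths: 2 2". What breaks the comparison in the original report is not the flag itself but that $content still holds undecoded bytes.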

    Unfortunately when I wrote the utf8 %fdat handling I was not fully aware of this fact.

    It might help to access your %fdat data via

    $data = Encode::decode_utf8 ($fdat{foo}) ;

    decode_utf8 will convert the UTF-8 data (that Embperl delivers) to the correct internal representation.
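    A sketch of that suggestion applied to the original comparison. The hash and the stored value are hypothetical stand-ins for Embperl's %fdat and the database result; only Encode::decode_utf8 comes from the message above.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical stand-ins: a raw form value and a decoded DB value.
my %fdat   = (foo => "h\xc3\xa9");   # UTF-8 bytes for "hé"
my $stored = "h\x{e9}";              # characters "hé"

my $data = decode_utf8($fdat{foo});

# After decoding, both sides hold the same two characters.
print $data eq $stored ? "equal\n" : "different\n";
```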

    I will fix this in a future release.

    Hope this helps

    Gerald
    -----Original Message-----
    From: Jean-Christophe Boggio
    Sent: Wednesday, 12 June 2013 16:44
    To: embperl@perl.apache.org
    Subject: Getting mad with UTF-8

  • Jean-Christophe Boggio at Jul 3, 2013 at 4:26 pm
    Gerald,

    On 03/07/2013 17:47, Gerald Richter - ECOS wrote:
    > Sorry for the late reply.
    No problem.
    > The Perl utf8 flag does NOT say whether your data is UTF-8 or not. It
    > tells us something about the internal representation of your data
    > inside Perl.
    I agree with that.
    > So UTF-8 data can have the utf8 flag set, but it need not; in both
    > cases everything is alright.
    But isn't that what causes my problem? The data comes in as UTF-8 but is
    not "seen" by Perl as such, so it gets re-encoded.
    That's how I understand it.
    > It might help to access your %fdat data via
    > $data = Encode::decode_utf8 ($fdat{foo}) ;
    Yes, but I'd have to do it everywhere %fdat is involved.
    > decode_utf8 will convert the UTF-8 data (that Embperl delivers) to the
    > correct internal representation. I will fix this in a future release.
    > Hope this helps
    I guess that will solve many things.

    I have read many docs about UTF-8 but am still confused. I still don't
    understand what decode_utf8 *really* does. For example, what happens if
    you do it twice ? Like :

    $fdat{foo} = decode_utf8 ( decode_utf8 ($fdat{foo}) );

    Will it decode it once, then see that it's already UTF-8 (because it has
    the utf8 flag set) and not do it a second time?
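    decode_utf8 is meant to turn bytes into characters, so applying it to a value that already holds characters is not a no-op you can safely count on. Rather than relying on how a particular Encode version treats such input, one way to sidestep the question is to decode each raw value exactly once, where it enters the program, and keep the result in a separate hash. This is a sketch under that assumption: %param is a hypothetical name, and %fdat is again an ordinary hash standing in for Embperl's.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Ordinary hash standing in for Embperl's %fdat (hypothetical sample data).
my %fdat = (foo => "h\xc3\xa9", bar => "plain");

# Decode every raw value exactly once, into a separate hash (hypothetical
# name %param). Later code reads %param and never decodes its values again,
# so the question of what a second decode would do never comes up.
my %param = map { ($_ => decode('UTF-8', $fdat{$_})) } keys %fdat;

# The raw values stay untouched for anything that still needs the bytes.
printf "foo: %d bytes in %%fdat, %d characters in %%param\n",
    length($fdat{foo}), length($param{foo});
```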

    Also, I still don't understand why I seem to be the only one having
    problems with UTF-8 :-)

    Thanks for taking care of this issue.

    Best regards,

    JC


Discussion Overview
group: embperl @
categories: modperl, perl
posted: Jun 12, '13 at 2:44p
active: Jul 3, '13 at 4:26p
posts: 4
users: 3
website: perl.apache.org
