I had a character encoding issue that I finally solved, but I don't
understand why the fix works. I'm hoping someone can explain this to me!

The issue was that non-ascii chars were appearing as junk BUT only when
retrieved via ajax calls. Otherwise, they displayed fine. The junk display
was due to them being interpreted as ISO-8859-1, but I could not figure out
why the browser was interpreting that way. All my data is handled as UTF-8.

The problem was fixed by calling utf8::decode on the data prior to sending
back via ajax. BUT WHY?

I am using the JSON view to render ajax responses, and it sets the charset
header correctly to UTF-8. Of course, even when you decode, perl still
represents as "internal" utf8. But why should this be necessary?
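
For reference, a minimal standalone sketch (not the application's actual
code; the string literal just stands in for data coming back from the
model) of what that utf8::decode call changes:

#!/usr/bin/perl
# utf8::decode() turns a string of UTF-8 octets into a Perl character
# string in place; it does not touch the charset header, only how Perl
# sees the data it later hands to the JSON view.
use strict;
use warnings;

my $name = "Alarc\xc3\xb3n";    # raw UTF-8 octets, e.g. straight from the DB
print length($name), "\n";      # 8 - Perl is counting bytes

utf8::decode($name);            # in place: octets become characters
print length($name), "\n";      # 7 - the o-acute is now a single character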

Thanks!



--
==========================
http://www.bikewise.org

2People citizen's network for climate action: http://www.2people.org

Greater Seattle Climate Dialogues: http://www.climatedialogues.org
==========================

  • Moritz Onken at Jun 19, 2009 at 7:52 am

    On 19.06.2009 at 06:23, seasprocket@gmail.com wrote:

    I had a character encoding issue that I finally solved, but I don't
    understand why the fix works. I'm hoping someone can explain this to
    me!

    The issue was that non-ascii chars were appearing as junk BUT only
    when retrieved via ajax calls. Otherwise, they displayed fine. The
    junk display was due to them being interpreted as ISO-8859-1, but I
    could not figure out why the browser was interpreting that way. All
    my data is handled as UTF-8.

    The problem was fixed by calling utf8::decode on the data prior to
    sending back via ajax. BUT WHY?

    I am using the JSON view to render ajax responses, and it sets the
    charset header correctly to UTF-8. Of course, even when you decode,
    perl still represents as "internal" utf8. But why should this be
    necessary?

    Thanks!
    What is the encoding of the web page that issues that ajax request?
    Does this occur in other browsers as well?
    I had similar problems and solved it by making sure that
    every page has the utf8 encoding header set.

    IMHO using utf8::decode is a hack and should be avoided if possible.
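
    For example, something along these lines (just a sketch; the
    application and view names are only placeholders):

    package MyApp::Controller::Root;            # hypothetical app name
    use strict;
    use warnings;
    use base 'Catalyst::Controller';

    __PACKAGE__->config( namespace => '' );

    # Make sure every response carries an explicit UTF-8 charset so the
    # browser never falls back to guessing ISO-8859-1.
    sub end : Private {
        my ( $self, $c ) = @_;
        $c->forward( $c->view('TT') ) unless $c->res->body;   # assumed view name
        my $ct = $c->res->content_type || 'text/html';
        $c->res->content_type("$ct; charset=utf-8")
            unless $ct =~ /charset=/i;
    }

    1;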

    moritz
  • Phil Mitchell at Jun 19, 2009 at 3:25 pm

    On Fri, Jun 19, 2009 at 12:52 AM, Moritz Onken wrote:
    On 19.06.2009 at 06:23, seasprocket@gmail.com wrote:
    What is the encoding of the web page that issues that ajax request?

    charset=UTF-8

    Does this occur in other browsers as well?
    yes (tested on FF and IE)
    I had similar problems and solved it by making sure that
    every page has the utf8 encoding header set.

    IMHO using utf8::decode is a hack and should be avoided if possible.

    I totally agree, but it needs to be fixed!


    moritz


  • Francesc Romà i Frigolé at Jun 19, 2009 at 3:51 pm

    On Fri, Jun 19, 2009 at 6:23 AM, seasprocket@gmail.com wrote:
    The problem was fixed by calling utf8::decode on the data prior to sending
    back via ajax. BUT WHY?

    I am using the JSON view to render ajax responses, and it sets the charset
    header correctly to UTF-8. Of course, even when you decode, perl still
    represents as "internal" utf8. But why should this be necessary?

    I had exactly the same problem and the same solution, using
    Catalyst::Controller::REST with the JSON serializer. It's still on my list
    of 'big mysteries to be solved'.


    I hadn't discovered Catalyst::Plugin::Unicode back then. I wonder if using
    it would help; I haven't tried it myself yet.

    Cheers
    Francesc
  • Aristotle Pagaltzis at Jun 20, 2009 at 10:50 am

    * seasprocket@gmail.com [2009-06-19 06:30]:
    The issue was that non-ascii chars were appearing as junk BUT
    only when retrieved via ajax calls. Otherwise, they displayed
    fine. The junk display was due to them being interpreted as
    ISO-8859-1, but I could not figure out why the browser was
    interpreting that way. All my data is handled as UTF-8.

    The problem was fixed by calling utf8::decode on the data prior
    to sending back via ajax. BUT WHY?
    Looks like your code is broken and assumes bytes throughout; as
    long as all your data is UTF-8 you won’t notice. Apparently the
    JSON serialiser is trying to produce UTF-8 output correctly by
    encoding the strings you pass it; since they’re already encoded,
    you get double-encoding gremlins.
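
    For illustration, a standalone sketch of those gremlins (the string is
    only an example, not data from the application):

    #!/usr/bin/perl
    # Encoding a string that already contains UTF-8 octets encodes each of
    # those octets again, which is what the browser then displays as junk.
    use strict;
    use warnings;
    use Encode qw(encode);

    my $bytes  = "Alarc\xc3\xb3n";          # already UTF-8 encoded ("Alarcón")
    my $double = encode('UTF-8', $bytes);   # a serialiser encoding it a second time

    printf "once:  %vX\n", $bytes;          # 41.6C.61.72.63.C3.B3.6E
    printf "twice: %vX\n", $double;         # ...C3.83.C2.B3... renders as "AlarcÃ³n"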

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Seasprocket at Jun 23, 2009 at 12:50 am

    On Sat, Jun 20, 2009 at 3:50 AM, Aristotle Pagaltzis wrote:

    * seasprocket@gmail.com [2009-06-19 06:30]:
    The problem was fixed by calling utf8::decode on the data prior
    to sending back via ajax. BUT WHY?
    Looks like your code is broken and assumes bytes throughout; as
    long as all your data is UTF-8 you won’t notice. Apparently the
    JSON serialiser is trying to produce UTF-8 output correctly by
    encoding the strings you pass it; since they’re already encoded,
    you get double-encoding gremlins.

    Thanks for your suggestion, but I'm pretty sure that the data is not getting
    encoded twice. C::V::JSON tests the data before it encodes (
    Encode::is_utf8() ) and only encodes if this test is true. This test only
    passes if the data is decoded.

    I have confirmed this by checking to see if Encode::encode is getting called
    in C::V::JSON (it's not).

    I agree something's broken, I just don't know what it is ... My suspicion is
    that I don't really understand what's happening inside sqlite -- I assume
    it's storing as UTF-8, but I don't really know what it's doing.
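
    (One common explanation for this, offered as an assumption rather than
    something confirmed here: DBD::SQLite hands back raw UTF-8 octets unless
    it is told to decode them, so every string coming out of the model is a
    byte string. A sketch of asking the driver to decode, with hypothetical
    class and file names; depending on the DBD::SQLite version the option is
    spelled sqlite_unicode or unicode:)

    package MyApp::Model::DB;                    # hypothetical model class
    use strict;
    use warnings;
    use base 'Catalyst::Model::DBIC::Schema';

    __PACKAGE__->config(
        schema_class => 'MyApp::Schema',         # hypothetical schema class
        connect_info => [
            'dbi:SQLite:dbname=myapp.db',        # hypothetical database file
            undef, undef,
            { sqlite_unicode => 1 },             # decode UTF-8 octets into characters
        ],
    );

    1;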



    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>

  • Aristotle Pagaltzis at Jun 23, 2009 at 8:28 am

    * seasprocket@gmail.com [2009-06-23 03:00]:
    Thanks for your suggestion, but I'm pretty sure that the data
    is not getting encoded twice. C::V::JSON tests the data before
    it encodes ( Encode::is_utf8() ) and only encodes if this test
    is true. This test only passes if the data is decoded.
    Augh! Augh! Why do people keep reading stuff into the UTF8 flag
    that it doesn’t mean. (Yeah, I know why, because it’s called the
    UTF8 flag when it should’ve been the UOK flag or something.) You
    can have decoded data with the UTF8 flag off, and you can have
    encoded data with the UTF8 flag on. The UTF8 flag is about the
    internals-level format of the byte buffer of a scalar, it has
    nothing to do with the meaning of the data on the Perl level.
    Testing the flag in pure-Perl code is an almost certain sign of
    brokenness.
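
    A standalone sketch of that distinction (the strings are only
    illustrative):

    #!/usr/bin/perl
    # The UTF8 flag tracks the internal storage format of the scalar,
    # not whether the data has been decoded.
    use strict;
    use warnings;
    use Encode qw(encode is_utf8);

    my $chars = "Alarc\xf3n";                   # decoded text; o-acute is U+00F3
    print is_utf8($chars) ? "on\n" : "off\n";   # off - decoded data, flag off

    my $octets = encode('UTF-8', $chars);       # UTF-8 octets: "Alarc\xc3\xb3n"
    utf8::upgrade($octets);                     # change the internal format only
    print is_utf8($octets) ? "on\n" : "off\n";  # on - still encoded data, flag on
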
    My suspicion is that I don't really understand what's happening
    inside sqlite -- I assume it's storing as UTF-8, but I don't
    really know what it's doing.
    Try Devel::Peek to examine the strings that come out of it?
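
    For example (a sketch with an illustrative string, not data from the
    application):

    #!/usr/bin/perl
    # Devel::Peek::Dump prints the internals of a scalar: the FLAGS line
    # shows whether UTF8 is set and the PV line shows the raw byte buffer.
    use strict;
    use warnings;
    use Devel::Peek qw(Dump);
    use Encode qw(decode);

    my $octets = "Alarc\xc3\xb3n";           # octets as they might come out of SQLite
    my $chars  = decode('UTF-8', $octets);   # a proper character string

    Dump($octets);   # FLAGS without UTF8; PV holds 8 bytes
    Dump($chars);    # FLAGS include UTF8; length($chars) is 7 characters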

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Seasprocket at Jun 29, 2009 at 6:46 pm

    On Tue, Jun 23, 2009 at 1:28 AM, Aristotle Pagaltzis wrote:

    * seasprocket@gmail.com [2009-06-23 03:00]:
    Thanks for your suggestion, but I'm pretty sure that the data
    is not getting encoded twice. C::V::JSON tests the data before
    it encodes ( Encode::is_utf8() ) and only encodes if this test
    is true. This test only passes if the data is decoded.
    Augh! Augh! Why do people keep reading stuff into the UTF8 flag
    that it doesn’t mean. (Yeah, I know why, because it’s called the
    UTF8 flag when it should’ve been the UOK flag or something.) You
    can have decoded data with the UTF8 flag off, and you can have
    encoded data with the UTF8 flag on.

    (Sorry to be so slow to reply. I wanted to find time to fully investigate
    this, but haven't.)

    The Encode docs state:

    # When you encode, the resulting UTF8 flag is always off.
    # When you decode, the resulting UTF8 flag is on unless you can
    unambiguously represent data [as ASCII].

    I was interpreting this to apply to all encoding/decoding -- but I now
    realize that it may only apply to the Encode package. Which really just
    leaves me more confused .. :)
    My suspicion is that I don't really understand what's happening
    inside sqlite -- I assume it's storing as UTF-8, but I don't
    really know what it's doing.
    Try Devel::Peek to examine the strings that come out of it?

    I used Devel::StringInfo and found:

    [info] string: Madrid Alarcón
    is_utf8: 0
    octet_length: 15
    valid_utf8: 1
    decoded_is_same: 0
    decoded:
    octet_length: 15
    downgradable: 1
    char_length: 14
    string: Madrid Alarc
    is_utf8: 1
    raw = <<Madrid Alarcón>>

    I did not draw any brilliant conclusions from this, although I'm curious why
    the decoded version has the non-ASCII char cut off.

    At this point, obviously, I need to find the time to dig in further. Thanks
    for your thoughts!



    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>

