I'm having a small problem that I hope somebody has a simple solution
to. I'm using Catalyst with TT for the view and PostgreSQL, with
everything set up for utf8: "use utf8" in the Perl source,
"enable_utf8" for Postgres, and templates that contain utf8 encoded
international characters. I've verified that the data in Postgres is
stored correctly (international characters in the Postgres table
display correctly in psql, and data pulled from both the database and
the templates shows international characters fine).

Everything seems to work fine, with one small exception. Whenever I have
an HTML form input type=text with an international character and the
form validation fails, so that the default value of the input field
contains the international character, the rest of the HTML document no
longer displays international characters correctly. If I remove the
international character from the input field and resubmit, everything is
displayed correctly again.

I'm guessing the browser detects that the document contains some element
that is not proper utf8, and disables utf8 altogether before displaying
whenever the input field contains an international character.

The input field value is set in the template from the
$c->req->parameters passed in the stash.

So my question is what's the best way to handle this? Can an input value
in a form handle a utf8 encoded string at all, and if so how can I
convince it my string is utf8, and if I do does the browser detect it
automagically?

Any pointers?

Thanks,

Marius K.


  • Aristotle Pagaltzis at May 5, 2008 at 2:39 pm

    * Marius Kjeldahl [2008-05-05 00:20]:
    > Everything seems to work fine

    “Seems” being the operative word.

    > with one small exception. Whenever I have a HTML form input
    > type=text with an international character and the form
    > validation fails, so the default value of the input field
    > contains the international character, the rest of the html
    > document does no longer display international characters
    > correctly.

    That is because all of that was not marked as character data to
    begin with. When Perl tries to concatenate it with a Unicode
    string, it sees byte strings so it decodes them as Latin-1. Then
    all the UTF-8 multibyte characters turn into gremlins.

    > I'm guessing the browser detects that the document contains
    > some element that is not proper utf8, and disables utf8
    > altogether before displaying whenever the input field contains
    > an international characters.

    You’re probably wrong about that guess. What headers do you send?

    Do you use `<meta http-equiv="Content-Type">`? (Bad idea, btw.)

    > If I remove the international character from the input field
    > and resubmit, everything is displayed correctly again. […] The
    > input field value is set in the template from the
    > $c->req->parameters passed in the stash.

    Are you using Catalyst::Plugin::Unicode?

    > So my question is what's the best way to handle this?

    Did you tell Template Toolkit or whatever template engine you use
    that the templates are in UTF-8?

    > Can an input value in a form handle a utf8 encoded string at
    > all

    Yes.

    > and if so how can I convince it my string is utf8, and if I do
    > does the browser detect it automagically?

    No, the headers must be set correctly.

    > Any pointers?

    In addition to the above? Check out encoding::warnings.

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
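The Latin-1 upgrade described in the reply above can be reproduced outside Catalyst. The following is a minimal sketch using only the core Encode module; the variable names and sample strings are made up for illustration, and the byte string merely stands in for an undecoded request parameter.

```perl
use strict;
use warnings;
use utf8;                          # string literals in this file are UTF-8
use Encode qw(decode encode);

# A character string, standing in for template text that was read as UTF-8.
my $template_text = "Blåbærsyltetøy: ";

# A byte string, standing in for a request parameter nobody decoded.
my $param_bytes = encode('UTF-8', "Ærøskøbing");

# Mixing the two: Perl treats each byte as a Latin-1 character, so every
# two-byte UTF-8 sequence becomes two separate (wrong) characters.
my $broken = $template_text . $param_bytes;

# Decoding at the boundary first keeps everything as characters.
my $fixed = $template_text . decode('UTF-8', $param_bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "broken: $broken\n";
print "fixed:  $fixed\n";
print "lengths: ", length($broken), " vs ", length($fixed), "\n";
```

The broken string is longer than the fixed one by one character per non-ASCII letter in the parameter, which is the visible symptom of bytes being counted as characters.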
  • Marius Kjeldahl at May 5, 2008 at 3:12 pm
    Problem solved. In my "View" class, like:

    package MyApp::View::TT;
    use strict; use warnings;
    use base 'Catalyst::View::TT';

    replace the last line with:

    use base 'Catalyst::View::TT::ForceUTF8';

    and everything works fine. I guess there was some confusion between
    Template Toolkit and non-utf8 stash strings or similar.

    Thanks,

    Marius K.
  • Bill Moseley at May 5, 2008 at 7:28 pm

    On Mon, May 05, 2008 at 04:12:53PM +0200, Marius Kjeldahl wrote:
    > Problem solved. In my "View" class, like:
    >
    > package MyApp::View::TT;
    > use strict; use warnings;
    > use base 'Catalyst::View::TT';
    >
    > replace the last line with:
    >
    > use base 'Catalyst::View::TT::ForceUTF8';

    That seems like the wrong approach.

    Data should be decoded on input from the outside and encoded on
    output. I'm not sure when it would be advisable to force the utf8 flag
    on items in the stash, but I have not looked at that module in a
    while.

    <form> tags should have accept-charset.

    C::P::Unicode::Encoding should be used (I suggest with reservations).
    That will decode parameters and encode output.

    If your templates are UTF-8, then set ENCODING => 'UTF-8' when
    creating the TT object.

    Do what's required for your database to handle utf-8.

    --
    Bill Moseley
    moseley@hank.org
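The "decode on input, encode on output" rule from the post above can be sketched with the core Encode module alone. This is only an illustration under stated assumptions: `decode_params` and `encode_body` are hypothetical helper names, not part of Catalyst or any plugin, and parameter values are assumed to be plain scalars.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw UTF-8 bytes from the request into
# character strings, once, at the point where data enters the app.
sub decode_params {
    my ($raw) = @_;
    my %decoded;
    for my $key (keys %$raw) {
        $decoded{$key} = decode('UTF-8', $raw->{$key});
    }
    return \%decoded;
}

# Hypothetical helper: turn the finished character string back into
# UTF-8 bytes, once, at the point where data leaves the app.
sub encode_body {
    my ($body) = @_;
    return encode('UTF-8', $body);
}

# "\xC3\x86r\xC3\xB8" is the UTF-8 byte sequence for a three-letter name.
my $raw    = { city => "\xC3\x86r\xC3\xB8" };
my $params = decode_params($raw);

# Inside the app everything is characters (real code would also
# HTML-escape the value before putting it into markup).
my $html  = qq{<input type="text" name="city" value="$params->{city}">};
my $bytes = encode_body($html);

print length($params->{city}), " characters in, ",
      length($bytes), " bytes out\n";
```

Between the two helpers no string needs its utf8 flag forced, which is why forcing it in the view looks like a workaround rather than a fix.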
  • Marius Kjeldahl at May 5, 2008 at 8:21 pm

    Bill Moseley wrote:
    >> use base 'Catalyst::View::TT::ForceUTF8';
    > That seems like the wrong approach.
    >
    > Data should be decoded on input from the outside and encoded on
    > output. I'm not sure when it would be advisable to force utf8 flag
    > on items in the stash, but I have not looked at that module in a
    > while.
    >
    > <form> tags should have accept-charset

    I tried this but couldn't get it working correctly, which may be
    entirely my fault of course.

    > C::P::Unicode::Encoding should be used (I suggest with reservations).
    > That will decode parameters and encode output.

    I looked into this and related modules trying to figure out exactly
    where to do what, which led me to the solution posted.

    > If your templates are UTF8 then ENCODING => 'UTF-8' when creating TT
    > object.

    Tried this as well. Didn't work. As far as I managed to figure out, that
    solution requires the plugin you mentioned, or a similar one (possibly
    ending in Encode instead of Encoding - I'm taking this from memory while
    googling for a solution to my problem).

    > Do what's required for your database to handle utf-8.

    In my case, everything is utf8: the source code (with embedded strings)
    and the database. I see no reason to start juggling back and forth
    between encodings unless there is a specific need. There may be one,
    which I'm sure further testing will demonstrate, but for now I'm ok.

    Actually, I found one place where it was needed already. I'm using
    some of the Yahoo YUI "ajax" components, which didn't work great with
    utf8, and a simple "decode" (from utf8) before returning some values
    in an ajax component seemed to solve it. There may be flags that can
    be set in the YUI library to enable utf8 encoding as well, which would
    probably be a better solution.

    Thanks,

    Marius K.
  • Bill Moseley at May 5, 2008 at 8:46 pm

    On Mon, May 05, 2008 at 09:22:19PM +0200, Marius Kjeldahl wrote:
    >> <form> tags should have accept-charset
    > I tried this but couldn't get it working correctly, which may be
    > entirely my fault of course.

    What does "couldn't get it working" mean? You couldn't get an
    accept-charset on your form tags?

    >> C::P::Unicode::Encoding should be used (I suggest with reservations).
    >> That will decode parameters and encode output.
    > I looked into this and related modules trying to figure out exactly
    > where to do what, which led me to the solution posted.

    It's just a plugin. You add it to the use Catalyst list of
    plugins. It only decodes $c->req->parameters (failing to decode
    body_parameters, btw) and then encodes the $c->res->body in
    finalize().

    >> If your templates are UTF8 then ENCODING => 'UTF-8' when creating TT
    >> object.
    > Tried this as well. Didn't work. As far as I managed to figure out, that
    > solution requires the plugin you mentioned, or a similar one (possibly
    > ending in Encode instead of Encoding - I'm taking this from memory while
    > googling for a solution to my problem).

    Again, not sure what "didn't work" means, but it doesn't require any
    other modules -- it just says your templates should be decoded as the
    encoding you specify:

    perldoc -m Template::Provider

    search for ENCODING


    --
    Bill Moseley
    moseley@hank.org
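For the ENCODING suggestion above, a view class might look roughly like the following. It is a sketch only: it reuses the MyApp::View::TT name from earlier in the thread and assumes that Catalyst::View::TT hands its configuration through to Template Toolkit, where ENCODING is the Template::Provider option that perldoc describes. It needs a Catalyst application around it to run.

```perl
package MyApp::View::TT;
use strict;
use warnings;
use base 'Catalyst::View::TT';

# Assumption: options set here are passed on to Template->new, so the
# provider decodes template files as UTF-8 when it loads them.
__PACKAGE__->config(
    ENCODING => 'UTF-8',
);

1;
```

This only covers how template files are read. Request parameters and the response body still have to be decoded and encoded separately, for example by the plugin discussed above.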
  • Aristotle Pagaltzis at May 6, 2008 at 8:04 am

    * Bill Moseley [2008-05-05 21:40]:
    > <form> tags should have accept-charset

    Browsers tend to ignore that and send the form data in the same
    encoding as the page that the form was on. Some browsers also do
    other screwy things. Overall this is an area of much hatefulness.
    For best results, <http://search.cpan.org/perldoc?Encode::HEBCI>
    is the way to go. But most of the time it’s overkill, since once
    you get your pages to be served as UTF-8 properly, you can pretty
    much forget the issue.

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Bill Moseley at May 6, 2008 at 5:43 pm

    On Tue, May 06, 2008 at 09:04:38AM +0200, Aristotle Pagaltzis wrote:
    > * Bill Moseley [2008-05-05 21:40]:
    >> <form> tags should have accept-charset
    > Browsers tend to ignore that and send the form data in the same
    > encoding as the page that the form was on.

    "Browsers" is a bit general.

    Yes, IE will use the HTTP Content-Type header over accept-charset in
    the <form> tag (and over any <meta> tag as well).

    Firefox 2 will use accept-charset (even if it's different from the
    HTTP charset). So, it's good to have an accept-charset and make sure
    it matches the page's Content-Type charset.

    At least, that's how I remember it.

    > other screwy things. Overall this is an area of much hatefulness.
    > For best results, <http://search.cpan.org/perldoc?Encode::HEBCI>
    > is the way to go. But most of the time it’s overkill, since once
    > you get your pages to be served as UTF-8 properly, you can pretty
    > much forget the issue.

    That's what I do -- I set the Content-Type, meta http-equiv,
    and accept-charset on the form all to utf-8. Any browser that screws
    that up likely isn't supported in other ways, too.

    --
    Bill Moseley
    moseley@hank.org

Discussion Overview
group: catalyst @
categories: catalyst, perl
posted: May 4, '08 at 11:09p
active: May 6, '08 at 5:43p
posts: 8
users: 3
website: catalystframework.org
irc: #catalyst
