FAQ
Following up on a conversation I started on the DateTime mailing list, I'd like to
ask whether it is really necessary to use C::P::Unicode if a site uses
utf8-encoding?

I have the problem that up until now everything worked absolutely fine without
C::P::Unicode, Template::Stash::ForceUTF8, Template::Provider::Encoding or any
other unicode plugin because I believed that if everything is utf8 you don't
really have to worry about it that much.

Now I recently incorporated DateTime::Locale to get a list of localized month
names. Spitting them out in my templates revealed a <questionmark> symbol
instead of all German umlauts. I took a look at DateTime::Locale and everything
seems to be correct (use utf8 at the top, etc) so this can't be the culprit.
encode("utf8")-ing the month names makes them look correct. I asked about this
on the DateTime mailing list and everybody suggested a truckload of plugins to
incorporate in Catalyst which _ALL_ break everything else on my site except the
month names which are displayed fine then. It looks like everything gets
encoded twice when utilizing these plugins.

So I must admit I'm stuck with this. What is the best practice for dealing with
Catalyst and utf8? Do I really need C::P::Unicode to make this work correctly?
What about the various TT plugins? And why the heck is everything double utf8
encoded when using these plugins that everybody else seems to use?

Thanks a lot for any input!

--Tobias


  • Pedro Melo at Aug 9, 2007 at 10:11 am
    Hi,
    On Aug 9, 2007, at 9:27 AM, Tobias Kremer wrote:

    So I must admit I'm stuck with this. What is the best practice for dealing with
    Catalyst and utf8? Do I really need C::P::Unicode to make this work correctly?
    What about the various TT plugins? And why the heck is everything double utf8
    encoded when using these plugins that everybody else seems to use?
    I must admit that I'm in the "it works for me but don't ask me why"-
    camp at the moment.

    A combination of:

    * DBIC and making sure output from text fields is UTF8
    (utf8_columns() is your friend, but depending on the DB, you might
    need more than that);
    * Catalyst::View::TT::ForceUTF8;
    * Catalyst::Plugin::Unicode

    with those three, and making sure my TT templates are in UTF8 (no BOM
    was injured) I haven't had any problems, even with DateTime with a
    pt_PT locale.

    Also if you use open(), make sure you use the three argument version,
    and stick '<:utf8' in the middle arg.

    This works for *me*, but I haven't had the time to understand it
    totally. And I would love someone to tell me that there is a simpler
    way.

    Sincerely I would hope Cat to be utf8-by-default around version 6...

    Best regards,
    --
    Pedro Melo
    Blog: http://www.simplicidade.org/notes/
    XMPP ID: melo@simplicidade.org
    Use XMPP!
  • Matt Lawrence at Aug 9, 2007 at 11:07 am

    Pedro Melo wrote:
    Also if you use open(), make sure you use the three argument version,
    and stick '<:utf8' in the middle arg.
    It's probably worth mentioning that you can use the open pragma to do
    this automatically.

    use open ':utf8';

    You can also use the -C switch to perl or the PERL_UNICODE environment
    variable to set the default PerlIO layer to UTF-8 for handles. A value
    of 31 should cause all handles to be flagged, including the three
    standard handles.

    See perldoc open and man perlrun for more details about these.
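A minimal sketch of the per-handle approach mentioned above, using only core modules. The sample string and the temporary file are assumptions made for illustration. It uses the `:encoding(UTF-8)` layer, which is the stricter, validating relative of the `:utf8` layer suggested in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: "März", written with an escape so this
# source file itself stays plain ASCII.
my $month = "M\x{e4}rz";

# Temporary file used only for this illustration.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

# Encode on the way out: characters -> UTF-8 octets.
open(my $out, '>:encoding(UTF-8)', $path) or die "open: $!";
print {$out} $month;
close $out;

# Decode on the way in: UTF-8 octets -> characters.
open(my $in, '<:encoding(UTF-8)', $path) or die "open: $!";
my $chars = do { local $/; <$in> };
close $in;

# Reading the same file without a decoding layer yields raw octets.
open(my $raw, '<:raw', $path) or die "open: $!";
my $octets = do { local $/; <$raw> };
close $raw;

# prints "characters: 4, octets: 5"
printf "characters: %d, octets: %d\n", length $chars, length $octets;
```

The `open` pragma or `-C` switch sets such layers as defaults; the explicit three-argument form keeps the choice visible at each call site.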

    Matt
  • Jonathan T. Rockway at Aug 9, 2007 at 10:28 am

    On Thu, Aug 09, 2007 at 10:27:27AM +0200, Tobias Kremer wrote:
    I have the problem that up until now everything worked absolutely fine without
    C::P::Unicode, Template::Stash::ForceUTF8, Template::Provider::Encoding or any
    ForceUTF8 is a hack hack hack. If your program doesn't work without it,
    it's completely broken.

    C::P::Unicode is necessary though. If you don't know why, read the
    source and the various perl unicode manpages. There's a lot to
    understand, and I've explained it too many times to care anymore.
    Google it.

    The gist of it is this:

    <outside octets> -> Encode::decode(...) -> <manipulate it>
    -> Encode::encode(...) -> <the user>

    If you do manipulation before you decode, your app will break. If you
    do manipulation after you encode, your app will break.
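The decode, manipulate, encode pipeline above can be sketched with the core Encode module. The input octets are an assumed example (the UTF-8 bytes for "März"); in a Catalyst app the decoding and encoding steps would normally be done by a plugin rather than by hand.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: the UTF-8 octets for "März", as they might arrive
# from a request body or from a file read without any layer.
my $octets_in = "M\xc3\xa4rz";

# 1. Decode at the boundary: octets -> characters.
my $text = decode('UTF-8', $octets_in);

# 2. Manipulate characters only.
my $upper = uc $text;         # "MÄRZ", the umlaut is upper-cased too
my $count = length $text;     # 4 characters (the input was 5 octets)

# 3. Encode at the boundary: characters -> octets for the client.
my $octets_out = encode('UTF-8', $upper);

# prints "4 characters in, 5 octets out"
printf "%d characters in, %d octets out\n", $count, length $octets_out;
```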

    Anyway, a whole article about unicode and Catalyst is here:

    http://www.catalystframework.org/calendar/2006/21

    Learn and enjoy.

    Regards,
    Jonathan Rockway
  • Jonathan T. Rockway at Aug 9, 2007 at 10:34 am

    On Thu, Aug 09, 2007 at 10:27:27AM +0200, Tobias Kremer wrote:

    month names which are displayed fine then. It looks like everything gets
    encoded twice when utilizing these plugins.
    OK, so I changed my mind and I'll be a bit nicer. :)

    This is exactly the problem. Currently, your "unicode" data is
    sitting in memory as a bunch of octets. If you read in octets and
    then spit those out, things will appear to work.

    The problem is that the Locale data you have is properly encoded as
    Perl characters. When you concatenate those characters with your
    octets, the octet data is treated as latin-1 and then converted to
    utf8. Since the data is utf8 and not latin-1, you get your
    double-encoded junk. This is why you need to decode() your data
    before you use it inside Perl.

    Try this:

    $ recode latin-1..utf8
    <type in some utf8>

    You'll notice the familiar double-encoded junk. It turns out that
    this is exactly what Perl is doing, because that's what you're telling
    it to do.
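The same effect can be reproduced in a few lines of Perl. This is only a sketch with made-up data: a template fragment held as undecoded UTF-8 octets, and a month name held as a proper character string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed data for illustration.
my $template_octets = "F\xc3\xbcr ";                    # "Für " as raw UTF-8 octets
my $month_chars     = decode('UTF-8', "M\xc3\xa4rz");   # "März" as characters

# Mixing the two: Perl treats each octet of the template as a Latin-1
# character, so encoding the result encodes the "ü" a second time.
my $broken = encode('UTF-8', $template_octets . $month_chars);

# Decoding the template first keeps everything in the character realm.
my $fixed = encode('UTF-8', decode('UTF-8', $template_octets) . $month_chars);

# prints "broken: 12 octets, fixed: 10 octets"
printf "broken: %d octets, fixed: %d octets\n", length $broken, length $fixed;
```

The two extra octets in the broken result are the double-encoded "ü"; the month name comes out right in both cases, which matches the symptom described in the thread.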
    So I must admit I'm stuck with this. What is the best-practice for dealing with
    The best practice is to tell your program what you want it to do,
    rather than just type stuff and hope it works :)

    Regards,
    Jonathan Rockway
  • Tatsuhiko Miyagawa at Aug 9, 2007 at 10:40 am

    On 8/9/07, Tobias Kremer wrote:
    I have the problem that up until now everything worked absolutely fine without
    C::P::Unicode, Template::Stash::ForceUTF8, Template::Provider::Encoding or any
    other unicode plugin because I believed that if everything is utf8 you don't
    really have to worry about it that much.
    No, you need to. Because DateTime::Locale month_names are utf8-flagged
    (meaning Perl knows that it's a string properly decoded to Unicode)
    but Catalyst request parameters are from HTTP request which Catalyst
    doesn't know which encoding it's encoded in, until you use modules
    like C::P::Unicode.

    Similarly even if your templates are encoded in utf-8,
    Template-Toolkit doesn't know which encoding they are in, until you
    set BOM to your templates or use Template::Provider::Encoding to
    explicitly specify the encoding to decode the template.

    Concatenating utf-8 flagged variables with utf-8 encoded byte string
    causes automatic SV upgrade, which causes double utf-8 encoded string.

    You might want to look at the manpages of encoding::warnings and perlunitut.

    I have a couple of hacks to workaround that, like
    Template::Stash::ForceUTF8 that you mentioned, and
    Encode::DoubleEncodedUTF8 is probably the most evil one, that "fixes"
    the double-encoded utf-8 strings back to what you mean. Too evil to
    use on production but would be still useful to catch bugs like that in
    testing.

    HTH,

    --
    Tatsuhiko Miyagawa
  • Pedro Melo at Aug 9, 2007 at 11:24 am
    Hi,
    On Aug 9, 2007, at 10:40 AM, Tatsuhiko Miyagawa wrote:

    Similarly even if your templates are encoded in utf-8,
    Template-Toolkit doesn't know which encoding they are in, until you
    set BOM to your templates or use Template::Provider::Encoding to
    explicitly specify the encoding to decode the template.
    hmms.. Is there a third way, just telling TT that all my templates
    are in UTF8? Setting the BOM is not easy with some editors.

    Best regards,
    --
    Pedro Melo
    Blog: http://www.simplicidade.org/notes/
    XMPP ID: melo@simplicidade.org
    Use XMPP!
  • Jonas Alves at Aug 9, 2007 at 12:27 pm

    On 09/08/07, Pedro Melo wrote:
    Hi,
    On Aug 9, 2007, at 10:40 AM, Tatsuhiko Miyagawa wrote:

    Similarly even if your templates are encoded in utf-8,
    Template-Toolkit doesn't know which encoding they are in, until you
    set BOM to your templates or use Template::Provider::Encoding to
    explicitly specify the encoding to decode the template.
    hmms.. Is there a third way, just telling TT that all my templates
    are in UTF8? Setting the BOM is not easy with some editors.
    Looking at Template::Provider::Encoding description:

    "Template::Provider::Encoding is a Template Provider subclass to
    decode template using its declaration. You have to declare encoding of
    the template in the head (1st line) of template using (fake) encoding
    TT plugin. Otherwise the template is handled as utf-8."

    So if you want utf8 you just need to use T::P::E and don't need to
    explicitly specify the encoding.

    --
    Jonas
  • Tobias Kremer at Aug 9, 2007 at 12:13 pm

    Zitat von Tatsuhiko Miyagawa <miyagawa@gmail.com>:
    Similarly even if your templates are encoded in utf-8,
    Template-Toolkit doesn't know which encoding they are in, until you
    set BOM to your templates or use Template::Provider::Encoding to
    explicitly specify the encoding to decode the template.
    So you're saying that there's no sane way using TT without
    Template::Provider::Encoding (or something similar)? I admit I haven't really
    grasped this whole Unicode issue fully yet (and I don't seem to be the only one
    as it's causing trouble for lots of people) but this really should be more DWIM
    out of the box if possible ... Especially since utf8 is becoming the de-facto
    standard for encoding everything should "just" work. Yeah, I know, I'm naive :)
    Concatenating utf-8 flagged variables with utf-8 encoded byte string
    causes automatic SV upgrade, which causes double utf-8 encoded string.
    Hmmm. So my templates are utf8 _ENCODED_ and the strings coming in from other
    perl modules are just utf8 _FLAGGED_. When TT concats them together during
    process() the result is wrecked because of the automatic upgrade. Correct?
    You might want to look at the manpages of encoding::warnings and perlunitut.
    Just quickly tried out encoding::warnings which outputs nothing at all but I'll
    give it a closer look.
    I have a couple of hacks to workaround that, like
    Template::Stash::ForceUTF8 that you mentioned, and
    Encode::DoubleEncodedUTF8 is probably the most evil one, that "fixes"
    the double-encoded utf-8 strings back to what you mean. Too evil to
    use on production but would be still useful to catch bugs like that in
    testing.
    Hacks are not an option here :) And there must be a "right" way to do it.

    --Tobias
  • Aristotle Pagaltzis at Aug 11, 2007 at 5:29 pm

    * Tobias Kremer [2007-08-10 12:41]:
    Zitat von Tatsuhiko Miyagawa <miyagawa@gmail.com>:
    Concatenating utf-8 flagged variables with utf-8 encoded byte
    string causes automatic SV upgrade, which causes double utf-8
    encoded string.
    Hmmm. So my templates are utf8 _ENCODED_ and the strings coming
    in from other perl modules are just utf8 _FLAGGED_. When TT
    concats them together during process() the result is wrecked
    because of the automatic upgrade. Correct?
    Forget the fact that they are UTF-8 flagged. Think of it this
    way: Perl has two kinds of strings, byte strings and character
    strings.

    Byte strings consist of, well, bytes; they might be text, or
    maybe they're not. If they are, they are _encoded_; to understand
    the text you have to _decode_ the byte sequence to characters.
    This notion may seem weird if you haven't dealt with Unicode in
    depth, because most character sets use at most 256 characters, which
    they just represent using a single byte. But if you have more than 256
    characters (and Unicode has a lot more), then suddenly you have
    to pick some way to represent the character codes. A sequence of
    bytes alone is meaningless as text until you know what encoding
    it's in.

    Character strings, OTOH, consist of Unicode characters; pure,
    ideal, atomic characters that have no particular representation.
    Of course the interpreter has to store these ideal characters
    somehow, so it uses UTF-8 internally; but that could equally well
    be UTF-16 or UCS-4 or for that matter ASCII plus XML entities.

    For deeper exposition of the concepts (what is an ideal character
    and how does it relate to encodings), read Joel Spolsky's classic
    article:

    The Absolute Minimum Every Software Developer Absolutely,
    Positively Must Know About Unicode and Character Sets (No
    Excuses!)
    http://www.joelonsoftware.com/articles/Unicode.html

    Anyway, the problem you are seeing is that as long as you stay in
    one realm, things will work.

    F.ex., if you mix byte strings, and the bytes represent text
    encoded with the same encoding in both strings, you can mix them
    just fine. Note though that with multibyte or variable-width
    encodings (eg. UCS-2 and UTF-8 respectively), you will have to be
    careful to take the encoding into account in every string
    mutation. F.ex. if you truncate post titles for display in a
    sidebar, you will have to manually take care not to cut the
    string off in the middle of a three-byte character.

    Likewise, if the strings are both character strings, then you can
    mix them no problem. And because they consist of pure ideal
    characters, any operations on them treat characters as atomic.
    You do not need to care whether a character is one, two, three or
    however many bytes in the internal representation used by Perl;
    you can just truncate strings or run substitutions on them etc
    without worrying.
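The truncation example from the two paragraphs above, as a small sketch. The title is an assumed sample containing a euro sign, which takes three bytes in UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: a post title as UTF-8 octets ("10€ off").
my $title_octets = "10\xe2\x82\xac off";

# On a byte string, substr counts bytes: keeping 4 of them cuts the
# euro sign (E2 82 AC) in the middle and leaves malformed UTF-8.
my $cut_bytes = substr($title_octets, 0, 4);

# On a character string, substr counts characters.
my $title     = decode('UTF-8', $title_octets);
my $cut_chars = encode('UTF-8', substr($title, 0, 4));   # "10€ "

# utf8::decode returns false when its argument is not well-formed UTF-8.
my $probe = $cut_bytes;
my $valid = utf8::decode($probe) ? 'yes' : 'no';

# prints "byte cut is valid UTF-8: no; character cut: 6 octets"
printf "byte cut is valid UTF-8: %s; character cut: %d octets\n",
    $valid, length $cut_chars;
```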

    But if you mix byte strings and character strings, there is
    trouble. Perl must find out what characters are in the byte
    string, so it must decode it. By default it does so by assuming
    that byte strings are text encoded in ISO-8859-1. If this is the
    wrong encoding, because, say, your data was actually
    UTF-8-encoded - well, oops: now you have UTF-8 that was decoded
    as ISO-8859-1, which leads to the well-known artifacts.

    Note, however, that you can change the default using the
    `encoding` pragma. See `perldoc encoding`.

    If the program code itself is in UTF-8, you may want to declare
    that also: see `perldoc utf8`.
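A short sketch of what the `utf8` pragma changes. It assumes this source file is itself saved as UTF-8; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Without the pragma, a non-ASCII literal is taken byte by byte.
my $without = "ä";

# With the pragma in scope, the same literal is a single character.
my $with = do { use utf8; "ä" };

# prints "without: 2, with: 1" when the file is saved as UTF-8
printf "without: %d, with: %d\n", length $without, length $with;
```

The pragma only describes the encoding of the source code; it does not decode input or encode output.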

    And finally - see `perldoc perlunicode`.

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Carl Franks at Aug 13, 2007 at 1:22 pm
    Aristotle, thanks for your input - as soon as I saw your name in this
    thread, I knew to sit up and take notice :)

    It's taken a few hours, but I've figured out what's causing the
    specific problems Tobias was having with some parts of the page being
    double-encoded.

    In a nutshell, it's YAML::Syck's fault!
    From what I can make of the XS in YAML::Syck, it's messing with the utf8 flags.
    So even if I pass YAML::Syck::LoadFile() a filehandle opened using the
    correct encoding, the data returned is either encoded again or
    incorrectly flagged (I'm not sure which).

    If I then concatenate a string from YAML::Syck, with a string from a
    correctly decoded filehandle, then the portion from YAML::Syck gets
    double-encoded by Catalyst::Plugin::Unicode::finalize() doing
    utf8::encode().

    I've tried using YAML::XS instead of YAML::Syck, but that also
    produces the same broken result.

    For the moment, I can fix Tobias' problem by making HTML-FormFu
    decode() all strings coming from a YAML file - but this is definitely
    a temporary hack - I'll open an RT ticket on YAML::Syck and see what
    the authors think.

    Carl
  • Tatsuhiko Miyagawa at Aug 13, 2007 at 7:02 pm
    Try $YAML::Syck::ImplicitUnicode = 1?
    On 8/13/07, Carl Franks wrote:
    Aristotle, thanks for your input - as soon as I saw your name in this
    thread, I knew to sit up and take notice :)

    It's taken a few hours, but I've figured out what's causing the
    specific problems Tobias was having with some parts of the page being
    double-encoded.

    In a nut-shell, it's YAML::Syck's fault!

    --
    Tatsuhiko Miyagawa
  • Carl Franks at Aug 14, 2007 at 8:52 am

    On 13/08/07, Tatsuhiko Miyagawa wrote:
    Try $YAML::Syck::ImplicitUnicode = 1?
    Thanks, that solved the problem.
    From reading the docs, I had thought that option did the opposite of
    what I was wanting, but then obviously I still have a lot to learn
    about unicode ;)

    Cheers,
    Carl
  • Tobias Kremer at Aug 9, 2007 at 1:51 pm

    Zitat von Tatsuhiko Miyagawa <miyagawa@gmail.com>:
    Similarly even if your templates are encoded in utf-8,
    Template-Toolkit doesn't know which encoding they are in, until you
    set BOM to your templates or use Template::Provider::Encoding to
    explicitly specify the encoding to decode the template.
    I just found the (undocumented?) "ENCODING" configuration option in
    Template::Provider. Setting this to "utf-8" makes the templates appear
    correctly with C::P::Unicode loaded. Is there still a need for
    Template::Provider::Encoding?

    --Tobias
  • Bill Moseley at Aug 9, 2007 at 3:18 pm

    On Thu, Aug 09, 2007 at 02:56:31PM +0200, Tobias Kremer wrote:
    I just found the (undocumented?) "ENCODING" configuration option in
    Template::Provider. Setting this to "utf-8" makes the templates appear
    correctly with C::P::Unicode loaded. Is there still a need for
    Template::Provider::Encoding?
    In config.yml

    View::TT:
        ENCODING: UTF-8

    Template provider will see you are running a modern Perl (UNICODE flag
    in provider) and then look for a Byte Order Mark. If not found it
    will then decode your content based on the ENCODING setting.

    No, you don't need Template::Provider::Encoding if you only have one
    encoding in your templates.

    Yes you need Unicode (or the older Unicode::Encoding) plugin so that
    input params are decoded and output is encoded back to utf8.
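For apps configured in Perl rather than YAML, the two settings above might look roughly like this. It is a sketch, not a tested setup: it assumes an application class named MyApp with a Catalyst::View::TT-based view at MyApp::View::TT, and that Catalyst::Plugin::Unicode is installed.

```perl
package MyApp;
use strict;
use warnings;

# Loads Catalyst::Plugin::Unicode, which decodes incoming parameters
# and encodes the response body as UTF-8.
use Catalyst qw/Unicode/;

__PACKAGE__->config(
    # Passed through to the TT view: templates without a BOM are
    # decoded as UTF-8 when they are loaded.
    'View::TT' => { ENCODING => 'UTF-8' },
);

__PACKAGE__->setup;

1;
```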

    --
    Bill Moseley
    moseley@hank.org
  • Tobias Kremer at Aug 9, 2007 at 3:38 pm

    View::TT
    ENCODING: UTF-8

    Template provider will see you are running a modern Perl (UNICODE flag
    in provider) and then look for a Byte Order Mark. If not found it
    will then decode your content based on the ENCODING setting.

    No, you don't need Template::Provider::Encoding if you only have one
    encoding in your templates.

    Yes you need Unicode (or the older Unicode::Encoding) plugin so that
    input params are decoded and output is encoded back to utf8.
    Yes, it all starts to make sense :) Thanks for all the great clarifications.
    MyApp is working fine now with C::P::Unicode and the ENCODING setting. By the
    way, does anybody know why the ENCODING option is undocumented? IMHO, it really
    should be mentioned in the Catalyst::Manual alongside some best-practice for
    Unicode. Use C::P::Unicode and ENCODING: UTF-8 should be enough for most people
    ...

    --Tobias
  • Matt S Trout at Aug 9, 2007 at 4:03 pm

    On Thu, Aug 09, 2007 at 04:42:53PM +0200, Tobias Kremer wrote:
    View::TT
    ENCODING: UTF-8

    Template provider will see you are running a modern Perl (UNICODE flag
    in provider) and then look for a Byte Order Mark. If not found it
    will then decode your content based on the ENCODING setting.

    No, you don't need Template::Provider::Encoding if you only have one
    encoding in your templates.

    Yes you need Unicode (or the older Unicode::Encoding) plugin so that
    input params are decoded and output is encoded back to utf8.
    Yes, it all starts to make sense :) Thanks for all the great clarifications.
    MyApp is working fine now with C::P::Unicode and the ENCODING setting. By the
    way, does anybody know why the ENCODING option is undocumented? IMHO, it really
    should be mentioned in the Catalyst::Manual alongside some best-practice for
    Unicode. Use C::P::Unicode and ENCODING: UTF-8 should be enough for most people
    ...
    Because (1) we don't control the TT docs, (2) you haven't written the doc
    patch for the Catalyst docs yet. I agree that you should patch the main manual
    as well as View::TT docs though :)

    --
    Matt S Trout Need help with your Catalyst or DBIx::Class project?
    Technical Director Want a managed development or deployment platform?
    Shadowcat Systems Ltd. Contact mst (at) shadowcatsystems.co.uk for a quote
    http://chainsawblues.vox.com/ http://www.shadowcat.co.uk/
  • Tatsuhiko Miyagawa at Aug 9, 2007 at 7:14 pm

    On 8/9/07, Tobias Kremer wrote:
    View::TT
    ENCODING: UTF-8

    Template provider will see you are running a modern Perl (UNICODE flag
    in provider) and then look for a Byte Order Mark. If not found it
    will then decode your content based on the ENCODING setting.

    No, you don't need Template::Provider::Encoding if you only have one
    encoding in your templates.

    Yes you need Unicode (or the older Unicode::Encoding) plugin so that
    input params are decoded and output is encoded back to utf8.
    Yes, it all starts to make sense :) Thanks for all the great clarifications.
    MyApp is working fine now with C::P::Unicode and the ENCODING setting. By the
    way, does anybody know why the ENCODING option is undocumented? IMHO, it really
    should be mentioned in the Catalyst::Manual alongside some best-practice for
    Unicode. Use C::P::Unicode and ENCODING: UTF-8 should be enough for most people
    ...
    IIRC, the ENCODING option was added right after I released
    Template::Provider::Encoding. No idea why it's still undocumented
    (consult the template-toolkit mailing list :), but yeah, if it's a
    single encoding you use in your templates, there's no need to use
    T::P::Encoding.

    --
    Tatsuhiko Miyagawa
  • Matt S Trout at Aug 9, 2007 at 3:32 pm

    On Thu, Aug 09, 2007 at 02:56:31PM +0200, Tobias Kremer wrote:
    Zitat von Tatsuhiko Miyagawa <miyagawa@gmail.com>:
    Similarly even if your templates are encoded in utf-8,
    Template-Toolkit doesn't know which encoding they are in, until you
    set BOM to your templates or use Template::Provider::Encoding to
    explicitly specify the encoding to decode the template.
    I just found the (undocumented?) "ENCODING" configuration option in
    Template::Provider. Setting this to "utf-8" makes the templates appear
    correctly with C::P::Unicode loaded. Is there still a need for
    Template::Provider::Encoding?
    Ooh. doc patch to View::TT ?

    --
    Matt S Trout Need help with your Catalyst or DBIx::Class project?
    Technical Director Want a managed development or deployment platform?
    Shadowcat Systems Ltd. Contact mst (at) shadowcatsystems.co.uk for a quote
    http://chainsawblues.vox.com/ http://www.shadowcat.co.uk/
  • Carl Franks at Aug 9, 2007 at 11:01 am

    On 09/08/07, Tobias Kremer wrote:
    Following up on a conversation I started on the DateTime mailing list, I'd like to
    ask whether it is really necessary to use C::P::Unicode if a site uses
    utf8-encoding?

    Tobias,

    I tried jrock's advice of adding C::P::Unicode to the Cat app you sent
    me a couple days ago - and it does fix the encoding problem.

    Carl
  • Tobias Kremer at Aug 9, 2007 at 11:31 am

    Tobias,

    I tried jrock's advice of adding C::P::Unicode to the Cat app you sent
    me a couple days ago - and it does fix the encoding problem.
    I also did that but it only works in some cases. Try adding a block element to
    the FormFu YAML file (or a comment for the date element) that contains some
    Umlauts and they will get double-encoded even with Unicode loaded. Most of my
    site now works with the Unicode plugin loaded but still some pages with FormFu
    forms are double-encoded - not all of them though.

    --Tobias
  • Daisuke Maki at Aug 9, 2007 at 1:53 pm
    I've been doing Japanese web pages for such a long time now, and yet I
    still can't claim to have a full grasp of things.

    But anyways, what I do is:

    * All my templates are in unicode (open documents
    with set enc=utf-8, NO BOM), as well as my FormFu
    config files.
    * I do NOT use UTF8Columns for my DBIx::Class schema
    * I do NOT use things like C::P::Unicode, T::P::Encoding

    I think that means all of my data is in octets, so I have to jump through hoops
    when I want to enforce Unicode semantics, such as truncating a string to
    X number of characters, as opposed to bytes. Other than that little
    "detail", this is working fine for me.

    I suppose it's a hack, but I couldn't get things to really work the
    other way around. I don't think this is the best practice at all, but
    just so you know ...

    --d



    _______________________________________________
    List: Catalyst@lists.rawmode.org
    Listinfo: http://lists.rawmode.org/mailman/listinfo/catalyst
    Searchable archive: http://www.mail-archive.com/catalyst@lists.rawmode.org/
    Dev site: http://dev.catalyst.perl.org/

Discussion Overview
group: catalyst
categories: catalyst, perl
posted: Aug 9, '07 at 9:22a
active: Aug 14, '07 at 8:52a
posts: 22
users: 11
website: catalystframework.org
irc: #catalyst