Edit report at http://pear.php.net/bugs/bug.php?id=18876&edit=1

ID: 18876
Comment by: [email protected]
Reported By: pear at dlopez dot com
Summary: getting attachment filename in the desired charset
Status: Open
Type: Feature/Change Request
Package: Mail_mimeDecode
Operating System: linux
Package Version: SVN
PHP Version: 5.3.8
Roadmap Versions:
New Comment:

My bad--the iconv needs to apply to both B and Q cases, so the iconv
call should be on line 747 right before the str_replace.


Previous Comments:
------------------------------------------------------------------------

[2011-09-26 20:41:25] dlopez

Description:
------------
When decoding an email to extract file attachments,
d_parameters['filename'] is returned in the original charset (e.g. KOI8).
Even though mimeDecode parses the header string and knows the original
charset (see line 731), the calling function has no access to the
charset of that return value, so it cannot be reliably translated to
another desired charset (e.g. UTF-8). If you add the following line
between lines 737 and 738:

$text = iconv($charset, iconv_get_encoding('output_encoding'), $text);

the calling function can affect the charset of the filename decoded by
mimeDecode simply by setting the desired output charset, via:

iconv_set_encoding("output_encoding", "UTF-8");

Surely I'm not the only person grappling with processing filenames in
unpredictable charsets--is there a reason this has not been done before?
Have other approaches been considered?
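
For readers who want to see the idea in isolation, here is a minimal sketch of decoding a single RFC 2047 encoded-word and converting it to a chosen charset. The function name `decode_encoded_word` and its `$target` parameter are hypothetical and are not part of Mail_mimeDecode; the target charset is passed explicitly rather than read from the iconv output_encoding setting.

```php
<?php
// Hypothetical helper, not part of Mail_mimeDecode: decodes one RFC 2047
// encoded-word ("=?charset?B|Q?text?=") and converts the result to $target.
function decode_encoded_word($word, $target = 'UTF-8')
{
    if (!preg_match('/^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/', $word, $m)) {
        return $word; // not an encoded-word; leave it untouched
    }
    list(, $charset, $encoding, $text) = $m;

    if (strtoupper($encoding) === 'B') {
        $text = base64_decode($text);
    } else {
        // In the Q encoding "_" stands for a space; the rest is quoted-printable.
        $text = quoted_printable_decode(str_replace('_', ' ', $text));
    }

    // Convert once, after either branch, so the B and Q cases are both covered.
    return iconv($charset, $target, $text);
}

// A KOI8-R filename, Q-encoded, converted to UTF-8.
$filename = decode_encoded_word('=?KOI8-R?Q?=D0=D2=C9=D7=C5=D4.txt?=');
```

Note that iconv() returns false when the conversion is not possible, so a caller of this sketch would still need to check the return value.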

------------------------------------------------------------------------


  • Pear at Sep 27, 2011 at 7:57 am

    One more clarification: I understand that you can set
    decode_headers=false so that you can do the decoding yourself... but
    then what's the point of ever setting decode_headers=true if you can't
    trust that the return value will be in an expected charset? Wouldn't it
    make more sense if decode_headers were a charset string that
    _decodeHeader() would use to iconv() the output? And then add a case
    whereby if decode_headers=null, it would skip any decoding (and
    conversion)?
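
The option semantics proposed above could be sketched roughly as follows. Both function names are hypothetical and only illustrate the idea; this is not the code that exists in Mail_mimeDecode. The sketch also ignores the RFC 2047 rule about dropping whitespace between adjacent encoded-words.

```php
<?php
// Hypothetical sketch of the proposed option semantics; these function
// names are illustrative and do not exist in Mail_mimeDecode.

// Decode every encoded-word in $input; convert to $target when one is given.
function decode_words($input, $target = null)
{
    return preg_replace_callback(
        '/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/',
        function ($m) use ($target) {
            $text = strtoupper($m[2]) === 'B'
                ? base64_decode($m[3])
                : quoted_printable_decode(str_replace('_', ' ', $m[3]));
            return $target === null ? $text : iconv($m[1], $target, $text);
        },
        $input
    );
}

function decode_header_option($input, $decode_headers)
{
    if ($decode_headers === null || $decode_headers === false) {
        return $input;                            // skip decoding and conversion
    }
    if ($decode_headers === true) {
        return decode_words($input);              // decode, keep original charset
    }
    return decode_words($input, $decode_headers); // decode and convert
}
```

Treating false the same as null here is an assumption made so that existing callers passing false keep their current behaviour.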


  • Alan at Sep 27, 2011 at 8:01 am

    Looks like it should be optional (so as not to break BC), although I
    guess it is the recommended setting.


  • Alan at Sep 27, 2011 at 8:19 am
    Edit report at http://pear.php.net/bugs/bug.php?id=18876&edit=1

    ID: 18876
    Updated by: [email protected]
    Reported By: pear at dlopez dot com
    Summary: getting attachment filename in the desired charset
    -Status: Open
    +Status: Closed
    Type: Feature/Change Request
    Package: Mail_mimeDecode
    Operating System: linux
    Package Version: SVN
    PHP Version: 5.3.8
    -Assigned To:
    +Assigned To: alan_k
    Roadmap Versions:
    New Comment:

    This bug has been fixed in SVN.

    If this was a documentation problem, the fix will appear on pear.php.net
    by the end of next Sunday (CET).

    If this was a problem with the pear.php.net website, the change should
    be live shortly.

    Otherwise, the fix will appear in the package's next release.

    Thank you for the report and for helping us make PEAR better.

    Can you test the changed code?

    http://svn.php.net/viewvc/pear/packages/Mail_Mime/trunk/mimeDecode.php?r1=317378&r2=317377&pathrev=317378&view=patch


  • Pear at Sep 27, 2011 at 10:05 am

    Holy cow, you're a superhero. I thought this feature request would be
    on a back-burner (or perhaps had some simple workaround that wasn't
    clear to me). Kudos to you.

    Preliminary testing with KOI8 content (and with decode_headers=false)
    looks good, with two notes:

    1) The second argument to _decodeHeader() appears to be a misspelling of
    $default_charset. It's not broken, though, because you misspelled it the
    same way in both places.

    2) For greater robustness you might want to check if iconv fails and
    returns false. For example, if the charset passed via decode_headers is
    invalid or not supported (I set it to 'foobar' to test), mimeDecode now
    returns an empty string, which might catch people off-guard. If iconv
    returns false you may want to leave the value either undecoded or else
    do the straight decoding as was done prior to this patch. I suppose the
    latter choice is more backward compatible.
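
As a minimal sketch of the fallback suggested in note 2, assuming a hypothetical helper named `convert_or_fallback` (not part of Mail_mimeDecode), the decoded bytes can be returned unchanged whenever iconv() reports failure:

```php
<?php
// Hypothetical helper: convert $text, but fall back to the unconverted
// (already decoded) bytes when iconv() cannot perform the conversion.
function convert_or_fallback($text, $from, $to)
{
    // "@" silences the notice or warning iconv() raises for an unknown charset.
    $converted = @iconv($from, $to, $text);
    return $converted === false ? $text : $converted;
}

// An unsupported target charset leaves the decoded value as it was.
$unchanged = convert_or_fallback("\xD0\xD2", 'KOI8-R', 'foobar');
```

This corresponds to the "straight decoding as was done prior to this patch" option, since the caller receives the same bytes it would have received without any conversion.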



Discussion Overview
group: pear-bugs
categories: php
posted: Sep 26, '11 at 7:15p
active: Sep 27, '11 at 10:05a
posts: 5
users: 2
website: pear.php.net