FAQ
Hello, Internals!

I was fixing #72152 when it became apparent that the base64_decode
function is very buggy.


- Null byte ends processing.

- "V" produces empty result, while "V=" fails. Not very logical.

- Too short padding is allowed, e.g. "VV=" works like "VV==".

- Extra padding is allowed (like "V=====").

- Invalid padding is allowed ("=VVV=", "VV=V=", "VVV==") except on the
second place of a 24-bit run ("V=VV=" fails).

- In strict mode, space between padding fails:
"V V==" and "VV ==" and "VV== " are allowed,
"VV= =" fails.

- In strict mode, after a padding, one character is skipped, so "VVV=V"
decodes to "UU" (should be "UUU"), and "VVVV=*" decodes to "UUU" instead
of failing.


For each of the above, what would be the preferred behaviour in default
mode and strict mode?


Affected existing tests:

- ext/openssl/tests/bug61124.phpt uses "kzo w2RMExUTYQXW2Xzxmg==" as an
invalid base64 string, based on the invalid padding.

- ext/standard/tests/file/stream_rfc2397_006.phpt tests
"#Zm9vYmFyIGZvb2Jhcg==" and excepts this to be valid, while "#" is
clearly not valid base64. This also raises a question whether fragments
should be skipped in data uri handling.


Suggestions?

I've created a bug-for-bug compatible rewrite of base64_decode [1], with
all the bugs neatly and specifically implemented and missing features
commented out, so it's now very simple to fix them one by one.

I've also attached a test script that tests "all" possible combinations
of data, padding, NUL and other invalid characters, and my first patch
indeed provides identical results to the old implementation.


Currently interesting lines in the test results:
'base64' 'default' 'strict'
'V' '' ''
'V=' (false) (false)
'VV=' 'U' 'U'
'VV==' 'U' 'U'
'V=====' (false) (false)
'=VVV=' 'UU' (false)
'VV=V=' 'UU' (false)
'VVV==' 'UU' 'UU'
'V=VV=' (false) (false)
'V V==' 'U' 'U'
'VV ==' 'U' 'U'
'VV== ' 'U' 'U'
'VV= =' 'U' (false)
'VVV=V' 'UUU' 'UU'
'VVVV=*' 'UUU' 'UUU'
'VVVVVV=V' 'UUUUU' 'UUUU'
'VVVVVV=*' 'UUUU' 'UUUU'
'VVVV===*' 'UUU' 'UUU'
'VVV====V' 'UUU' 'UU'
'VVV====*' 'UU' 'UU'
'VV=====V' 'UU' 'U'
'VV=====*' 'U' 'U'
'=======*' '' ''

--
Lauri Kenttä

Search Discussions

  • Scott Arciszewski at May 22, 2016 at 10:24 pm

    On Sun, May 22, 2016 at 6:56 AM, Lauri Kenttä wrote:
    Hello, Internals!

    I was fixing #72152 when it became apparent that the base64_decode function
    is very buggy.


    - Null byte ends processing.

    - "V" produces empty result, while "V=" fails. Not very logical.

    - Too short padding is allowed, e.g. "VV=" works like "VV==".

    - Extra padding is allowed (like "V=====").

    - Invalid padding is allowed ("=VVV=", "VV=V=", "VVV==") except on the
    second place of a 24-bit run ("V=VV=" fails).

    - In strict mode, space between padding fails:
    "V V==" and "VV ==" and "VV== " are allowed,
    "VV= =" fails.

    - In strict mode, after a padding, one character is skipped, so "VVV=V"
    decodes to "UU" (should be "UUU"), and "VVVV=*" decodes to "UUU" instead of
    failing.


    For each of the above, what would be the preferred behaviour in default mode
    and strict mode?


    Affected existing tests:

    - ext/openssl/tests/bug61124.phpt uses "kzo w2RMExUTYQXW2Xzxmg==" as an
    invalid base64 string, based on the invalid padding.

    - ext/standard/tests/file/stream_rfc2397_006.phpt tests
    "#Zm9vYmFyIGZvb2Jhcg==" and excepts this to be valid, while "#" is clearly
    not valid base64. This also raises a question whether fragments should be
    skipped in data uri handling.


    Suggestions?

    I've created a bug-for-bug compatible rewrite of base64_decode [1], with all
    the bugs neatly and specifically implemented and missing features commented
    out, so it's now very simple to fix them one by one.

    I've also attached a test script that tests "all" possible combinations of
    data, padding, NUL and other invalid characters, and my first patch indeed
    provides identical results to the old implementation.


    Currently interesting lines in the test results:
    'base64' 'default' 'strict'
    'V' '' ''
    'V=' (false) (false)
    'VV=' 'U' 'U'
    'VV==' 'U' 'U'
    'V=====' (false) (false)
    '=VVV=' 'UU' (false)
    'VV=V=' 'UU' (false)
    'VVV==' 'UU' 'UU'
    'V=VV=' (false) (false)
    'V V==' 'U' 'U'
    'VV ==' 'U' 'U'
    'VV== ' 'U' 'U'
    'VV= =' 'U' (false)
    'VVV=V' 'UUU' 'UU'
    'VVVV=*' 'UUU' 'UUU'
    'VVVVVV=V' 'UUUUU' 'UUUU'
    'VVVVVV=*' 'UUUU' 'UUUU'
    'VVVV===*' 'UUU' 'UUU'
    'VVV====V' 'UUU' 'UU'
    'VVV====*' 'UU' 'UU'
    'VV=====V' 'UU' 'U'
    'VV=====*' 'U' 'U'
    '=======*' '' ''

    --
    Lauri Kenttä
    --
    PHP Internals - PHP Runtime Development Mailing List
    To unsubscribe, visit: http://www.php.net/unsub.php
    I'm surprised to see that these functions weren't heavily tested.

    I've included some of the RFC 4648 test vectors in my
    cache-timing-safe userland implementation at
    https://github.com/paragonie/constant_time_encoding -- maybe this
    would be useful for base64_decode() as well?

    Scott Arciszewski
    Chief Development Officer
    Paragon Initiative Enterprises <https://paragonie.com>
  • Yasuo Ohgaki at May 23, 2016 at 12:52 am
    Hi Lauri,
    On Sun, May 22, 2016 at 7:56 PM, Lauri Kenttä wrote:

    Suggestions?
    IMHO

    Return FALSE for any invalid input when strict mode is on.
    Enable strict mode by default.
    Keep current behavior when strict mode is off.

    would be the best.

    Regards,

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net
  • Joe Watkins at May 23, 2016 at 4:59 am
    Morning,

         Would it be possible to open a PR that fixes the programming errors,
    such as oob read, and the strange flow of the subsequent block ?

         Assuming those fixes don't effect normal usage, I see no reason why
    that can't be merged immediately.

         When it comes to changing the behaviour, that needs a little more
    discussion I think, and I wait to see what others think about those changes.

    Cheers
    Joe
    On Mon, May 23, 2016 at 1:52 AM, Yasuo Ohgaki wrote:

    Hi Lauri,
    On Sun, May 22, 2016 at 7:56 PM, Lauri Kenttä wrote:

    Suggestions?
    IMHO

    Return FALSE for any invalid input when strict mode is on.
    Enable strict mode by default.
    Keep current behavior when strict mode is off.

    would be the best.

    Regards,

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net

    --
    PHP Internals - PHP Runtime Development Mailing List
    To unsubscribe, visit: http://www.php.net/unsub.php
  • Niklas Keller at May 23, 2016 at 7:31 am
    Completely missing padding shouldn't fail, it's often removed to save space
    or when base64 is converted to base64url.

    Joe Watkins <pthreads@pthreads.org> schrieb am Mo., 23. Mai 2016 06:59:
    Morning,

    Would it be possible to open a PR that fixes the programming errors,
    such as oob read, and the strange flow of the subsequent block ?

    Assuming those fixes don't effect normal usage, I see no reason why
    that can't be merged immediately.

    When it comes to changing the behaviour, that needs a little more
    discussion I think, and I wait to see what others think about those
    changes.

    Cheers
    Joe
    On Mon, May 23, 2016 at 1:52 AM, Yasuo Ohgaki wrote:

    Hi Lauri,

    On Sun, May 22, 2016 at 7:56 PM, Lauri Kenttä <lauri.kentta@gmail.com>
    wrote:
    Suggestions?
    IMHO

    Return FALSE for any invalid input when strict mode is on.
    Enable strict mode by default.
    Keep current behavior when strict mode is off.

    would be the best.

    Regards,

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net

    --
    PHP Internals - PHP Runtime Development Mailing List
    To unsubscribe, visit: http://www.php.net/unsub.php
  • Lauri Kenttä at May 25, 2016 at 8:38 pm
    Thanks for all the comments.

    Update:
    - Null byte ends processing.
    - In strict mode, space between padding fails
    - In strict mode, after a padding, one character is skipped
    https://github.com/php/php-src/pull/1923
    - Too short padding is allowed, e.g. "VV=" works like "VV==".
    - Extra padding is allowed (like "V=====").
    WONTFIX / NOTABUG. This doesn't really cause any problems.
    - "V" produces empty result, while "V=" fails. Not very logical.
    Any comments?
    I'd prefer failing in both cases. 6 bits out of 24 is invalid
    and might mean that the data is truncated by accident.
    - Invalid padding is allowed ("=VVV=", "VV=V=")
    Any comments? Strict mode at least gets this one right.
    It's really sad if someone relies on this "feature".

    --
    Lauri Kenttä

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupphp-internals @
categoriesphp
postedMay 22, '16 at 10:56a
activeMay 25, '16 at 8:38p
posts6
users5
websitephp.net

People

Translate

site design / logo © 2018 Grokbase