I was fixing #72152 when it became apparent that the base64_decode
function is very buggy.
- Null byte ends processing.
- "V" produces empty result, while "V=" fails. Not very logical.
- Too short padding is allowed, e.g. "VV=" works like "VV==".
- Extra padding is allowed (like "V=====").
- Invalid padding is allowed ("=VVV=", "VV=V=", "VVV==") except on the
second place of a 24-bit run ("V=VV=" fails).
- In strict mode, whitespace is accepted everywhere except between two
padding characters:
"V V==" and "VV ==" and "VV== " are allowed,
"VV= =" fails.
- In strict mode, after a padding character, one character is skipped, so
"VVV=V" decodes to "UU" (should be "UUU"), and "VVVV=*" decodes to "UUU"
instead of failing.

For each of the above, what would be the preferred behaviour in default
mode and strict mode?
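
To make the question concrete, here is a minimal sketch of one candidate policy for strict mode, written in Python for brevity. It is not PHP's implementation and not the rewrite mentioned in this message. The policy is an assumption: only canonical RFC 4648 input is accepted (length a multiple of 4, at most two "=" and only at the end, no whitespace, no NUL). The names strict_decode, ALPHABET and INDEX are hypothetical.

```python
import string

# Standard base64 alphabet (RFC 4648, section 4).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def strict_decode(text):
    """Return the decoded bytes, or None if text is not canonical base64.

    Assumed policy (one option among several): reject anything that is not
    a whole number of 4-character groups with '=' only as trailing padding.
    """
    if len(text) % 4 != 0:
        return None
    body = text.rstrip("=")
    pad = len(text) - len(body)
    if pad > 2:
        return None
    # Any '=' left inside body, whitespace, NUL or other byte is invalid.
    if any(ch not in INDEX for ch in body):
        return None

    bits = 0      # bit accumulator
    nbits = 0     # number of valid bits currently in the accumulator
    out = bytearray()
    for ch in body:
        bits = (bits << 6) | INDEX[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # keep only the unread bits
    # Leftover bits of a padded group are ignored here; a stricter policy
    # could also require them to be zero.
    return bytes(out)
```

Under this policy "VV==" gives "U" and "VVV=" gives "UU", while "V", "V=", "VV=", "=VVV=", "VV=V=" and "VVVV=*" are all rejected. Default mode would need a separate decision on which of these to tolerate.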

Affected existing tests:
- ext/openssl/tests/bug61124.phpt uses "kzo w2RMExUTYQXW2Xzxmg==" as an
invalid base64 string, based on the invalid padding.
- ext/standard/tests/file/stream_rfc2397_006.phpt tests
"#Zm9vYmFyIGZvb2Jhcg==" and expects this to be valid, while "#" is
clearly not valid base64. This also raises the question of whether
fragments should be skipped in data URI handling.

I've created a bug-for-bug compatible rewrite of base64_decode, with
all the bugs neatly and specifically implemented and missing features
commented out, so it's now very simple to fix them one by one.
I've also attached a test script that tests "all" possible combinations
of data, padding, NUL and other invalid characters, and my first patch
indeed provides identical results to the old implementation.
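
The attached script is not reproduced here. As a rough illustration of the approach, the following Python sketch enumerates every string up to a given length over a small symbol set and compares two decoders in both modes. The symbol set, the function names and the (text, strict) calling convention are assumptions made for this example.

```python
from itertools import product

# One data character, padding, whitespace, NUL and one invalid character.
SYMBOLS = ["V", "=", " ", "\0", "*"]


def all_inputs(max_len):
    """Yield every string of length 0..max_len built from SYMBOLS."""
    for n in range(max_len + 1):
        for combo in product(SYMBOLS, repeat=n):
            yield "".join(combo)


def find_differences(reference, candidate, max_len):
    """Return (text, strict, expected, actual) for every mismatch.

    reference and candidate are callables taking (text, strict) and
    returning the decoded value, or a marker such as None on failure.
    """
    diffs = []
    for text in all_inputs(max_len):
        for strict in (False, True):
            expected = reference(text, strict)
            actual = candidate(text, strict)
            if expected != actual:
                diffs.append((text, strict, expected, actual))
    return diffs
```

With five symbols the input count grows as 5^n, so a maximum length of 8, the longest input in the results below, is still only a few hundred thousand strings. An empty result from find_differences means the two decoders agree on every enumerated input.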

Currently interesting lines in the test results:

'base64'    'default' 'strict'
'V'         ''        ''
'V='        (false)   (false)
'VV='       'U'       'U'
'VV=='      'U'       'U'
'V====='    (false)   (false)
'=VVV='     'UU'      (false)
'VV=V='     'UU'      (false)
'VVV=='     'UU'      'UU'
'V=VV='     (false)   (false)
'V V=='     'U'       'U'
'VV =='     'U'       'U'
'VV== '     'U'       'U'
'VV= ='     'U'       (false)
'VVV=V'     'UUU'     'UU'
'VVVV=*'    'UUU'     'UUU'
'VVVVVV=V'  'UUUUU'   'UUUU'
'VVVVVV=*'  'UUUU'    'UUUU'
'VVVV===*'  'UUU'     'UUU'
'VVV====V'  'UUU'     'UU'
'VVV====*'  'UU'      'UU'
'VV=====V'  'UU'      'U'
'VV=====*'  'U'       'U'
'=======*'  ''        ''
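
For checking the table by hand: "V" is index 21 in the base64 alphabet, which is 010101 in binary, so any run of "V" characters produces the bit pattern 0101... and every complete byte is 0x55, the letter "U". The Python sketch below only illustrates that arithmetic for pure data characters. The helper name sextets_to_bytes is hypothetical, and padding and invalid characters are deliberately not handled.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def sextets_to_bytes(data_chars):
    """Concatenate the 6-bit values of data_chars and cut them into bytes.

    Bits left over after the last complete byte are dropped.
    """
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in data_chars)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

So one "V" carries too few bits for a byte, two give "U", three give "UU" and four give "UUU", which is why the number of "U" characters in each result shows how many data characters the decoder actually consumed.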