In fact, the idea of stripping content from a script file isn't
without precedent. Shebang lines are routinely removed from
cli/cgi/fpm, and if you want to properly output it, you need to do so
True, because in the context of CLI we know what is expected - a CLI
script, which can start with #!. It is very unlikely that a template
run directly as a CLI script would start with a #! line that we
actually want to output. But we lack such context in a generic script -
namely, the context that would tell us if it's safe to drop the BOM.
So can we apply the same to the BOM? There's the obvious BC danger of
files which might depend on this behavior (declaring their encoding
via BOM, which happens to be the same as the script encoding).
Given that a BOM in script files is mostly useless, and a BOM in UTF-8
is unnecessary and not recommended either, I don't see why we need to.
In general, I don't think the BOM is a real issue worth messing with
the lexer for. Sure, from time to time somebody will use a weird editor
which produces BOMs, like editing PHP scripts in Word. Sure, they'll
get weird effects that force them to spend 5 minutes googling and
fixing it. I don't think that is a reason to spend person-days of our
collective time finding a fix for this very niche problem and risking
potential BC issues.
If it is really becoming an issue, we could probably make the lexer
treat BOM+<? the same as <?, but I'm not convinced it is a serious
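To make the BOM+<? idea concrete, here is a minimal sketch of the rule,
written in Python rather than as lexer code. The function name is
hypothetical and this is not how the PHP lexer is implemented; it only
shows the condition: the UTF-8 BOM is dropped only when the opening tag
follows it directly, and everything else is left alone.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # byte sequence of the UTF-8 BOM


def skip_bom_before_open_tag(source: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM only if '<?' follows it."""
    if source.startswith(UTF8_BOM + b"<?"):
        # BOM+<? is treated the same as <?
        return source[len(UTF8_BOM):]
    # No BOM, or BOM followed by literal output: keep the bytes untouched.
    return source
```

A file that starts with a BOM followed by plain template text keeps its
BOM under this rule, which is what limits the BC exposure.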
So how about a declare statement?
That presumes you know there's a BOM at the beginning of your file. If
so, why don't you just delete it instead of typing a long declare
directive? If you don't know it, you'd be forced to add the declare to
every (non-template) file in your codebase - which sounds a bit
excessive.
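For what "just delete it" could look like across a codebase, here is a
rough sketch in Python. The function names and the "*.php" pattern are
assumptions made for illustration, and it rewrites files in place, so
treat it as a starting point rather than a finished tool.

```python
import codecs
from pathlib import Path


def strip_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from one file; return True if it changed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_boms_in_tree(root: Path, pattern: str = "*.php") -> list[Path]:
    """Strip BOMs from every matching file under root; return changed files."""
    return [p for p in sorted(root.rglob(pattern)) if p.is_file() and strip_bom(p)]
```

Calling strip_boms_in_tree(Path("src")) would return the list of files
that were rewritten, which doubles as a report of where the BOMs were.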