On Apr 21, 2011 at 6:37 pm, Susanne Ebrecht wrote:
Grzegorz Szpetkowski wrote:
Is there any clear performance difference between using a multi-byte
character set (such as UTF-8) and a single-byte one (e.g. SQL_ASCII,
LATIN2)? Why is there no UTF-32 (generally more space per character, but
faster to process than a multi-byte encoding)? ...
PostgreSQL doesn't implement its own character sets.
We just use what libc provides, i.e. what you find on your OS.
To my knowledge, there is no operating system using UTF-32 natively.
Did you ever feel a performance difference on your OS when you used ISO
instead of UTF-8?
For locales where ISO-8859-x makes sense, UTF-8 is itself a mostly
single-byte encoding.
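A quick sketch to make that point concrete (this only exercises Python's codecs, not PostgreSQL; the sample words are my own illustration): for ASCII-dominated text, UTF-8 is byte-for-byte identical to a single-byte encoding, non-ASCII Latin letters cost two bytes, and UTF-32 pays four bytes per character regardless.

```python
# Byte counts for the same text under different encodings.
ascii_text = "performance"
polish_text = "łódź"  # characters covered by ISO-8859-2 (LATIN2)

# ASCII text: UTF-8 matches a single-byte encoding exactly.
print(len(ascii_text.encode("utf-8")))        # 11 bytes
print(len(ascii_text.encode("iso-8859-1")))   # 11 bytes
print(len(ascii_text.encode("utf-32-le")))    # 44 bytes, 4 per character

# Non-ASCII Latin text: UTF-8 needs 2 bytes for each accented letter,
# while the matching single-byte encoding needs only 1.
print(len(polish_text.encode("utf-8")))       # 7 bytes
print(len(polish_text.encode("iso-8859-2")))  # 4 bytes
```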
Encoding occurs on the I/O path, so one would expect I/O effects to swamp
differences due to encoding.
Do you have any evidence whatsoever that encoding matters to performance, as
Susanne's question asks, or are you just letting your imagination run away
with you?
To answer your question literally, of course, yes, there will be a performance
difference between different character encodings. Whether that difference is
positive or negative, noticeable or not, is another matter that no one else
can answer for you. It depends on factors local to your particular
environment, including architecture, load, bandwidth, etc.
My guess, and guesses without evidence are as reliable as the newspaper
astrology column for this purpose, is that the differences depend on what your
OS natively supports - that is, if your OS is set up for UTF-8, then a LATIN-1
encoding might be slower, and vice versa - and that you will barely, if at
all, be able to detect them.
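Rather than guess, one can measure. A minimal sketch (my own, not from the thread): time encoding of a sample corpus under the two candidates. This benchmarks Python's codecs on one machine, not PostgreSQL's conversion path, so it only illustrates that the gap is small and environment-dependent.

```python
import timeit

# Repeatable corpus with non-ASCII content (a Polish pangram, repeated).
sample = "zażółć gęślą jaźń " * 1000

# Time 1000 encode passes under each encoding.
utf8_time = timeit.timeit(lambda: sample.encode("utf-8"), number=1000)
latin2_time = timeit.timeit(lambda: sample.encode("iso-8859-2"), number=1000)

print(f"UTF-8:      {utf8_time:.4f}s")
print(f"ISO-8859-2: {latin2_time:.4f}s")
```

Run it on your own hardware; the relative numbers will vary with libc, CPU, and data mix, which is exactly the point.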
Knuth warned us, "Premature optimization is the root of all evil." Why don't
you just focus on a clean, maintainable data structure and good application
design?