FAQ
Is there any clear performance difference between using a multi-byte
character set (such as UTF-8) and a single-byte one (e.g. SQL_ASCII,
LATIN2)? And why is there no UTF-32 (generally more space per character, but
faster to process than a variable-width multibyte encoding)? The only thing I
found is in the Oracle 11g documentation:

"For best performance, choose a character set that avoids character
set conversion and uses the most efficient encoding for the languages
desired. Single-byte character sets result in better performance than
multibyte character sets, and they also are the most efficient in
terms of space requirements. However, single-byte character sets limit
how many languages you can support."

Regards,
Grzegorz Sz.
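
One rough way to get a feel for the trade-off the question describes, outside
PostgreSQL itself, is to compare how the same text behaves under LATIN2, UTF-8
and UTF-32. The sketch below is plain Python; the Polish sample string and the
iteration count are arbitrary choices, and the numbers say nothing about
PostgreSQL's own storage or collation costs:

    import timeit

    # Arbitrary Polish sample text containing non-ASCII characters.
    text = "Zażółć gęślą jaźń. " * 50

    for encoding in ("iso-8859-2", "utf-8", "utf-32"):
        size = len(text.encode(encoding))
        # Time 10,000 encode/decode round trips of the whole string.
        seconds = timeit.timeit(
            lambda enc=encoding: text.encode(enc).decode(enc), number=10_000
        )
        print(f"{encoding:10s} {size:6d} bytes  {seconds:.3f} s for 10k round trips")

For this kind of mostly-Latin text, UTF-32 typically comes out two to three
times larger than UTF-8, which is the space cost the question alludes to.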


  • Susanne Ebrecht at Apr 21, 2011 at 11:49 am

    On 20.04.2011 22:29, Grzegorz Szpetkowski wrote:
    Is there any clear performance difference between using a multi-byte
    character set (such as UTF-8) and a single-byte one (e.g. SQL_ASCII,
    LATIN2)? And why is there no UTF-32 (generally more space per character,
    but faster to process than a variable-width multibyte encoding)? The only
    thing I found is in the Oracle 11g documentation:
    Hello Grzegorz,

    PostgreSQL doesn't implement its own character sets.
    We just use what libc provides, i.e., what you find on your OS
    [see the sketch after this thread].

    As far as I know, there is no operating system that uses UTF-32 as its
    locale encoding.

    Did you ever notice a performance difference on your OS when you used an
    ISO encoding instead of UTF-8?

    Regards,

    Susanne

    --
    Susanne Ebrecht - 2ndQuadrant
    PostgreSQL Development, 24x7 Support, Training and Services
    www.2ndQuadrant.com
  • Lew at Apr 21, 2011 at 6:37 pm

    Susanne Ebrecht wrote:
    Grzegorz Szpetkowski wrote:
    Is there any clear performance difference between using a multi-byte
    character set (such as UTF-8) and a single-byte one (e.g. SQL_ASCII,
    LATIN2)? And why is there no UTF-32 (generally more space per character,
    but faster to process than a variable-width multibyte encoding)? ...
    PostgreSQL doesn't implement its own character sets.
    We just use what libc provides, i.e., what you find on your OS.

    As far as I know, there is no operating system that uses UTF-32 as its
    locale encoding.

    Did you ever notice a performance difference on your OS when you used an
    ISO encoding instead of UTF-8?
    For locales where ISO-8859-x makes sense, UTF-8 is itself a mostly
    single-byte encoding [see the sketch below].

    Encoding occurs on the I/O path, so one would expect I/O effects to swamp
    differences due to encoding.

    Do you have any evidence whatsoever that encoding matters to performance, as
    Susanne's question directs, or are you just letting your imagination run away
    with you?

    To answer your question literally, of course, yes, there will be a performance
    difference between different character encodings. Whether that difference is
    positive or negative, noticeable or not, is another matter that no one else
    can answer for you. It depends on factors local to your particular
    environment, including architecture, load, bandwidth, etc.

    My guess, and guesses without evidence are as reliable as the newspaper
    astrology column for this purpose, is that the differences depend on what your
    OS natively supports - that is, if your OS is set up for UTF-8, then a LATIN-1
    encoding might be slower, and vice versa - and that you will barely, if at all,
    be able to detect them.

    Knuth warned us, "Premature optimization is the root of all evil." Why don't
    you just focus on a clean, maintainable data structure and good application
    design?
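
Lew's observation that UTF-8 is mostly single-byte for ISO-8859-x text can be
checked directly: only the non-ASCII characters grow from one byte to two. A
minimal sketch in Python, with an arbitrary Western-European sample sentence:

    # Arbitrary sample; most characters are plain ASCII, a few are accented.
    text = "Übergrößen à gogo: the café serves crème brûlée every day. " * 100

    latin1 = text.encode("iso-8859-1")
    utf8 = text.encode("utf-8")

    print(len(latin1), "bytes as ISO-8859-1")
    print(len(utf8), "bytes as UTF-8")
    print(f"UTF-8 size overhead: {len(utf8) / len(latin1) - 1:.1%}")

Whether that handful of percent is ever noticeable next to disk and network
I/O, as Lew suggests, is the real question.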

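To illustrate Susanne's point that PostgreSQL leans on whatever the operating
system's libc provides, the sketch below asks libc, via Python's locale module
(which wraps the libc calls), for the environment's encoding and collation
rules. It assumes a POSIX-like system with, for example, a Polish UTF-8 locale
installed; the specific locale is only an assumption.

    import locale

    # Adopt whatever locale the environment (i.e. libc / the OS) provides.
    locale.setlocale(locale.LC_ALL, "")

    print(locale.getlocale(locale.LC_COLLATE))   # e.g. ('pl_PL', 'UTF-8')
    print(locale.getpreferredencoding())         # typically 'UTF-8' today

    # Collation order also comes from libc, not from the application.
    words = ["łódź", "lody", "żaba", "zero"]
    print(sorted(words, key=locale.strxfrm))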

Discussion Overview
group: pgsql-novice
categories: postgresql
posted: Apr 20, '11 at 8:30p
active: Apr 21, '11 at 6:37p
posts: 3
users: 3
website: postgresql.org
irc: #postgresql
