Tom Lane wrote:

> I just spent a bit of time considering what we might do to fix this.
> The idea mentioned in the above thread was to switch over to using
> wchar_t in the regex code, but that seems to have a number of problems.
> One showstopper is that on some platforms wchar_t is only 16 bits and
> can't represent the full range of Unicode characters. I don't want to
> fix case-folding only to break regexes for other uses.

We have a TODO item about having a regex-specific data type. Would
implementing that solve this problem?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


  • Tom Lane at Dec 1, 2009 at 9:52 pm

    Alvaro Herrera writes:
    > Tom Lane wrote:
    >> I just spent a bit of time considering what we might do to fix this.
    >> The idea mentioned in the above thread was to switch over to using
    >> wchar_t in the regex code, but that seems to have a number of problems.
    >> One showstopper is that on some platforms wchar_t is only 16 bits and
    >> can't represent the full range of Unicode characters. I don't want to
    >> fix case-folding only to break regexes for other uses.
    >
    > We have a TODO item about having a regex-specific data type. Would
    > implementing that solve this problem?

    No, not particularly --- the stumbling block here is really impedance
    mismatch between our internal APIs and libc's standard locale support.
    The TODO item that would fix it is implementing our own locale support;
    but I ain't holding my breath for that one.

    AFAIR the motivation for a regex data type was solely performance.

    regards, tom lane

Group: pgsql-hackers
Category: postgresql
Posted: Dec 1, 2009 at 9:46 pm
Last active: Dec 1, 2009 at 9:52 pm