I'm in the process of converting some C functions written for another system
into C functions in a shared library that will be used by PostgreSQL. The
key function will be the state transition function for a user-defined
aggregate. From what I've read in the documentation:

1. the state value should initially be allocated in the "((AggState
*)fcinfo->context)->aggcontext" memory context.
2. this value will be passed to successive calls of the aggregate until it
completes and calls its final function.

Since my state is fairly complex I intend to make my state value type text
to give myself a block of memory in which I can manage the various pointers
I need. I realize that I will need to be careful about alignment issues and
intend to store the state as a large struct in the data area of the text.

Q1 Is this approach for the state variable reasonable?

The existing C code that I've inherited makes heavy use of static and global
variables. I immediately assumed that for thread safety I'd need to roll any
of these variables that need to survive a function call into the state
variable struct and change any that were just for convenience instead of
parameters into parameters to my internal functions. I wanted to verify this
approach and have done fairly extensive searching on both the PostgreSQL
site and the Web in general. I didn't find any general guideline about not
using static and global variables. I did find a reference in the FAQ,
http://www.postgresql.org/docs/faqs.FAQ_DEV.html#item1.14 that "threads are
not currently used in the backend code".

Q2 Is my assumption that I shouldn't be using static and global variables in
a shared library because they aren't safe under whatever task switching
mechanism the backend is using correct?

  • Tom Lane at Sep 14, 2007 at 5:21 pm

    "Don Walker" <don.walker@versaterm.com> writes:
    > Since my state is fairly complex I intend to make my state value type text
    > to give myself a block of memory in which I can manage the various pointers
    > I need. I realize that I will need to be careful about alignment issues and
    > intend to store the state as a large struct in the data area of the text.
    >
    > Q1 Is this approach for the state variable reasonable?

    Not very. In the first place, it's a horrid idea to put non-textual
    data into a text datum. This is not invisible-to-the-user stuff, they
    can easily see it by calling your transition function directly.
    bytea might be a better choice for holding random data. In the second
    place, your reference to pointers scares me. You cannot assume that the
    nodeAgg code won't copy the transition value from one place to another,
    so internal pointers aren't going to work. Can you use offsets instead?

    > The existing C code that I've inherited makes heavy use of static and global
    > variables. I immediately assumed that for thread safety I'd need to roll any
    > of these variables that need to survive a function call into the state
    > variable struct and change any that were just for convenience instead of
    > parameters into parameters to my internal functions.

    There are no threads in the backend. What you *do* need to worry about
    is parallel evaluation of multiple instances of your aggregate, for
    instance "select myagg(x), myagg(y) from table". Another tricky problem
    is reinitialization if a query fails partway through and so you never
    get as far as running your finalization function. It's certainly
    easiest if you can keep all your state in the transition datum. People
    have been known to cheat, though.

    regards, tom lane
  • Don Walker at Sep 14, 2007 at 7:57 pm
    Q1 a) I will use bytea instead of text.

    Q1 b) I didn't intend to imply that my state value struct would contain
    pointers but thanks for reminding me that I should initialize any temporary
    pointers into the memory area using offsets.

    Q2 a) If I understand you correctly the use of static or global variables is
    generally a bad idea and there are cases where it could produce incorrect
    results.

    Q2 b) You seem to imply that, since the backend doesn't use threads,
    simultaneous single evaluations of my aggregate by different
    users/connections would not be a problem for the static or global variables.
    If 200 users want to evaluate my aggregate at the same time how does the
    backend service them? I'm only asking this out of curiosity so I won't be
    too disappointed if you don't reply (because the answer is too long and
    complicated, etc.).

    Thanks for your reply,
    Don Walker

    -----Original Message-----
    From: Tom Lane
    Sent: September 14, 2007 13:21
    To: Don Walker
    Cc: pgsql-hackers@postgresql.org
    Subject: Re: [HACKERS] Use of global and static variables in shared
    libraries
  • Heikki Linnakangas at Sep 14, 2007 at 9:13 pm

    Don Walker wrote:
    > Q2 b) You seem to imply that, since the backend doesn't use threads,
    > simultaneous single evaluations of my aggregate by different
    > users/connections would not be a problem for the static or global variables.
    > If 200 users want to evaluate my aggregate at the same time how does the
    > backend service them?

    They will all use a different backend process. Each backend serves one
    connection.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
