Greetings,

Minor enhancement, but a valuable one imv. Hopefully there aren't any
issues with it. :)

Thanks!

Stephen

commit 3cb707aa9f228e629e7127625a76a223751a778b
Author: Stephen Frost <sfrost@snowman.net>
Date: Wed Jan 12 09:17:31 2011 -0500

Add support for logging the current role

This adds a '%o' option to the log_line_prefix GUC which will log the
current role. The '%u' option only logs the Session user, which can
be misleading, but it's valuable to have both options.

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 3508,3513 **** local0.* /var/log/postgresql
--- 3508,3518 ----
<entry>yes</entry>
</row>
<row>
+ <entry><literal>%o</literal></entry>
+ <entry>Current role name</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
<entry><literal>%d</literal></entry>
<entry>Database name</entry>
<entry>yes</entry>
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
***************
*** 1826,1831 **** log_line_prefix(StringInfo buf, ErrorData *edata)
--- 1826,1841 ----
appendStringInfoString(buf, username);
}
break;
+ case 'o':
+ if (MyProcPort)
+ {
+ const char *rolename = GetUserNameFromId(GetUserId());
+
+ if (rolename == NULL || *rolename == '\0')
+ rolename = _("[unknown]");
+ appendStringInfoString(buf, rolename);
+ }
+ break;
case 'd':
if (MyProcPort)
{

Search Discussions

  • Robert Haas at Jan 12, 2011 at 3:06 pm

    On Wed, Jan 12, 2011 at 9:23 AM, Stephen Frost wrote:
    Minor enhancement, but a valuable one imv.  Hopefully there aren't any
    issues with it. :)
    1. Why %o? That's not obviously mnemonic. Perhaps %U?

    2. It won't be clear to people reading this what the difference is
    between %u and this. You probably need to reword the documentation
    for the existing option as well as documenting the new one.

    3. Please attach the patch rather than including it inline, if possible.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at Jan 12, 2011 at 3:12 pm

    * Robert Haas (robertmhaas@gmail.com) wrote:
    On Wed, Jan 12, 2011 at 9:23 AM, Stephen Frost wrote:
    Minor enhancement, but a valuable one imv.  Hopefully there aren't any
    issues with it. :)
    1. Why %o? That's not obviously mnemonic. Perhaps %U?
    r was taken? :) I'm not sure I like %U, but in the end I don't *really*
    care. I'll update it to %U and wait for someone else to complain.
    2. It won't be clear to people reading this what the difference is
    between %u and this. You probably need to reword the documentation
    for the existing option as well as documenting the new one.
    Fair enough.
    3. Please attach the patch rather than including it inline, if possible.
    Hrm, I could have sworn that Tom had asked for the exact opposite in the
    past, but either way is fine by me.

    Stephen
  • Robert Haas at Jan 12, 2011 at 3:14 pm

    On Wed, Jan 12, 2011 at 10:12 AM, Stephen Frost wrote:
    r was taken? :)  I'm not sure I like %U, but in the end I don't *really*
    care.  I'll update it to %U and wait for someone else to complain.
    The joys of community...
    3. Please attach the patch rather than including it inline, if possible.
    Hrm, I could have sworn that Tom had asked for the exact opposite in the
    past, but either way is fine by me.
    Really? I don't remember that, but it's certainly possible. My
    problem is that cutting and pasting from a browser window into a patch
    file tends to be a little iffy. If you paste too much or too little
    or the whitespace doesn't come out quite right, the patch doesn't
    apply.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Jan 12, 2011 at 3:33 pm

    Robert Haas writes:
    On Wed, Jan 12, 2011 at 10:12 AM, Stephen Frost wrote:
    Hrm, I could have sworn that Tom had asked for the exact opposite in the
    past, but either way is fine by me.
    Really? I don't remember that, but it's certainly possible.
    I don't remember saying exactly that either. The main point is to
    ensure the patch doesn't get mangled in transmission. I've seen people
    screw it up both ways: inline is much more vulnerable to mailers
    deciding to whack whitespace around, while attachments are vulnerable to
    being encoded in all sorts of weird ways, some of which come out nicely
    in the archives and some of which don't. I'm not in favor of gzipping
    small patches that could perfectly well be left in readable form.

    This particular patch looks fine here:
    http://archives.postgresql.org/pgsql-hackers/2011-01/msg00845.php
    so I'm thinking Stephen doesn't need to revisit his technique.

    +1 for choosing something more mnemonic than "%o", btw.

    regards, tom lane
  • Stephen Frost at Jan 12, 2011 at 3:43 pm

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    +1 for choosing something more mnemonic than "%o", btw.
    Alright, not to be *too* ridiculous about this, but I'm feeling like
    '%R' might be better than '%U', if we don't mind overloading a single
    letter based on case. I've always been annoyed at the lack of
    distinction between 'user' and 'role' in our docs and feel it does lead
    to some confusion.

    Updated patch attached, if people agree. Compiles, passes regressions,
    works as advertised, etc.

    commit bba27fe63702405514ed2c3bb72b70cc178f9ce1
    Author: Stephen Frost <sfrost@snowman.net>
    Date: Wed Jan 12 10:38:24 2011 -0500

    Change log_line_prefix for current role to %R

    As we're going for a mnemonic, and this is really about roles
    instead of users, change log_line_prefix argument to %R from
    %U for current_role.

    Thanks,

    Stephen
  • Robert Haas at Jan 12, 2011 at 3:46 pm

    On Wed, Jan 12, 2011 at 10:43 AM, Stephen Frost wrote:
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    +1 for choosing something more mnemonic than "%o", btw.
    Alright, not to be *too* ridiculous about this, but I'm feeling like
    '%R' might be better than '%U', if we don't mind overloading a single
    letter based on case.  I've always been annoyed at the lack of
    distinction between 'user' and 'role' in our docs and feel it does lead
    to some confusion.

    Updated patch attached, if people agree.  Compiles, passes regressions,
    works as advertised, etc.
    I was thinking that %u/%U would have the advantage of implying some
    connection between the two things which is in fact present. %r/%R
    seems not quite as good to me. Also, let's paint it tangerine.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at Jan 12, 2011 at 4:00 pm

    * Robert Haas (robertmhaas@gmail.com) wrote:
    I was thinking that %u/%U would have the advantage of implying some
    connection between the two things which is in fact present. %r/%R
    seems not quite as good to me. Also, let's paint it tangerine.
    I figured that's where you were going.

    +1 for whatever the committer wants to commit. ;)

    Stephen
  • Robert Haas at Jan 12, 2011 at 4:41 pm

    On Wed, Jan 12, 2011 at 11:00 AM, Stephen Frost wrote:
    * Robert Haas (robertmhaas@gmail.com) wrote:
    I was thinking that %u/%U would have the advantage of implying some
    connection between the two things which is in fact present.  %r/%R
    seems not quite as good to me.  Also, let's paint it tangerine.
    I figured that's where you were going.

    +1 for whatever the committer wants to commit. ;)
    OK, done. :-)

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Jan 12, 2011 at 4:53 pm

    Robert Haas writes:
    On Wed, Jan 12, 2011 at 11:00 AM, Stephen Frost wrote:
    +1 for whatever the committer wants to commit. ;)
    OK, done. :-)
    Uh, did you actually stop to *think* about this patch?

    What you have just committed puts a syscache lookup into the elog output
    path. Quite aside from the likely performance hit, this will
    malfunction badly in any case where we're trying to log from an aborted
    transaction.

    Please revert and rethink.

    regards, tom lane
  • Robert Haas at Jan 12, 2011 at 4:59 pm

    On Wed, Jan 12, 2011 at 11:53 AM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Wed, Jan 12, 2011 at 11:00 AM, Stephen Frost wrote:
    +1 for whatever the committer wants to commit. ;)
    OK, done.  :-)
    Uh, did you actually stop to *think* about this patch?
    You have a valid point here, but this isn't the most tactful way of putting it.
    What you have just committed puts a syscache lookup into the elog output
    path.  Quite aside from the likely performance hit, this will
    malfunction badly in any case where we're trying to log from an aborted
    transaction.

    Please revert and rethink.
    I think it's going to take more than a rethink - I don't see any way
    to salvage it. :-(

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at Jan 12, 2011 at 5:13 pm

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    Uh, did you actually stop to *think* about this patch?
    Actually, I was worried about exactly that, but I didn't see anything at
    the top of elog.c that indicated if it'd be a problem or not (and the
    Syscache lookup issue was *exactly* what I was looking for). :( There
    was much discussion about recursion and memory commit and whatnot, but
    nothing about SysCache lookups.
    What you have just committed puts a syscache lookup into the elog output
    path. Quite aside from the likely performance hit, this will
    malfunction badly in any case where we're trying to log from an aborted
    transaction.
    I had been looking into storing the current role inside the Proc struct
    or in some new variable and then pulling it from there (updating it when
    needed during a SET ROLE, of course), but it seemed a bit of overkill if
    it wasn't necessary (which wasn't obvious to me). We could also just log
    the role's OID (%o anyone..?), since that doesn't need a syscache lookup
    to get at. I'd much rather log the role name if we can tho.

    I had looked through some of the other calls happening in log_line_prefix
    and didn't see any explicit syscache lookups but it seemed like we were
    doing quite a few other things that might have issues, so I had assumed
    it'd be alright. Sorry about that.

    Thanks,

    Stephen
  • Robert Haas at Jan 12, 2011 at 5:25 pm

    On Wed, Jan 12, 2011 at 12:13 PM, Stephen Frost wrote:
    What you have just committed puts a syscache lookup into the elog output
    path.  Quite aside from the likely performance hit, this will
    malfunction badly in any case where we're trying to log from an aborted
    transaction.
    I had been looking into storing the current role inside the Proc struct
    or in some new variable and then pulling it from there (updating it when
    needed during a SET ROLE, of course), but it seemed a bit of overkill if
    it wasn't necessary (which wasn't obvious to me). We could also just log
    the role's OID (%o anyone..?), since that doesn't need a syscache lookup
    to get at.  I'd much rather log the role name if we can tho.
    Logging the OID seems to be of questionable value. I thought of the
    update-the-variable-when-it-changes approach too, but besides being a
    bit expensive if it's changing frequently, it's not necessarily safe
    to do the syscache lookup there either - see the comments for
    GetUserIdAndSecContext (which are really for SetUserIdAndSecContext,
    but they're in an odd place).

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at Jan 12, 2011 at 6:00 pm

    * Robert Haas (robertmhaas@gmail.com) wrote:
    Logging the OID seems to be of questionable value.
    I certainly disagree about this, not being able to figure out what's
    causing a 'permissions denied' error because you don't know which role
    the log is coming from is *very* annoying. Having to go look up the
    role from the OID in the log is also annoying, but less so, imv. :)
    I thought of the
    update-the-variable-when-it-changes approach too, but besides being a
    bit expensive if it's changing frequently, it's not necessarily safe
    to do the syscache lookup there either - see the comments for
    GetUserIdAndSecContext (which are really for SetUserIdAndSecContext,
    but they're in an odd place).
    Alright, here's a patch which adds the ability to log the current role's
    OID and which calls GetUserIdAndSecContext() directly and handles the
    possibility that CurrentUserId isn't valid. Perhaps we should just grab
    CurrentUserId directly rather than going through
    GetUserIdAndSecContext()? I could certainly do that instead.

    Also includes those additional comments in elog.c.

    Thanks,

    Stephen

    commit d9a7acd5ea1f5214b44875b6d257c5c59590167c
    Author: Stephen Frost <sfrost@snowman.net>
    Date: Wed Jan 12 12:53:50 2011 -0500

    Use GetUserIdAndSecContext to get role OID in elog

    We can't be sure that GetUserId() will be called when current_user
    is a valid Oid, per the comments in GetUserIdAndSecContext, when
    called from elog.c, so instead call GetUserIdAndSecContext directly
    and handle the possible invalid Oid ourselves.

    commit 605497b062298ea195d8999f8cefca10968ae22f
    Author: Stephen Frost <sfrost@snowman.net>
    Date: Wed Jan 12 12:29:44 2011 -0500

    Change to logging role's OID instead of name

    Remove the SysCache lookup from elog.c/log_line_prefix by logging
    the role's OID instead, this addresses a concern where a SysCache
    lookup could malfunction badly due to logging from a failed
    transaction. Note that using SysCache from the elog routines could
    also be a performance hit, though this would only be the case if a
    user chose to enable that logging.
  • Robert Haas at Jan 12, 2011 at 7:03 pm

    On Wed, Jan 12, 2011 at 12:59 PM, Stephen Frost wrote:
    * Robert Haas (robertmhaas@gmail.com) wrote:
    Logging the OID seems to be of questionable value.
    I certainly disagree about this, not being able to figure out what's
    causing a 'permissions denied' error because you don't know which role
    the log is coming from is *very* annoying.
    Interesting. I wonder if we shouldn't try to fix this by including
    the relevant role name in the error message. Or is that just going to
    be too messy to live?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at Jan 13, 2011 at 12:59 am

    * Robert Haas (robertmhaas@gmail.com) wrote:
    On Wed, Jan 12, 2011 at 12:59 PM, Stephen Frost wrote:
    I certainly disagree about this, not being able to figure out what's
    causing a 'permissions denied' error because you don't know which role
    the log is coming from is *very* annoying.
    Interesting. I wonder if we shouldn't try to fix this by including
    the relevant role name in the error message. Or is that just going to
    be too messy to live?
    It might be possible to do and answer that specific question- but what
    about the obvious next question: which role was this command run with?
    iow, if I log dml, how do I know what the role was when the dml
    statement was run? ie- why was this command allowed?

    Let's ask another question- why do we provide a %u option in
    log_line_prefix instead of just logging it as part of each statement?
    When you have roles that aren't 'inherit' and have a lot of 'set role's
    happening, you end up asking the same questions about role that you
    would about user.

    As a side-note, CurrentUserId isn't actually exported (I'm not suprised,
    tbh, but I've actually checked now), so you have to go through
    GetUserIdAndSecContext().

    Thanks,

    Stephen
  • Tom Lane at Jan 13, 2011 at 1:16 am

    Stephen Frost writes:
    * Robert Haas (robertmhaas@gmail.com) wrote:
    Interesting. I wonder if we shouldn't try to fix this by including
    the relevant role name in the error message. Or is that just going to
    be too messy to live?
    It might be possible to do and answer that specific question- but what
    about the obvious next question: which role was this command run with?
    iow, if I log dml, how do I know what the role was when the dml
    statement was run? ie- why was this command allowed?
    I'm less than excited about that argument because it's after the fact
    --- if you needed to know the information, you probably didn't have
    log_line_prefix set correctly, even assuming you had adequate logging
    otherwise. And logging an OID just seems too ugly to live.

    Another little problem with the quick and dirty solution is that stuff
    that's important enough to warrant a log_line_prefix escape is generally
    thought to be important enough to warrant inclusion in CSV logs. That
    would imply adding a column and taking the resultant compatibility hit.

    regards, tom lane
  • Stephen Frost at Jan 13, 2011 at 1:22 am

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    Stephen Frost <sfrost@snowman.net> writes:
    It might be possible to do and answer that specific question- but what
    about the obvious next question: which role was this command run with?
    iow, if I log dml, how do I know what the role was when the dml
    statement was run? ie- why was this command allowed?
    I'm less than excited about that argument because it's after the fact
    --- if you needed to know the information, you probably didn't have
    log_line_prefix set correctly, even assuming you had adequate logging
    otherwise. And logging an OID just seems too ugly to live.
    Erm, really? Ok, fine, maybe you didn't have log_line_prefix set
    correctly the first time you needed the information, but after you
    discover that you *don't know*, you're going to be looking for an option
    to let you get that information for the future. I would also suggest
    that more experienced admins are going to have a default log_line_prefix
    that they install on new systems they set up (I know I do...), and I'd
    be suprised if knowing the role that a command is actually run as wasn't
    popular among that set.

    I don't like logging an OID either.
    Another little problem with the quick and dirty solution is that stuff
    that's important enough to warrant a log_line_prefix escape is generally
    thought to be important enough to warrant inclusion in CSV logs. That
    would imply adding a column and taking the resultant compatibility hit.
    I'd be more than happy to add support for this to the CSV logs. I agree
    that it'd make sense to do. I think we need to solve the bigger problem
    of OID vs. rolename vs. lookups from elog first though.

    Thanks,

    Stephen
  • Stephen Frost at Jan 13, 2011 at 1:26 am

    * Stephen Frost (sfrost@snowman.net) wrote:
    Erm, really? Ok, fine, maybe you didn't have log_line_prefix set
    correctly the first time you needed the information, but after you
    discover that you *don't know*, you're going to be looking for an option
    to let you get that information for the future.
    Oh, yeah, and honestly, the above is the reason I'm after this myself- I
    was having a difficult time figuring out which of 300-odd users a given
    error was happening for and was annoyed that I couldn't figure out what
    role it was. The web server logs in with a 'general' role that doesn't
    inherit anything and then has to 'set role' to whatever role the user
    authenticated with. After that 'set role' happens, we've got no way to
    know from the logs who is impacted by a given error.

    Guess I'm just trying to say that I didn't write this patch as an
    academic exercise but rather because it solves a real world problem for
    me.

    Thanks,

    Stephen
  • Tom Lane at Jan 13, 2011 at 1:29 am

    Stephen Frost writes:
    Guess I'm just trying to say that I didn't write this patch as an
    academic exercise but rather because it solves a real world problem for
    me.
    I understand. But doing this right is going to take more than ten lines
    of code, and more than a negligible performance penalty. We have to
    consider whether it's worth it.

    regards, tom lane
  • Stephen Frost at Jan 13, 2011 at 1:45 am

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    I understand. But doing this right is going to take more than ten lines
    of code, and more than a negligible performance penalty. We have to
    consider whether it's worth it.
    It'd be ideal if the performance hit could only be felt by people who
    want to enable the option. On the flip side, I don't know that adding a
    bit of extra work to SET ROLE would be that bad. If it helps (and I
    don't know if it does, I'm still trying to wrap my head around
    GetUserIdAndSecContext/SetUserIdAndSecContext), I'd be fine with *not*
    trying to log the right role when inside Security Definter functions
    (after all, if those are getting called, the user could go look at the
    function definition to see which role it's being run as).

    I gather one issue is how we can pick up what the correct role name is
    when resetting the role due to a failed transaction..? Building a stack
    with all the role names pre-cached to deal with that wouldn't be likely
    to work and we'd need more than one level to deal with savepoints, I
    assume? We could reset it to an Invalid name on abort and then detect
    that it needs to be corrected at the start of the next transaction,
    perhaps?

    Thanks,

    Stephen
  • Tom Lane at Jan 13, 2011 at 2:00 am

    Stephen Frost writes:
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    I understand. But doing this right is going to take more than ten lines
    of code, and more than a negligible performance penalty. We have to
    consider whether it's worth it.
    It'd be ideal if the performance hit could only be felt by people who
    want to enable the option. On the flip side, I don't know that adding a
    bit of extra work to SET ROLE would be that bad. If it helps (and I
    don't know if it does, I'm still trying to wrap my head around
    GetUserIdAndSecContext/SetUserIdAndSecContext), I'd be fine with *not*
    trying to log the right role when inside Security Definter functions
    (after all, if those are getting called, the user could go look at the
    function definition to see which role it's being run as).
    I gather one issue is how we can pick up what the correct role name is
    when resetting the role due to a failed transaction..? Building a stack
    with all the role names pre-cached to deal with that wouldn't be likely
    to work and we'd need more than one level to deal with savepoints, I
    assume? We could reset it to an Invalid name on abort and then detect
    that it needs to be corrected at the start of the next transaction,
    perhaps?
    I seem to recall that the assign hook for role stores the string form of
    the role name anyway. So in principle you could arrange for that to get
    dumped someplace where elog.c could look at it (think about just adding
    a string parameter to SetUserIdAndSecContext). It wouldn't track the
    effects of RENAME ROLE against an actively-used role, but then again
    neither does %u.

    I'm not actually concerned about adding a few extra cycles to SET ROLE
    for this. What bothered me more was the cost of adding another output
    column to CSV log mode. That's not something you're going to be able to
    finesse such that only people who care pay the cost.

    regards, tom lane
  • Andrew Dunstan at Jan 14, 2011 at 4:36 pm

    On 01/12/2011 08:59 PM, Tom Lane wrote:
    I'm not actually concerned about adding a few extra cycles to SET ROLE
    for this. What bothered me more was the cost of adding another output
    column to CSV log mode. That's not something you're going to be able to
    finesse such that only people who care pay the cost.
    I think it's time to revisit the design of CSV logs again, now we have
    two or three releases worth of experience with it. It needs some
    flexibility and refinement.

    cheers

    andrew
  • Tom Lane at Jan 14, 2011 at 4:44 pm

    Andrew Dunstan writes:
    On 01/12/2011 08:59 PM, Tom Lane wrote:
    I'm not actually concerned about adding a few extra cycles to SET ROLE
    for this. What bothered me more was the cost of adding another output
    column to CSV log mode. That's not something you're going to be able to
    finesse such that only people who care pay the cost.
    I think it's time to revisit the design of CSV logs again, now we have
    two or three releases worth of experience with it. It needs some
    flexibility and refinement.
    It would definitely be nice to support optional columns a little better.
    I'm not even sure whether the runtime overhead is worth worrying about
    (maybe it is, maybe it isn't, I have no data). But I do know that
    adding a column to the CSV output format spec causes a flag day for
    users. How can we avoid that?

    regards, tom lane
  • Stephen Frost at Jan 14, 2011 at 4:48 pm

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    Andrew Dunstan <andrew@dunslane.net> writes:
    I think it's time to revisit the design of CSV logs again, now we have
    two or three releases worth of experience with it. It needs some
    flexibility and refinement.
    It would definitely be nice to support optional columns a little better.
    I'm not even sure whether the runtime overhead is worth worrying about
    (maybe it is, maybe it isn't, I have no data). But I do know that
    adding a column to the CSV output format spec causes a flag day for
    users. How can we avoid that?
    My first thought would be to have a 'log_csv_format' GUC that's very
    similar to 'log_line_prefix' (and uses the same variables if
    possible..). We could then ship a default in postgresql.conf that
    matches what the current format is while adding the other options if
    people want to use them.

    If we could have all the processing to generate that line go through the
    same function for log_line_prefix and log_csv_format, that'd be even
    better. Makes me tempted to throw out the current notion of
    'log_line_*prefix*' and replace it with 'log_line_*format*' to match
    exactly the 'log_csv_format' that I'm proposing. That'd undoubtably
    cause more user headaches tho... :(

    Thanks,

    Stephen
  • Andrew Dunstan at Jan 14, 2011 at 9:56 pm

    On 01/14/2011 11:48 AM, Stephen Frost wrote:
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    Andrew Dunstan<andrew@dunslane.net> writes:
    I think it's time to revisit the design of CSV logs again, now we have
    two or three releases worth of experience with it. It needs some
    flexibility and refinement.
    It would definitely be nice to support optional columns a little better.
    I'm not even sure whether the runtime overhead is worth worrying about
    (maybe it is, maybe it isn't, I have no data). But I do know that
    adding a column to the CSV output format spec causes a flag day for
    users. How can we avoid that?
    My first thought would be to have a 'log_csv_format' GUC that's very
    similar to 'log_line_prefix' (and uses the same variables if
    possible..). We could then ship a default in postgresql.conf that
    matches what the current format is while adding the other options if
    people want to use them.

    If we could have all the processing to generate that line go through the
    same function for log_line_prefix and log_csv_format, that'd be even
    better. Makes me tempted to throw out the current notion of
    'log_line_*prefix*' and replace it with 'log_line_*format*' to match
    exactly the 'log_csv_format' that I'm proposing. That'd undoubtably
    cause more user headaches tho... :(

    I'm not sure I really want to make it that flexible :-)

    To deal with the issue Tom's referring to, I think it would be
    sufficient if we just allowed users to suppress production of certain
    columns (as long as we never do anything so evil as to add a new column
    in the middle).

    There are some other issues with the format. I know Josh has bitched
    about the presence of command tags in certain fields, for example.

    cheers

    andrew
  • Aidan Van Dyk at Jan 14, 2011 at 10:04 pm

    On Fri, Jan 14, 2011 at 4:56 PM, Andrew Dunstan wrote:
    I'm not sure I really want to make it that flexible :-)

    To deal with the issue Tom's referring to, I think it would be sufficient if
    we just allowed users to suppress production of certain columns (as long as
    we never do anything so evil as to add a new column in the middle).

    There are some other issues with the format. I know Josh has bitched about
    the presence of command tags in certain fields, for example.
    If there is going to be any change, how about using fixed columns (an
    possibly allowing them to be empty for stuff that's expensive to
    create/write), but adding a 1st column that contains a "version"
    identifyer. And to make it easy, maybe the PG major version as the
    version value.

    If the 1st column is always the version, tools can easily know if
    they understand all the columns (and what order they are in) and it'
    easy to write a "conversion" that strips/re-aranges columns from a
    newer CVS dump to match an older one if you have tools that don't know
    about newer column layouts..

    Personally, I'm not worried about the CSV logs being backwards
    compatible as long as there's a very easy way to know what I might be
    looking at, so conversion is easy...

    But then again, I don't have multiple gigabytes of logs to process either.

    a.

    --
    Aidan Van Dyk                                             Create like a god,
    aidan@highrise.ca                                       command like a king,
    http://www.highrise.ca/                                   work like a slave.
  • Andrew Dunstan at Jan 14, 2011 at 10:10 pm

    On 01/14/2011 05:04 PM, Aidan Van Dyk wrote:
    If there is going to be any change, how about using fixed columns (an
    possibly allowing them to be empty for stuff that's expensive to
    create/write), but adding a 1st column that contains a "version"
    identifyer. And to make it easy, maybe the PG major version as the
    version value.

    If the 1st column is always the version, tools can easily know if
    they understand all the columns (and what order they are in) and it'
    easy to write a "conversion" that strips/re-aranges columns from a
    newer CVS dump to match an older one if you have tools that don't know
    about newer column layouts..

    The whole point of having CSV logs is so you can load them into a
    database table without needing preprocessing tools. So I'm not going to
    be very receptive to changes that are predicated on using such tools.

    cheers

    andrew
  • Tom Lane at Jan 14, 2011 at 10:44 pm

    Aidan Van Dyk writes:
    If there is going to be any change, how about using fixed columns (an
    possibly allowing them to be empty for stuff that's expensive to
    create/write), but adding a 1st column that contains a "version"
    identifyer. And to make it easy, maybe the PG major version as the
    version value.
    Seems like that just adds even more overhead, without really solving any
    of the problems we're concerned about. Code consuming the CSV log would
    still need a-priori knowledge of what columns to expect.

    regards, tom lane
  • Tom Lane at Jan 14, 2011 at 10:08 pm

    Andrew Dunstan writes:
    On 01/14/2011 11:48 AM, Stephen Frost wrote:
    My first thought would be to have a 'log_csv_format' GUC that's very
    similar to 'log_line_prefix' (and uses the same variables if
    possible..). We could then ship a default in postgresql.conf that
    matches what the current format is while adding the other options if
    people want to use them.
    I'm not sure I really want to make it that flexible :-)
    It actually sounded like a pretty good idea to me. The current CSV
    format is already overly bulky/verbose, because it includes absolutely
    everything anybody ever wanted before now. Allowing people to select
    what they actually need, and thereby get rid of some of the overhead
    they're currently paying, would be a good thing.

    regards, tom lane
  • Andrew Dunstan at Jan 14, 2011 at 11:48 pm

    On 01/14/2011 05:08 PM, Tom Lane wrote:
    Andrew Dunstan<andrew@dunslane.net> writes:
    On 01/14/2011 11:48 AM, Stephen Frost wrote:
    My first thought would be to have a 'log_csv_format' GUC that's very
    similar to 'log_line_prefix' (and uses the same variables if
    possible..). We could then ship a default in postgresql.conf that
    matches what the current format is while adding the other options if
    people want to use them.
    I'm not sure I really want to make it that flexible :-)
    It actually sounded like a pretty good idea to me. The current CSV
    format is already overly bulky/verbose, because it includes absolutely
    everything anybody ever wanted before now. Allowing people to select
    what they actually need, and thereby get rid of some of the overhead
    they're currently paying, would be a good thing.

    If you have a format string, what do you want to do with the bits of the
    format that aren't field references? What about delimiters? A format
    string makes it too easy to muck up and too hard to get right, IMNSHO.
    History has shown how easy it is to muck up CSVs. The suggestion I made
    of allowing people to suppress production of certain columns would take
    care of the bulk problem much more safely, I think. We've actually had
    remarkably few issues with CSV logs not being loadable, that I know of
    anyway. When we implemented it, I expected many more issues with it than
    we've had. I'd like to keep it that way.

    cheers

    andrew
  • Tom Lane at Jan 15, 2011 at 1:00 am

    Andrew Dunstan writes:
    On 01/14/2011 05:08 PM, Tom Lane wrote:
    It actually sounded like a pretty good idea to me.
    If you have a format string, what do you want to do with the bits of the
    format that aren't field references?
    I was thinking of it as being strictly a field list. I don't know
    whether it's really practical to borrow log_line_prefix's one-character
    names for the fields (for one thing, there would need to be names for
    all the existing CSV columns, not all of which equate to log_line_prefix
    escapes); but in any case anything other than field references would be
    disallowed. If you prefer to use a name list as the syntax that's fine
    with me.

    regards, tom lane
  • Robert Haas at Jan 15, 2011 at 1:41 am

    On Fri, Jan 14, 2011 at 8:00 PM, Tom Lane wrote:
    Andrew Dunstan <andrew@dunslane.net> writes:
    On 01/14/2011 05:08 PM, Tom Lane wrote:
    It actually sounded like a pretty good idea to me.
    If you have a format string, what do you want to do with the bits of the
    format that aren't field references?
    I was thinking of it as being strictly a field list.  I don't know
    whether it's really practical to borrow log_line_prefix's one-character
    names for the fields (for one thing, there would need to be names for
    all the existing CSV columns, not all of which equate to log_line_prefix
    escapes); but in any case anything other than field references would be
    disallowed.  If you prefer to use a name list as the syntax that's fine
    with me.
    I think we're in the process of designing a manned mission to Mars to
    solve the problem that our shoelaces are untied.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at Jan 15, 2011 at 2:24 am

    On 01/14/2011 08:41 PM, Robert Haas wrote:
    I think we're in the process of designing a manned mission to Mars to
    solve the problem that our shoelaces are untied.
    What's your suggestion, then?

    cheers

    andre
  • Robert Haas at Jan 15, 2011 at 12:14 pm

    On Fri, Jan 14, 2011 at 9:24 PM, Andrew Dunstan wrote:
    On 01/14/2011 08:41 PM, Robert Haas wrote:
    I think we're in the process of designing a manned mission to Mars to
    solve the problem that our shoelaces are untied.
    What's your suggestion, then?
    If there's a practical way to add the requested escape, add it to the
    text format and leave reengineering the CSV format for another day.
    Yeah, I know that's not the most beautiful solution in the world, but
    we're doing engineering here, not theology.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Jan 15, 2011 at 4:08 pm

    Robert Haas writes:
    On Fri, Jan 14, 2011 at 9:24 PM, Andrew Dunstan wrote:
    What's your suggestion, then?
    If there's a practical way to add the requested escape, add it to the
    text format and leave reengineering the CSV format for another day.
    Yeah, I know that's not the most beautiful solution in the world, but
    we're doing engineering here, not theology.
    Well, the original patch was exactly that. But I don't agree with that
    approach; I think allowing the capabilities of text and CSV logs to
    diverge significantly would be a mistake. If a piece of information is
    valuable enough to need a way to include it in textual log entries,
    then you need a way to include it in CSV log entries too. If it's not
    valuable enough to do the work to support it in CSV, then we can live
    without it.

    regards, tom lane
  • Andrew Dunstan at Jan 15, 2011 at 4:19 pm

    On 01/15/2011 11:08 AM, Tom Lane wrote:
    Robert Haas<robertmhaas@gmail.com> writes:
    On Fri, Jan 14, 2011 at 9:24 PM, Andrew Dunstanwrote:
    What's your suggestion, then?
    If there's a practical way to add the requested escape, add it to the
    text format and leave reengineering the CSV format for another day.
    Yeah, I know that's not the most beautiful solution in the world, but
    we're doing engineering here, not theology.
    Well, the original patch was exactly that. But I don't agree with that
    approach; I think allowing the capabilities of text and CSV logs to
    diverge significantly would be a mistake. If a piece of information is
    valuable enough to need a way to include it in textual log entries,
    then you need a way to include it in CSV log entries too. If it's not
    valuable enough to do the work to support it in CSV, then we can live
    without it.
    Yeah, I agree, that's exactly the kind of divergence we usually try to
    avoid. And it's hardly theology to say let's not do a half-assed job on
    this.

    cheers

    andrew
  • Tom Lane at Jan 15, 2011 at 2:28 am

    Robert Haas writes:
    On Fri, Jan 14, 2011 at 8:00 PM, Tom Lane wrote:
    I was thinking of it as being strictly a field list.
    I think we're in the process of designing a manned mission to Mars to
    solve the problem that our shoelaces are untied.
    How so? ISTM the problems at hand are (a) we can't add a new CSV column
    without causing a flag day for users who may not even care about the
    information, and (b) we're worried that emitting all these columns may
    result in a performance hit, again for information that a particular
    user may not need. A user-settable column list seems pretty on-target
    for solving those problems to me.

    regards, tom lane
  • Stephen Frost at Jan 15, 2011 at 2:30 am

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    A user-settable column list seems pretty on-target
    for solving those problems to me.
    I'm looking into implementing this.

    An interesting initial question is- should the users be able to control
    the *order* of the columns? My gut feeling, if we're giving them a GUC
    that's a list of fields, is 'yes', but I'm happy to listen to other
    thoughts.

    Thanks,

    Stephen
  • Tom Lane at Jan 15, 2011 at 2:51 am

    Stephen Frost writes:
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    A user-settable column list seems pretty on-target
    for solving those problems to me.
    I'm looking into implementing this.
    An interesting initial question is- should the users be able to control
    the *order* of the columns? My gut feeling, if we're giving them a GUC
    that's a list of fields, is 'yes', but I'm happy to listen to other
    thoughts.
    Yeah, I was just thinking about that in connection with the suggestion
    of using a bitmap as the pre-parsed representation (which would more or
    less force adoption of the fixed-column-order approach). I really think
    we can't get away with that. Remember what Andrew pointed out upthread:
    it's important to be able to load the csvlog output directly into a
    table without any extra processing. Suppose a DBA is logging columns
    A,B,D and he later realizes that logging C would be a good thing too.
    He's going to have to ALTER TABLE ADD COLUMN to add C to his logging
    table ... and now it's at the end. This is no problem if he can set the
    GUC to be "A,B,D,C" and have the field order be honored. Otherwise he's
    got a problem.

    In any case, if the GUC representation is a list of field names, I think
    the POLA demands that the system honor the list order. You could escape
    that expectation by controlling the feature with a pile of booleans
    (csv_log_pid = on, csv_log_timestamp = off, etc) but I can't say that
    that sort of API appeals to me.

    BTW, in case you didn't know, there are some GUCs defined as lists of
    identifiers already (look for GUC_LIST bits). Be sure to steal code.

    regards, tom lane
  • Andrew Dunstan at Jan 15, 2011 at 2:59 am

    On 01/14/2011 09:51 PM, Tom Lane wrote:
    Stephen Frost<sfrost@snowman.net> writes:
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    A user-settable column list seems pretty on-target
    for solving those problems to me.
    I'm looking into implementing this.
    An interesting initial question is- should the users be able to control
    the *order* of the columns? My gut feeling, if we're giving them a GUC
    that's a list of fields, is 'yes', but I'm happy to listen to other
    thoughts.
    Yeah, I was just thinking about that in connection with the suggestion
    of using a bitmap as the pre-parsed representation (which would more or
    less force adoption of the fixed-column-order approach). I really think
    we can't get away with that. Remember what Andrew pointed out upthread:
    it's important to be able to load the csvlog output directly into a
    table without any extra processing. Suppose a DBA is logging columns
    A,B,D and he later realizes that logging C would be a good thing too.
    He's going to have to ALTER TABLE ADD COLUMN to add C to his logging
    table ... and now it's at the end. This is no problem if he can set the
    GUC to be "A,B,D,C" and have the field order be honored. Otherwise he's
    got a problem.
    Ok, you sold me. Until I read this I was inclined to say not, on KISS
    principles.


    The only thing I'd suggest extra is that we might allow "version_n_m" as
    shorthand for the default table layout from the relevant version.

    cheers

    andrew
  • Stephen Frost at Jan 15, 2011 at 3:07 am

    * Andrew Dunstan (andrew@dunslane.net) wrote:
    The only thing I'd suggest extra is that we might allow
    "version_n_m" as shorthand for the default table layout from the
    relevant version.
    I like that idea, makes the default a lot simpler to deal with too. :)

    Thanks!

    Stephen
  • Tom Lane at Jan 15, 2011 at 3:09 am

    Andrew Dunstan writes:
    The only thing I'd suggest extra is that we might allow "version_n_m" as
    shorthand for the default table layout from the relevant version.
    Mmm ... seems like that just complicates matters. To make that useful,
    you have to assume that there *is* a default table layout, it's
    different across versions, and everything that looks at this GUC value
    will know instantly what it is for each version. The last bit is kind
    of a killer for tools like pgfouine, no? In any case I thought the
    expectation here was that the default column list would be frozen at
    what it is now, and probably will never change.

    regards, tom lane
  • Stephen Frost at Jan 15, 2011 at 3:20 am

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    everything that looks at this GUC value
    will know instantly what it is for each version.
    The last bit is kind of a killer for tools like pgfouine, no?
    Ugh.. Could we just accept it as input but return the full list if
    asked for it>
    In any case I thought the
    expectation here was that the default column list would be frozen at
    what it is now, and probably will never change.
    This I don't like.. When I install a new version fresh, I like to get
    all of the "bells & whistles" that go along with it, which, in my view,
    would include new fields that the smart PG folks have decided might be
    useful for me. I'd like to provide a way for users who are upgrading to
    be able to get the old behavior back, to minimize the trouble for them,
    and being able to say "just change the version_9_1 to version_9_0 in
    your log_csv_fields GUC" is a heck of a lot better than "well, rip out
    the default and replace it with this huge list of fields".

    I'd been puzzling over how to deal with this big list of fields in
    postgresql.conf and I do like the idea of some kind of short-cut being
    provided to ease the pain for users. What about something other than
    version_x_y? I could maybe see having a 'default' and an 'all'
    instead.. Then have the default be what we have currently and 'all' be
    the full list I'm thinking about.

    Thanks,

    Stephen
  • Tom Lane at Jan 15, 2011 at 3:34 am

    Stephen Frost writes:
    I'd been puzzling over how to deal with this big list of fields in
    postgresql.conf and I do like the idea of some kind of short-cut being
    provided to ease the pain for users.
    Yeah, I agree with the worry that a default value that's a mile long
    is going to be a bit of a PITA. But I don't think we're there yet on
    having a better solution.
    What about something other than
    version_x_y? I could maybe see having a 'default' and an 'all'
    instead.. Then have the default be what we have currently and 'all' be
    the full list I'm thinking about.
    If "default" always means what it means today, I can live with that.
    But if the meaning of "all" changes from version to version, that seems
    like a royal mess. Again, I'm concerned that an external tool like
    pgfouine be able to make sense of the value without too much context.
    If it doesn't know what some of the columns are, it can just ignore them
    --- but a magic summary identifier that it doesn't understand at all is
    a problem.

    regards, tom lane
  • Alvaro Herrera at Jan 17, 2011 at 4:44 pm

    Excerpts from Tom Lane's message of sáb ene 15 00:34:40 -0300 2011:
    Stephen Frost <sfrost@snowman.net> writes:
    What about something other than
    version_x_y? I could maybe see having a 'default' and an 'all'
    instead.. Then have the default be what we have currently and 'all' be
    the full list I'm thinking about.
    If "default" always means what it means today, I can live with that.
    But if the meaning of "all" changes from version to version, that seems
    like a royal mess. Again, I'm concerned that an external tool like
    pgfouine be able to make sense of the value without too much context.
    If it doesn't know what some of the columns are, it can just ignore them
    --- but a magic summary identifier that it doesn't understand at all is
    a problem.
    Maybe if we offered a way for the utility to find out the field list
    from the magic identifier, it would be enough.

    (It would be neat to have magic identifiers for "terse", "verbose",
    etc, that mimicked the behavior of client processing)

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Andrew Dunstan at Jan 17, 2011 at 4:51 pm

    On 01/17/2011 11:44 AM, Alvaro Herrera wrote:
    Excerpts from Tom Lane's message of sáb ene 15 00:34:40 -0300 2011:
    Stephen Frost<sfrost@snowman.net> writes:
    What about something other than
    version_x_y? I could maybe see having a 'default' and an 'all'
    instead.. Then have the default be what we have currently and 'all' be
    the full list I'm thinking about.
    If "default" always means what it means today, I can live with that.
    But if the meaning of "all" changes from version to version, that seems
    like a royal mess. Again, I'm concerned that an external tool like
    pgfouine be able to make sense of the value without too much context.
    If it doesn't know what some of the columns are, it can just ignore them
    --- but a magic summary identifier that it doesn't understand at all is
    a problem.
    Maybe if we offered a way for the utility to find out the field list
    from the magic identifier, it would be enough.

    (It would be neat to have magic identifiers for "terse", "verbose",
    etc, that mimicked the behavior of client processing)
    Just output a header line with the column names. We've long been able to
    import such files. If the list of columns changes we should rotate log
    files before outputting the new format. That might get a little tricky
    to coordinate between backends.

    cheers

    andrew
  • Stephen Frost at Jan 15, 2011 at 3:05 am

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    In any case, if the GUC representation is a list of field names, I think
    the POLA demands that the system honor the list order.
    Agreed. That puts us back into the question of how to make it
    efficient. My best thought at the moment, which doesn't strike me as
    particularly efficient, is to build an array of the columns as enum's
    and then have loop through the array and use a switch() on the enum. At
    least it's all integer-based there then and we're not calling strcmp()
    for every field or strchr to find the next field, but couldn't we do
    better?
    BTW, in case you didn't know, there are some GUCs defined as lists of
    identifiers already (look for GUC_LIST bits). Be sure to steal code.
    No, I didn't.. Excellent.

    Thanks!

    Stephen
  • Tom Lane at Jan 15, 2011 at 3:16 am

    Stephen Frost writes:
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    In any case, if the GUC representation is a list of field names, I think
    the POLA demands that the system honor the list order.
    Agreed. That puts us back into the question of how to make it
    efficient. My best thought at the moment, which doesn't strike me as
    particularly efficient, is to build an array of the columns as enum's
    and then have loop through the array and use a switch() on the enum.
    Yeah, an array or list of integer codes was what I was thinking too.
    At least it's all integer-based there then and we're not calling
    strcmp() for every field or strchr to find the next field, but
    couldn't we do better?
    I really doubt that the cycles spent in the loop + switch are going to
    amount to anything at all, compared to the cycles involved in formatting
    each field and then pushing it through the CSV logic. Not to mention
    the I/O costs of sending the string somewhere afterwards.

    regards, tom lane
  • Stephen Frost at Jan 15, 2011 at 3:25 am

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    Yeah, an array or list of integer codes was what I was thinking too.
    Hm, yeah, a list of integer codes might be even better/simpler.

    Okay, next user-interface question- thoughts on handling SIGHUP? My
    first reaction is that we should create a new log file on SIGHUP (if we
    don't already, havn't checked), or maybe just on SIGHUP if this variable
    changes.

    Point being, until we get Andrew's jagged-csv-import magic committed to
    core, we won't be able to import a log file that a user has changed the
    field list for mid-stream (following the add-a-new-column use-case we've
    been discussing).

    Thanks,

    Stephen
  • Tom Lane at Jan 15, 2011 at 3:43 am

    Stephen Frost writes:
    Okay, next user-interface question- thoughts on handling SIGHUP? My
    first reaction is that we should create a new log file on SIGHUP (if we
    don't already, havn't checked), or maybe just on SIGHUP if this variable
    changes.
    Point being, until we get Andrew's jagged-csv-import magic committed to
    core, we won't be able to import a log file that a user has changed the
    field list for mid-stream (following the add-a-new-column use-case we've
    been discussing).
    Now I think you're reaching the mission-to-mars stage that Robert was
    complaining about. Solving that sort of problem is well outside the
    scope of this patch. I don't care if people have to shut down and
    restart their servers in order to make a change to the log format.
    Even if I did, the other patch sounds like a better approach.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedJan 12, '11 at 2:23p
activeMar 11, '11 at 2:21p
posts135
users12
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase