On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi wrote:
Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the
status
shows stopped
5. However i could connect to the psql shell and get the prompt which means
the server is running.
I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Does anybody recall the specific reason for this? Do we need a way for
pg_ctl to figure this out, or do we need to change it in
PQconnecitonNeedsPassword()?

Search Discussions

  • Tom Lane at Sep 24, 2010 at 2:05 pm

    Magnus Hagander writes:
    I took a quick look at the code, and from what I can tell this is
    because PQconnectionNeedsPassword() always returns false if a
    pgpass.conf has been used. There is no handling the case where pgpass
    is used, but has an incorrect password.
    Why should it? That code is complicated enough, I don't think it needs
    to have a behavior of pretending that a wrong entry isn't there.

    regards, tom lane
  • Magnus Hagander at Sep 24, 2010 at 2:15 pm

    On Fri, Sep 24, 2010 at 16:04, Tom Lane wrote:
    Magnus Hagander <magnus@hagander.net> writes:
    I took a quick look at the code, and from what I can tell this is
    because PQconnectionNeedsPassword() always returns false if a
    pgpass.conf has been used. There is no handling the case where pgpass
    is used, but has an incorrect password.
    Why should it?  That code is complicated enough, I don't think it needs
    to have a behavior of pretending that a wrong entry isn't there.
    In that case, we should probably teach pg_ctl about this case, no?
    Since it clearly gives an incorrect message to the user now...
  • Andrew Dunstan at Sep 24, 2010 at 2:42 pm

    On 09/24/2010 10:15 AM, Magnus Hagander wrote:
    On Fri, Sep 24, 2010 at 16:04, Tom Lanewrote:
    Magnus Hagander<magnus@hagander.net> writes:
    I took a quick look at the code, and from what I can tell this is
    because PQconnectionNeedsPassword() always returns false if a
    pgpass.conf has been used. There is no handling the case where pgpass
    is used, but has an incorrect password.
    Why should it? That code is complicated enough, I don't think it needs
    to have a behavior of pretending that a wrong entry isn't there.
    In that case, we should probably teach pg_ctl about this case, no?
    Since it clearly gives an incorrect message to the user now...
    pg_ctl decides that the server is running iff it can connect to it. Do
    you intend to provide for a different test? Setting an incorrect
    password for the service account sounds like pilot error to me.

    cheers

    andrew
  • Tom Lane at Sep 24, 2010 at 3:15 pm

    Andrew Dunstan writes:
    On 09/24/2010 10:15 AM, Magnus Hagander wrote:
    In that case, we should probably teach pg_ctl about this case, no?
    Since it clearly gives an incorrect message to the user now...
    pg_ctl decides that the server is running iff it can connect to it. Do
    you intend to provide for a different test?
    Seems like getting a password challenge from the server is sufficient
    evidence that the server is running, whether we are able to meet the
    challenge or not. Perhaps we could just twiddle pg_ctl's "is it up"
    test a bit to notice whether the connect failure was of this sort.

    (Of course, a "pg_ping" utility would be a better answer, but nobody's
    gotten around to that in more than ten years, so I'm not holding my
    breath.)

    regards, tom lane
  • Dave Page at Sep 24, 2010 at 3:22 pm

    On Fri, Sep 24, 2010 at 4:11 PM, Tom Lane wrote:

    (Of course, a "pg_ping" utility would be a better answer, but nobody's
    gotten around to that in more than ten years, so I'm not holding my
    breath.)
    Hmm, that sounded like it could be my 9.1 mini project - then Google
    showed me that SeanC wrote something already.

    http://archives.postgresql.org/pgsql-patches/2003-07/msg00053.php

    --
    Dave Page
    Blog: http://pgsnake.blogspot.com
    Twitter: @pgsnake

    EnterpriseDB UK: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Sep 24, 2010 at 3:33 pm

    Dave Page writes:
    On Fri, Sep 24, 2010 at 4:11 PM, Tom Lane wrote:
    (Of course, a "pg_ping" utility would be a better answer, but nobody's
    gotten around to that in more than ten years, so I'm not holding my
    breath.)
    Hmm, that sounded like it could be my 9.1 mini project - then Google
    showed me that SeanC wrote something already.
    http://archives.postgresql.org/pgsql-patches/2003-07/msg00053.php
    Huh, I wonder why we never adopted that? Although I'd be inclined to
    do most of the heavy lifting inside libpq, myself, and this is way
    more verbose than what pg_ctl wants.

    regards, tom lane
  • Andrew Dunstan at Sep 24, 2010 at 8:31 pm

    On 09/24/2010 11:11 AM, Tom Lane wrote:
    Andrew Dunstan<andrew@dunslane.net> writes:
    On 09/24/2010 10:15 AM, Magnus Hagander wrote:
    In that case, we should probably teach pg_ctl about this case, no?
    Since it clearly gives an incorrect message to the user now...
    pg_ctl decides that the server is running iff it can connect to it. Do
    you intend to provide for a different test?
    Seems like getting a password challenge from the server is sufficient
    evidence that the server is running, whether we are able to meet the
    challenge or not. Perhaps we could just twiddle pg_ctl's "is it up"
    test a bit to notice whether the connect failure was of this sort.
    pg_ctl does in fact use that sort of logic:

    if ((conn = PQconnectdb(connstr)) != NULL &&
    (PQstatus(conn) == CONNECTION_OK ||
    PQconnectionNeedsPassword(conn)))


    But of course, libpq won't set that last condition if there is a bad
    password in the pgpass file, which seems a rather perverse thing to do.

    cheers

    andrew
  • Bruce Momjian at Nov 12, 2010 at 2:50 am

    Magnus Hagander wrote:
    On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi
    wrote:
    Hi Mark,

    On of my college (Sujeet) has found a way to reproduce the same behaviour.
    1. Installed PG 9.0 on Win XP SP3
    2. Stop the Postgresql-9.0 service from service manager console
    3. Create pgpass.conf in postgres (service account) user's profile with an
    incorrect password deliberately.
    (Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
    4. Now start the postgresql-9.0 service, it will return an error and the
    status
    ?? shows stopped
    5. However i could connect to the psql shell and get the prompt which means
    ??? the server is running.
    I took a quick look at the code, and from what I can tell this is
    because PQconnectionNeedsPassword() always returns false if a
    pgpass.conf has been used. There is no handling the case where pgpass
    is used, but has an incorrect password.

    Does anybody recall the specific reason for this? Do we need a way for
    pg_ctl to figure this out, or do we need to change it in
    PQconnecitonNeedsPassword()?
    I was not able to reproduce this failure on my BSD system using GIT
    head:

    $ psql test
    psql: FATAL: password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"

    $ pg_ctl status
    pg_ctl: server is running (PID: 710)
    /usr/var/local/pgsql/bin/postgres "-i"

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Magnus Hagander at Nov 12, 2010 at 6:21 am

    On Fri, Nov 12, 2010 at 03:49, Bruce Momjian wrote:
    Magnus Hagander wrote:
    On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi
    wrote:
    Hi Mark,

    On of my college (Sujeet) has found a way to reproduce the same behaviour.
    1. Installed PG 9.0 on Win XP SP3
    2. Stop the Postgresql-9.0 service from service manager console
    3. Create pgpass.conf in postgres (service account) user's profile with an
    incorrect password deliberately.
    (Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
    4. Now start the postgresql-9.0 service, it will return an error and the
    status
    ?? shows stopped
    5. However i could connect to the psql shell and get the prompt which means
    ??? the server is running.
    I took a quick look at the code, and from what I can tell this is
    because PQconnectionNeedsPassword() always returns false if a
    pgpass.conf has been used. There is no handling the case where pgpass
    is used, but has an incorrect password.

    Does anybody recall the specific reason for this? Do we need a way for
    pg_ctl to figure this out, or do we need to change it in
    PQconnecitonNeedsPassword()?
    I was not able to reproduce this failure on my BSD system using GIT
    head:

    $ psql test
    psql: FATAL:  password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"

    $ pg_ctl status
    pg_ctl: server is running (PID: 710)
    /usr/var/local/pgsql/bin/postgres "-i"
    The problem is not in pg_ctl status, it's in pg_ctl start. They're
    different codepaths - status never tries to actually connect, it just
    checks if the process is alive.
  • Bruce Momjian at Nov 12, 2010 at 2:08 pm

    Magnus Hagander wrote:
    On Fri, Nov 12, 2010 at 03:49, Bruce Momjian wrote:
    Magnus Hagander wrote:
    On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi
    wrote:
    Hi Mark,

    On of my college (Sujeet) has found a way to reproduce the same behaviour.
    1. Installed PG 9.0 on Win XP SP3
    2. Stop the Postgresql-9.0 service from service manager console
    3. Create pgpass.conf in postgres (service account) user's profile with an
    incorrect password deliberately.
    (Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
    4. Now start the postgresql-9.0 service, it will return an error and the
    status
    ?? shows stopped
    5. However i could connect to the psql shell and get the prompt which means
    ??? the server is running.
    I took a quick look at the code, and from what I can tell this is
    because PQconnectionNeedsPassword() always returns false if a
    pgpass.conf has been used. There is no handling the case where pgpass
    is used, but has an incorrect password.

    Does anybody recall the specific reason for this? Do we need a way for
    pg_ctl to figure this out, or do we need to change it in
    PQconnecitonNeedsPassword()?
    I was not able to reproduce this failure on my BSD system using GIT
    head:

    ? ? ? ?$ psql test
    ? ? ? ?psql: FATAL: ?password authentication failed for user "postgres"
    ? ? ? ?password retrieved from file "/u/postgres/.pgpass"

    ? ? ? ?$ pg_ctl status
    ? ? ? ?pg_ctl: server is running (PID: 710)
    ? ? ? ?/usr/var/local/pgsql/bin/postgres "-i"
    The problem is not in pg_ctl status, it's in pg_ctl start. They're
    different codepaths - status never tries to actually connect, it just
    checks if the process is alive.
    Uh, I still cannot reproduce the failure:

    $ psql postgres
    psql: FATAL: password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"

    $ pg_ctl stop
    waiting for server to shut down.... done
    server stopped

    $ pg_ctl -l /dev/null start
    server starting

    (Got to love that new 9.0 pgpass error message.)

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Tom Lane at Nov 12, 2010 at 3:03 pm

    Bruce Momjian writes:
    Uh, I still cannot reproduce the failure:
    I would imagine you need -w option on the start. The whole issue
    here is whether start's wait-for-server-start code works.

    regards, tom lane
  • Bruce Momjian at Nov 12, 2010 at 4:47 pm

    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Uh, I still cannot reproduce the failure:
    I would imagine you need -w option on the start. The whole issue
    here is whether start's wait-for-server-start code works.
    Thanks, I am now able to reproduce this. I was able to get this to
    report the .pgpass problem:

    $ psql postgres
    psql: FATAL: password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"

    $ pg_ctl stop
    waiting for server to shut down.... done
    server stopped

    $ pg_ctl -w -l /dev/null start
    waiting for server to start....FATAL: password authentication failed
    for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"
    .FATAL: password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"
    .FATAL: password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"
    .^C

    I basically report the connection error string if it starts with "FATAL:".

    I originally tried to check for an ERRCODE_INVALID_PASSWORD error field
    (see // comments), but it seems there is no way to access this, i.e.
    PQgetResult(conn) on a connection failure is always NULL.

    Anyway, perhaps FATAL is a better test because it will report any major
    failure, not just a .pgpass one.

    Patch attached.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Magnus Hagander at Nov 16, 2010 at 11:39 am

    On Fri, Nov 12, 2010 at 17:47, Bruce Momjian wrote:
    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Uh, I still cannot reproduce the failure:
    I would imagine you need -w option on the start.  The whole issue
    here is whether start's wait-for-server-start code works.
    Thanks, I am now able to reproduce this.  I was able to get this to
    report the .pgpass problem:

    $ psql postgres
    psql: FATAL:  password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"

    $ pg_ctl stop
    waiting for server to shut down.... done
    server stopped

    $ pg_ctl -w -l /dev/null start
    waiting for server to start....FATAL:  password authentication failed
    for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"
    .FATAL:  password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"
    .FATAL:  password authentication failed for user "postgres"
    password retrieved from file "/u/postgres/.pgpass"
    .^C

    I basically report the connection error string if it starts with "FATAL:".

    I originally tried to check for an ERRCODE_INVALID_PASSWORD error field
    (see // comments), but it seems there is no way to access this, i.e.
    PQgetResult(conn) on a connection failure is always NULL.

    Anyway, perhaps FATAL is a better test because it will report any major
    failure, not just a .pgpass one.

    Patch attached.
    Bad Bruce, using C++ comments like that :P And non-context diff ;)

    Does this actually solve the *problem*, though? The problem is not
    what is reported on stdout/stderr, the problem is that the net result
    is that the server is reported as not started (by the service control
    manager) when it actually *is* started. In this case, stderr doesn't
    even go anywhere. What happens if you *don't* Ctrl-C it?
  • Bruce Momjian at Nov 17, 2010 at 6:50 pm

    Magnus Hagander wrote:
    I basically report the connection error string if it starts with "FATAL:".

    I originally tried to check for an ERRCODE_INVALID_PASSWORD error field
    (see // comments), but it seems there is no way to access this, i.e.
    PQgetResult(conn) on a connection failure is always NULL.

    Anyway, perhaps FATAL is a better test because it will report any major
    failure, not just a .pgpass one.

    Patch attached.
    Bad Bruce, using C++ comments like that :P And non-context diff ;)
    That comment use was to highlight that those are not for commit, but
    there if people want to test.

    As far as the diff, it seems git-external-diff isn't portable to
    non-Linux systems; I will post a separate email on that.
    Does this actually solve the *problem*, though? The problem is not
    what is reported on stdout/stderr, the problem is that the net result
    is that the server is reported as not started (by the service control
    manager) when it actually *is* started. In this case, stderr doesn't
    even go anywhere. What happens if you *don't* Ctrl-C it?
    I was just going to post on that. :-) Right now, it prints the FATAL
    and keeps printing 60 times, then says not running. Should we just exit
    on FATAL and output a special exit string, or say running?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Magnus Hagander at Nov 17, 2010 at 6:52 pm

    On Wed, Nov 17, 2010 at 19:50, Bruce Momjian wrote:
    Magnus Hagander wrote:
    Does this actually solve the *problem*, though? The problem is not
    what is reported  on stdout/stderr, the problem is that the net result
    is that the server is reported as not started (by the service control
    manager) when it actually *is* started. In this case, stderr doesn't
    even go anywhere. What happens if you *don't* Ctrl-C it?
    I was just going to post on that.  :-)  Right now, it prints the FATAL
    and keeps printing 60 times, then says not running.  Should we just exit
    on FATAL and output a special exit string, or say running?
    From the perspective of the service control manager, it should say
    running. That might break other scenarios though, but i'm not sure - I
    think we can safely say the server is running when we try to log in
    and get a password failure.
  • Bruce Momjian at Nov 17, 2010 at 6:57 pm

    Magnus Hagander wrote:
    On Wed, Nov 17, 2010 at 19:50, Bruce Momjian wrote:
    Magnus Hagander wrote:
    Does this actually solve the *problem*, though? The problem is not
    what is reported ?on stdout/stderr, the problem is that the net result
    is that the server is reported as not started (by the service control
    manager) when it actually *is* started. In this case, stderr doesn't
    even go anywhere. What happens if you *don't* Ctrl-C it?
    I was just going to post on that. ?:-) ?Right now, it prints the FATAL
    and keeps printing 60 times, then says not running. ?Should we just exit
    on FATAL and output a special exit string, or say running?
    From the perspective of the service control manager, it should say
    running. That might break other scenarios though, but i'm not sure - I
    think we can safely say the server is running when we try to log in
    and get a password failure.
    That was another part of the discussion. Right now we report any FATAL,
    so it might be a password problem, or something else, and it seems doing
    all FATALs is the best idea because it will catch any other cases like
    this.

    Is FATAL, in general, enough to conclude the server is running?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Magnus Hagander at Nov 17, 2010 at 7:08 pm

    On Wed, Nov 17, 2010 at 19:57, Bruce Momjian wrote:
    Magnus Hagander wrote:
    On Wed, Nov 17, 2010 at 19:50, Bruce Momjian wrote:
    Magnus Hagander wrote:
    Does this actually solve the *problem*, though? The problem is not
    what is reported ?on stdout/stderr, the problem is that the net result
    is that the server is reported as not started (by the service control
    manager) when it actually *is* started. In this case, stderr doesn't
    even go anywhere. What happens if you *don't* Ctrl-C it?
    I was just going to post on that. ?:-) ?Right now, it prints the FATAL
    and keeps printing 60 times, then says not running. ?Should we just exit
    on FATAL and output a special exit string, or say running?
    From the perspective of the service control manager, it should say
    running. That might break other scenarios though, but i'm not sure - I
    think we can safely say the server is running when we try to log in
    and get a password failure.
    That was another part of the discussion.  Right now we report any FATAL,
    so it might be a password problem, or something else, and it seems doing
    all FATALs is the best idea because it will catch any other cases like
    this.

    Is FATAL, in general, enough to conclude the server is running?
    No - specifically, we will send FATAL when "the database system is
    starting up", which is exactly the one we want to *avoid*.

    I think we should only exclude the password case. I guess we could
    also do all fatal *except* <list>, but that seems more fragile.
  • Tom Lane at Nov 17, 2010 at 7:21 pm

    Magnus Hagander writes:
    On Wed, Nov 17, 2010 at 19:57, Bruce Momjian wrote:
    Is FATAL, in general, enough to conclude the server is running?
    No - specifically, we will send FATAL when "the database system is
    starting up", which is exactly the one we want to *avoid*.
    I think we should only exclude the password case. I guess we could
    also do all fatal *except* <list>, but that seems more fragile.
    I believe that the above argument is exactly backwards. What we want
    here is to check the result of postmaster.c's canAcceptConnections(),
    and there are only a finite number of error codes that can result from
    rejections there. If we get past that, there are a large number of
    possible failures, but all of them indicate that the postmaster is in
    principle willing to accept connections. Checking for password errors
    only is utterly wrong: any other type of auth failure would be the same
    for this purpose, as would "no such database", "no such user", "too many
    connections", etc etc etc.

    What we actually want here, and don't have, is the fabled pg_ping
    protocol...

    regards, tom lane
  • Bruce Momjian at Nov 17, 2010 at 7:47 pm

    Tom Lane wrote:
    Magnus Hagander <magnus@hagander.net> writes:
    On Wed, Nov 17, 2010 at 19:57, Bruce Momjian wrote:
    Is FATAL, in general, enough to conclude the server is running?
    No - specifically, we will send FATAL when "the database system is
    starting up", which is exactly the one we want to *avoid*.
    I think we should only exclude the password case. I guess we could
    also do all fatal *except* <list>, but that seems more fragile.
    I believe that the above argument is exactly backwards. What we want
    here is to check the result of postmaster.c's canAcceptConnections(),
    and there are only a finite number of error codes that can result from
    rejections there. If we get past that, there are a large number of
    possible failures, but all of them indicate that the postmaster is in
    principle willing to accept connections. Checking for password errors
    only is utterly wrong: any other type of auth failure would be the same
    for this purpose, as would "no such database", "no such user", "too many
    connections", etc etc etc.
    Agreed. So how do we pass that info to libpq without exceeding the
    value of fixing this problem? Should we parse pg_controldata output?
    pg_upgrade could use machine-readable output from that too.
    What we actually want here, and don't have, is the fabled pg_ping
    protocol...
    Well, we are basically figuring how to implement that with this fix,
    whether it is part of pg_ctl or a separate binary.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Tom Lane at Nov 17, 2010 at 7:55 pm

    Bruce Momjian writes:
    Agreed. So how do we pass that info to libpq without exceeding the
    value of fixing this problem? Should we parse pg_controldata output?
    pg_upgrade could use machine-readable output from that too.
    pg_controldata seems 100% unrelated to this problem. You cannot even
    tell if the postmaster is alive just by inspecting pg_control.
    What we actually want here, and don't have, is the fabled pg_ping
    protocol...
    Well, we are basically figuring how to implement that with this fix,
    whether it is part of pg_ctl or a separate binary.
    Possibly the cleanest fix is to implement pg_ping as a libpq function.
    You do have to distinguish connection failures (ie connection refused)
    from errors that came back from the postmaster, and the easiest place to
    be doing that is inside libpq.

    regards, tom lane
  • Bruce Momjian at Nov 17, 2010 at 8:03 pm

    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Agreed. So how do we pass that info to libpq without exceeding the
    value of fixing this problem? Should we parse pg_controldata output?
    pg_upgrade could use machine-readable output from that too.
    pg_controldata seems 100% unrelated to this problem. You cannot even
    tell if the postmaster is alive just by inspecting pg_control.
    I was thinking of this:

    $ pg_controldata /u/pg/data
    ...
    Database cluster state: shut down
    What we actually want here, and don't have, is the fabled pg_ping
    protocol...
    Well, we are basically figuring how to implement that with this fix,
    whether it is part of pg_ctl or a separate binary.
    Possibly the cleanest fix is to implement pg_ping as a libpq function.
    You do have to distinguish connection failures (ie connection refused)
    from errors that came back from the postmaster, and the easiest place to
    be doing that is inside libpq.
    OK, so a new libpq function --- got it. Would we just pass the status
    from the backend or can it be done without backend modifications?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Tom Lane at Nov 17, 2010 at 10:24 pm

    Bruce Momjian writes:
    Tom Lane wrote:
    Possibly the cleanest fix is to implement pg_ping as a libpq function.
    You do have to distinguish connection failures (ie connection refused)
    from errors that came back from the postmaster, and the easiest place to
    be doing that is inside libpq.
    OK, so a new libpq function --- got it. Would we just pass the status
    from the backend or can it be done without backend modifications?
    It would definitely be better to do it without backend mods, so that
    the functionality would work against back-branch postmasters.

    To my mind, the entire purpose of such a function is to classify the
    possible errors so that the caller doesn't have to. So I wouldn't
    consider that it ought to "pass back the status from the backend".
    I think what we basically want is a function that takes a conninfo
    string (or one of the variants of that) and returns an enum defined
    more or less like this:

    * failed to connect to postmaster
    * connected, but postmaster is not accepting sessions
    * postmaster is up and accepting sessions

    I'm not sure those are exactly the categories we want, but something
    close to that. In particular, I don't know if there's any value in
    subdividing the "not accepting sessions" status --- pg_ctl doesn't
    really care, but other use-cases might want to tell the difference
    between the various canAcceptConnections failure states.

    BTW, it is annoying that we can't definitively distinguish "postmaster
    is not running" from a connectivity problem, but I can't see a way
    around that.

    regards, tom lane
  • Bruce Momjian at Nov 18, 2010 at 2:54 am

    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Tom Lane wrote:
    Possibly the cleanest fix is to implement pg_ping as a libpq function.
    You do have to distinguish connection failures (ie connection refused)
    from errors that came back from the postmaster, and the easiest place to
    be doing that is inside libpq.
    OK, so a new libpq function --- got it. Would we just pass the status
    from the backend or can it be done without backend modifications?
    It would definitely be better to do it without backend mods, so that
    the functionality would work against back-branch postmasters.

    To my mind, the entire purpose of such a function is to classify the
    possible errors so that the caller doesn't have to. So I wouldn't
    consider that it ought to "pass back the status from the backend".
    I think what we basically want is a function that takes a conninfo
    string (or one of the variants of that) and returns an enum defined
    more or less like this:

    * failed to connect to postmaster
    * connected, but postmaster is not accepting sessions
    * postmaster is up and accepting sessions

    I'm not sure those are exactly the categories we want, but something
    close to that. In particular, I don't know if there's any value in
    subdividing the "not accepting sessions" status --- pg_ctl doesn't
    really care, but other use-cases might want to tell the difference
    between the various canAcceptConnections failure states.

    BTW, it is annoying that we can't definitively distinguish "postmaster
    is not running" from a connectivity problem, but I can't see a way
    around that.
    Agreed. I will research this.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Bruce Momjian at Nov 24, 2010 at 5:14 am

    Bruce Momjian wrote:
    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Tom Lane wrote:
    Possibly the cleanest fix is to implement pg_ping as a libpq function.
    You do have to distinguish connection failures (ie connection refused)
    from errors that came back from the postmaster, and the easiest place to
    be doing that is inside libpq.
    OK, so a new libpq function --- got it. Would we just pass the status
    from the backend or can it be done without backend modifications?
    It would definitely be better to do it without backend mods, so that
    the functionality would work against back-branch postmasters.

    To my mind, the entire purpose of such a function is to classify the
    possible errors so that the caller doesn't have to. So I wouldn't
    consider that it ought to "pass back the status from the backend".
    I think what we basically want is a function that takes a conninfo
    string (or one of the variants of that) and returns an enum defined
    more or less like this:

    * failed to connect to postmaster
    * connected, but postmaster is not accepting sessions
    * postmaster is up and accepting sessions

    I'm not sure those are exactly the categories we want, but something
    close to that. In particular, I don't know if there's any value in
    subdividing the "not accepting sessions" status --- pg_ctl doesn't
    really care, but other use-cases might want to tell the difference
    between the various canAcceptConnections failure states.

    BTW, it is annoying that we can't definitively distinguish "postmaster
    is not running" from a connectivity problem, but I can't see a way
    around that.
    Agreed. I will research this.
    I have researched this and developed the attached patch. It implements
    PGping() and PGpingParams() in libpq, and has pg_ctl use it for pg_ctl
    -w server status detection.

    The new output for cases where .pgpass is not allowing for a connection
    is:

    $ pg_ctl -w -l /dev/null start
    waiting for server to start.... done
    server started
    However, could not connect, perhaps due to invalid authentication or
    misconfiguration.

    The code basically checks the connection status between PQconnectStart()
    and connectDBComplete() to see if the server is running but we failed to
    connect for some reason.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Bruce Momjian at Nov 25, 2010 at 6:12 pm

    Bruce Momjian wrote:
    BTW, it is annoying that we can't definitively distinguish "postmaster
    is not running" from a connectivity problem, but I can't see a way
    around that.
    Agreed. I will research this.
    I have researched this and developed the attached patch. It implements
    PGping() and PGpingParams() in libpq, and has pg_ctl use it for pg_ctl
    -w server status detection.

    The new output for cases where .pgpass is not allowing for a connection
    is:

    $ pg_ctl -w -l /dev/null start
    waiting for server to start.... done
    server started
    However, could not connect, perhaps due to invalid authentication or
    misconfiguration.

    The code basically checks the connection status between PQconnectStart()
    and connectDBComplete() to see if the server is running but we failed to
    connect for some reason.
    I have applied this patch, with modified wording of the "cannot connect"
    case:

    $ pg_ctl -w -l /dev/null start
    waiting for server to start.... done
    server started
    warning: could not connect, perhaps due to invalid authentication or
    misconfiguration.

    I assume having the warning as the last printed things is appropriate.
    This is my second patch this week that got little feedback --- I am
    getting a little spooked. ;-)

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Fujii Masao at Nov 26, 2010 at 4:25 am

    On Fri, Nov 26, 2010 at 3:11 AM, Bruce Momjian wrote:
    I have applied this patch, with modified wording of the "cannot connect"
    case:

    $ pg_ctl -w -l /dev/null start
    waiting for server to start.... done
    server started
    warning:  could not connect, perhaps due to invalid authentication or
    misconfiguration.
    This patch breaks the behavior that "pg_ctl -w start" waits until the standby
    has been ready to accept read-only queries. IOW, pg_ctl without this patch
    continues to check the connection even if the connection is rejected because
    the database has not been consistent yet. But pg_ctl with this patch treats
    that rejection as success of the standby starting and prints the above
    messages.

    I agree to treat the receipt of password request from the server as success
    of the server starting. But I don't think that we should treat other rejection
    cases that way and change the existing behavior.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Bruce Momjian at Nov 26, 2010 at 7:20 pm

    Fujii Masao wrote:
    On Fri, Nov 26, 2010 at 3:11 AM, Bruce Momjian wrote:
    I have applied this patch, with modified wording of the "cannot connect"
    case:

    ? ? ? ?$ pg_ctl -w -l /dev/null start
    ? ? ? ?waiting for server to start.... done
    ? ? ? ?server started
    ? ? ? ?warning: ?could not connect, perhaps due to invalid authentication or
    ? ? ? ?misconfiguration.
    This patch breaks the behavior that "pg_ctl -w start" waits until the standby
    has been ready to accept read-only queries. IOW, pg_ctl without this patch
    continues to check the connection even if the connection is rejected because
    the database has not been consistent yet. But pg_ctl with this patch treats
    that rejection as success of the standby starting and prints the above
    messages.

    I agree to treat the receipt of password request from the server as success
    of the server starting. But I don't think that we should treat other rejection
    cases that way and change the existing behavior.
    OK, that is easy to fix. The only downside is that if you misconfigured
    .pgpass (which is what I used for testing), you have to wait 60 seconds
    to get the "cannot connect" error message. Is that OK?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Dmitriy Igrishin at Nov 26, 2010 at 8:26 pm
    Hey hackers,

    I am sorry, but is it possible to implement BTW ability to
    check exactly status of authentication from libpq ? As for now,
    the only way to check failed authentication is parsing the error
    message, that is sadly.

    2010/11/26 Bruce Momjian <bruce@momjian.us>
    Fujii Masao wrote:
    On Fri, Nov 26, 2010 at 3:11 AM, Bruce Momjian wrote:
    I have applied this patch, with modified wording of the "cannot
    connect"
    case:

    ? ? ? ?$ pg_ctl -w -l /dev/null start
    ? ? ? ?waiting for server to start.... done
    ? ? ? ?server started
    ? ? ? ?warning: ?could not connect, perhaps due to invalid
    authentication or
    ? ? ? ?misconfiguration.
    This patch breaks the behavior that "pg_ctl -w start" waits until the standby
    has been ready to accept read-only queries. IOW, pg_ctl without this patch
    continues to check the connection even if the connection is rejected because
    the database has not been consistent yet. But pg_ctl with this patch treats
    that rejection as success of the standby starting and prints the above
    messages.

    I agree to treat the receipt of password request from the server as success
    of the server starting. But I don't think that we should treat other rejection
    cases that way and change the existing behavior.
    OK, that is easy to fix. The only downside is that if you misconfigured
    .pgpass (which is what I used for testing), you have to wait 60 seconds
    to get the "cannot connect" error message. Is that OK?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers


    --
    // Dmitriy.
  • Tom Lane at Nov 26, 2010 at 8:31 pm

    Bruce Momjian writes:
    Fujii Masao wrote:
    I agree to treat the receipt of password request from the server as success
    of the server starting. But I don't think that we should treat other rejection
    cases that way and change the existing behavior.
    OK, that is easy to fix.
    It's wrong though. If you get back a "password rejected" error, or most
    other types of errors, it still indicates that the server started.
    We just went over this a few days ago.
    The only downside is that if you misconfigured
    .pgpass (which is what I used for testing), you have to wait 60 seconds
    to get the "cannot connect" error message. Is that OK?
    No; it's useless and unnecessary behavior.

    regards, tom lane
  • Tom Lane at Nov 26, 2010 at 8:39 pm

    Bruce Momjian writes:
    Fujii Masao wrote:
    This patch breaks the behavior that "pg_ctl -w start" waits until the standby
    has been ready to accept read-only queries. IOW, pg_ctl without this patch
    continues to check the connection even if the connection is rejected because
    the database has not been consistent yet. But pg_ctl with this patch treats
    that rejection as success of the standby starting and prints the above
    messages.
    The reason this is a problem is that somebody, in a fit of inappropriate
    optimization, took out the code that allowed canAcceptConnections to
    distinguish the "not consistent yet" state. We need to put that back,
    not try to kluge around the problem from the client side.

    regards, tom lane
  • Tom Lane at Nov 26, 2010 at 11:30 pm

    I wrote:
    The reason this is a problem is that somebody, in a fit of inappropriate
    optimization, took out the code that allowed canAcceptConnections to
    distinguish the "not consistent yet" state.
    Oh, no, that's not the case --- the PM_RECOVERY postmaster state does
    still distinguish not-ready from ready. The real problem is that what
    Bruce implemented has practically nothing to do with what was discussed
    last week. PQping is supposed to be smarter about classifying errors
    than this.

    Speaking of classifying errors, should we have a fourth result value to
    cover "obviously bogus parameters"? Right now you'll get PQNORESPONSE
    for cases like incorrect syntax in the conninfo string. I'm not sure
    how tense we ought to try to be about distinguishing, but if libpq
    failed before even attempting a connection, PQNORESPONSE seems a bit
    misleading.

    regards, tom lane
  • Robert Haas at Nov 27, 2010 at 12:16 am

    On Fri, Nov 26, 2010 at 6:30 PM, Tom Lane wrote:
    Speaking of classifying errors, should we have a fourth result value to
    cover "obviously bogus parameters"?
    +1.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Bruce Momjian at Nov 27, 2010 at 12:46 am

    Tom Lane wrote:
    I wrote:
    The reason this is a problem is that somebody, in a fit of inappropriate
    optimization, took out the code that allowed canAcceptConnections to
    distinguish the "not consistent yet" state.
    Oh, no, that's not the case --- the PM_RECOVERY postmaster state does
    still distinguish not-ready from ready. The real problem is that what
    Bruce implemented has practically nothing to do with what was discussed
    last week. PQping is supposed to be smarter about classifying errors
    than this.
    I was not aware this was discussed last week because I am behind on
    email. I was fixing a report from a month ago. I did explain how I was
    doing the tests.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Tom Lane at Nov 27, 2010 at 6:39 am

    Bruce Momjian writes:
    Tom Lane wrote:
    PQping is supposed to be smarter about classifying errors
    than this.
    I was not aware this was discussed last week because I am behind on
    email. I was fixing a report from a month ago. I did explain how I was
    doing the tests.
    Um, you did respond in that thread, several times even:
    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01102.php
    so I kind of assumed that the patch you presented this week did
    what was agreed to last week.

    I have committed a patch to make PQping do what was agreed to.

    regards, tom lane
  • Bruce Momjian at Nov 27, 2010 at 4:10 pm

    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Tom Lane wrote:
    PQping is supposed to be smarter about classifying errors
    than this.
    I was not aware this was discussed last week because I am behind on
    email. I was fixing a report from a month ago. I did explain how I was
    doing the tests.
    Um, you did respond in that thread, several times even:
    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01102.php
    so I kind of assumed that the patch you presented this week did
    what was agreed to last week.
    Yes, I do remember that, but I remember this:

    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01095.php

    What we want here is to check the result of postmaster.c's
    canAcceptConnections(),

    and this:

    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01106.php

    You do have to distinguish connection failures (ie connection refused)
    from errors that came back from the postmaster, and the easiest place to
    be doing that is inside libpq.

    which I thought meant it had to be done in libpq and we didn't have
    access to the postmaster return codes in libpq.

    Your changes look very good, and not something I would have been able to
    code.
    I have committed a patch to make PQping do what was agreed to.
    Thanks.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedSep 24, '10 at 1:23p
activeNov 27, '10 at 4:10p
posts36
users8
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase