hi,
got (another time, it's not the first thing) this situation:
1. server: 64gb of ram, 24 cores of xeon processors (hyperthreading is off), storage is fusion io.
2. at one point load skyrocketed to 48 (i know, it's not high with this
hardware)
3. we were not able to connect to PostgreSQL from psql - psql when run
just froze.
4. postgresql: PostgreSQL 8.3.12 on x86_64-redhat-linux-gnu, compiled by
GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48)

What I did gather:
straced postmaster when the problem was happening, and I was opening new
connections. strace looks like this:

16:54:36.886720 select(6, [3 4 5], NULL, NULL, {54, 366000}) = 1 (in [4], left {51, 563000})
16:54:39.768318 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:39.770795 accept(4, {sa_family=AF_INET, sin_port=htons(59661), sin_addr=inet_addr("10.50.50.7")}, [16]) = 8
16:54:39.773895 getsockname(8, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("10.60.0.10")}, [16]) = 0
16:54:39.774522 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
16:54:39.774579 setsockopt(8, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
16:54:39.774729 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b06874a4460) = 12891
16:54:39.854208 close(8) = 0
16:54:39.855800 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
16:54:39.856853 select(6, [3 4 5], NULL, NULL, {60, 0}) = 1 (in [4], left {57, 842000})
16:54:42.018570 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:42.027079 accept(4, {sa_family=AF_INET, sin_port=htons(59662), sin_addr=inet_addr("10.50.50.7")}, [16]) = 8
16:54:42.046511 getsockname(8, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("10.60.0.10")}, [16]) = 0
16:54:42.095917 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
16:54:42.109190 setsockopt(8, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
16:54:42.111603 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b06874a4460) = 12912
16:54:42.114379 close(8) = 0
16:54:42.120551 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
16:54:42.128922 select(6, [3 4 5], NULL, NULL, {60, 0}) = 1 (in [4], left {58, 593000})

so, everything looks right.

but - I was also stracing backends - just after creation (I did strace -ff -p <postmaster.pid> -o x.log).

strace from single backend, just forked off:

16:54:39.852638 close(3) = 0
16:54:39.852823 close(4) = 0
16:54:39.852872 close(5) = 0
16:54:39.852910 close(6) = 0
16:54:39.852994 setsid() = 12891
16:54:39.853050 rt_sigaction(SIGTERM, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.853169 rt_sigaction(SIGQUIT, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.853215 rt_sigaction(SIGALRM, {0x5ae490, [], SA_RESTORER, 0x3bd66302d0}, {0x1, [], SA_RESTORER, 0x3bd66302d0}, 8) = 0
16:54:39.853259 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV ALRM TERM CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:39.853504 write(2, "\0\0^\0[2\0\0t2010-11-29 16:54:39 UTC [12891]: [1-1] LOG: connection received: host=10.50.50.7 port=59661\n", 103) = 103
16:54:39.853622 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
16:54:39.853689 recvfrom(8, "\0\0\0\10\4\322\26/", 8192, 0, NULL, NULL) = 8
16:54:39.853822 sendto(8, "N", 1, 0, NULL, 0) = 1
16:54:39.853901 recvfrom(8, "\0\0\0%\0\3\0\0user\0web\0database\0xxxxxxxxx\0\0", 8192, 0, NULL, NULL) = 37
16:54:39.856007 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
16:54:39.856059 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:39.856130 write(2, "\0\0r\0[2\0\0t2010-11-29 16:54:39 UTC 10.50.50.7(59661) [12891]: [2-1] LOG: connection authorized: user=web database=xxxxxxxxx\n", 123) = 123
16:54:39.856255 rt_sigaction(SIGHUP, {0x5ad340, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588a90, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.856315 rt_sigaction(SIGINT, {0x5ae350, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.856359 rt_sigaction(SIGTERM, {0x5ae3f0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.856402 rt_sigaction(SIGQUIT, {0x5ae4a0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.856445 rt_sigaction(SIGALRM, {0x5a6160, [], SA_RESTORER, 0x3bd66302d0}, {0x5ae490, [], SA_RESTORER, 0x3bd66302d0}, 8) = 0
16:54:39.856484 rt_sigaction(SIGPIPE, {0x1, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x1, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.856526 rt_sigaction(SIGUSR1, {0x5a1740, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5888b0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.856565 rt_sigaction(SIGUSR2, {0x4d7030, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x584a50, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, 8) = 0
16:54:39.856819 rt_sigaction(SIGFPE, {0x5ad7b0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {SIG_DFL, [], 0}, 8) = 0
16:54:39.856929 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x3bd66302d0}, {0x588050, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x3bd66302d0}, 8) = 0
16:54:39.856978 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:39.857043 mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0796de9000
16:54:39.944561 semctl(7602213, 5, SETVAL, 0) = 0
16:54:39.944644 brk(0x3fbe000) = 0x3fbe000
16:54:39.944694 open("global/pg_database", O_RDONLY) = 3
16:54:39.944758 fstat(3, {st_mode=S_IFREG|0600, st_size=270, ...}) = 0
16:54:39.944811 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0796fea000
16:54:39.944843 read(3, "\"template1\" 1 1663 456490577\n\"template0\" 11510 1663 527228657\n\"postgres\" 11511 1663 506289897\n\"xxxxxxxxx\" 16387 1663 457831935\n\"yyyy\" 31851 1663 427191013\n\"aaaa\" 31853 1663 427191021\n\"bbbb\" 31854 1663 427191048\n\"cccc\" 31855 1663 427191069\n\"ddddd\" 1477300 1"..., 4096) = 270
16:54:39.944921 close(3) = 0
16:54:39.944954 munmap(0x2b0796fea000, 4096) = 0
16:54:39.945088 semop(7602213, 0x7fffceae7110, 1

and that's all - backend was not showing anything else.

All other backends that were created at this time were doing the same thing - waiting on this semop.

What's more - when I tried to stop pg (pg_ctl -m fast), all backends died except for one.

This single backend was connected from very remote location, with not really good network connectivity between it and postgresql server.

I did strace this single backend, but then I had to pg_ctl -m immediate.

strace showed:

23540 sendto(8, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 838, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
23540 --- SIGQUIT (Quit) @ 0 (0) ---
23540 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
23540 write(2, "\0\0\34\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [623-1] WARNING: terminating connection because of crash of another server process\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [624-1] DETAIL: The postmaster has commanded this server p"..., 549) = 549
23540 write(2, "\0\0\204\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [626-1] FATAL: terminating connection due to administrator command\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [627-1] STATEMENT: SELECT table.id, table.some_f1, table.some_f2, "..., 653) = 653
23540 sendto(7, "\1\0\0\0X\2\0\0\3@\0\0\6\0\0\0\1\0\0\0\1\0\0\0p.-\0\0\0\0\0\1\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\212I\0\0\377\177\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0>\200\0\0\377\177\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 600, 0, NULL, 0) = 600
23540 --- SIGQUIT (Quit) @ 0 (0) ---
23540 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
23540 write(2, "\0\0\34\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [628-1] WARNING: terminating connection because of crash of another server process\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [629-1] DETAIL: The postmaster has commanded this server p"..., 549) = 549
23540 exit_group(2) = ?

Now, the question is: why did it hang? Is there anything we can do to
make it *not* hang?

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Search Discussions

  • Tom Lane at Nov 29, 2010 at 6:23 pm

    hubert depesz lubaczewski writes:
    straced postmaster when the problem was happening, and I was opening new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to. AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE. What was
    that other session doing?

    regards, tom lane
  • Hubert depesz lubaczewski at Nov 29, 2010 at 6:24 pm

    On Mon, Nov 29, 2010 at 01:23:00PM -0500, Tom Lane wrote:
    hubert depesz lubaczewski <depesz@depesz.com> writes:
    straced postmaster when the problem was happening, and I was opening new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to. AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE. What was
    that other session doing?
    just some select.

    as far as I know there are no backends with such commands - drop/rename
    db.

    depesz

    --
    Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
    jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
  • Hubert depesz lubaczewski at Nov 29, 2010 at 6:34 pm

    On Mon, Nov 29, 2010 at 01:23:00PM -0500, Tom Lane wrote:
    hubert depesz lubaczewski <depesz@depesz.com> writes:
    straced postmaster when the problem was happening, and I was opening new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to. AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE. What was
    that other session doing?
    verified in logs. there were no alter database or drop database
    statements.

    Best regards,

    depesz

    --
    Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
    jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
  • Vivek Khera at Nov 29, 2010 at 8:57 pm

    On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane wrote:
    hubert depesz lubaczewski <depesz@depesz.com> writes:
    straced postmaster when the problem was happening, and I was opening new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to.  AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE.  What was
    that other session doing?
    Every once in a while when I connect to my big DB it feels like it
    stalls the connection for a few seconds. I have no idea what causes
    it, but it does feel like it has to do with the load. The load is not
    all that high relative to what my box can handle. It doesn't happen
    often enough for me to track it down, though. It instinct is that it
    is waiting on a lock, but clearly there is no rename/drop happening on
    my main db else i'd be out of business :-) I too run 8.3 as primary
    right now... testing 9.0 for deployment soon-ish.
  • Joshua D. Drake at Nov 29, 2010 at 9:02 pm

    On Mon, 2010-11-29 at 15:57 -0500, Vick Khera wrote:
    On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane wrote:
    hubert depesz lubaczewski <depesz@depesz.com> writes:
    straced postmaster when the problem was happening, and I was opening new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to. AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE. What was
    that other session doing?
    Every once in a while when I connect to my big DB it feels like it
    stalls the connection for a few seconds. I have no idea what causes
    it, but it does feel like it has to do with the load. The load is not
    all that high relative to what my box can handle. It doesn't happen
    often enough for me to track it down, though. It instinct is that it
    is waiting on a lock, but clearly there is no rename/drop happening on
    my main db else i'd be out of business :-) I too run 8.3 as primary
    right now... testing 9.0 for deployment soon-ish.
    Initiating the connection all the caches etc? Perhaps trying to allocate
    shared memory?

    JD

    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
  • Ben Chobot at Nov 29, 2010 at 9:03 pm

    On Nov 29, 2010, at 12:57 PM, Vick Khera wrote:
    On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane wrote:
    hubert depesz lubaczewski <depesz@depesz.com> writes:
    straced postmaster when the problem was happening, and I was opening new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to. AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE. What was
    that other session doing?
    Every once in a while when I connect to my big DB it feels like it
    stalls the connection for a few seconds. I have no idea what causes
    it, but it does feel like it has to do with the load. The load is not
    all that high relative to what my box can handle. It doesn't happen
    often enough for me to track it down, though. It instinct is that it
    is waiting on a lock, but clearly there is no rename/drop happening on
    my main db else i'd be out of business :-) I too run 8.3 as primary
    right now... testing 9.0 for deployment soon-ish.
    We're having similar issues on 8.4.[245]... occasionally psql takes anywhere from a few to several dozen seconds to connect. I've been unsuccessfully trying to blame spikes in the OS run queue (we desperately need some connection pooling) but if it's something to do with locks I can't see in pg_locks, that would explain why I haven't been able to figure out what's going on yet....
  • Dusan Misic at Nov 30, 2010 at 7:29 am

    On Mon, Nov 29, 2010 at 10:03 PM, Ben Chobot wrote:
    On Nov 29, 2010, at 12:57 PM, Vick Khera wrote:
    On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane wrote:
    hubert depesz lubaczewski <depesz@depesz.com> writes:
    straced postmaster when the problem was happening, and I was opening
    new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to. AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE. What was
    that other session doing?
    Every once in a while when I connect to my big DB it feels like it
    stalls the connection for a few seconds. I have no idea what causes
    it, but it does feel like it has to do with the load. The load is not
    all that high relative to what my box can handle. It doesn't happen
    often enough for me to track it down, though. It instinct is that it
    is waiting on a lock, but clearly there is no rename/drop happening on
    my main db else i'd be out of business :-) I too run 8.3 as primary
    right now... testing 9.0 for deployment soon-ish.
    We're having similar issues on 8.4.[245]... occasionally psql takes
    anywhere from a few to several dozen seconds to connect. I've been
    unsuccessfully trying to blame spikes in the OS run queue (we desperately
    need some connection pooling) but if it's something to do with locks I can't
    see in pg_locks, that would explain why I haven't been able to figure out
    what's going on yet....
    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general
    This is normal. PostgreSQL needs to create new server process to handle your
    requested connection.

    Then it needs to allocate resources to that new connection. It initializes
    shared memory for that connection. That is the stall you are mentioning.

    If you are time sensitive on connection times, then use connection pooling
    software like pg-bouncer or pg-pool2. Pooling will keep your server instance
    "hot" and those lags won't happen.
  • Craig Ringer at Nov 30, 2010 at 9:21 am

    On 11/30/2010 03:28 PM, Dusan Misic wrote:
    We're having similar issues on 8.4.[245]... occasionally psql takes
    anywhere from a few to several dozen seconds to connect. I've been
    unsuccessfully trying to blame spikes in the OS run queue (we
    desperately need some connection pooling) but if it's something to
    do with locks I can't see in pg_locks, that would explain why I
    haven't been able to figure out what's going on yet....
    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general


    This is normal. PostgreSQL needs to create new server process to handle
    your requested connection.

    Then it needs to allocate resources to that new connection. It
    initializes shared memory for that connection. That is the stall you are
    mentioning.
    Eh, what?

    Forking a backend and attaching to shared memory should *not* take "a
    few seconds". On my test machine it takes 100ms to fork psql, connect to
    the postmaster, fork a backend, init the backend, authenticate, run a
    dummy query and exit psql.

    If you're seeing delays like that, your machine is horrifyingly
    overloaded or there's something else wrong.

    --
    Craig Ringer
  • Dusan Misic at Nov 30, 2010 at 9:38 am

    On Tue, Nov 30, 2010 at 10:20 AM, Craig Ringer wrote:
    On 11/30/2010 03:28 PM, Dusan Misic wrote:

    We're having similar issues on 8.4.[245]... occasionally psql takes
    anywhere from a few to several dozen seconds to connect. I've been
    unsuccessfully trying to blame spikes in the OS run queue (we
    desperately need some connection pooling) but if it's something to
    do with locks I can't see in pg_locks, that would explain why I
    haven't been able to figure out what's going on yet....
    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general


    This is normal. PostgreSQL needs to create new server process to handle
    your requested connection.

    Then it needs to allocate resources to that new connection. It
    initializes shared memory for that connection. That is the stall you are
    mentioning.
    Eh, what?

    Forking a backend and attaching to shared memory should *not* take "a few
    seconds". On my test machine it takes 100ms to fork psql, connect to the
    postmaster, fork a backend, init the backend, authenticate, run a dummy
    query and exit psql.

    If you're seeing delays like that, your machine is horrifyingly overloaded
    or there's something else wrong.

    --
    Craig Ringer
    This is not happening to me. Even on my home computer (which is terribly
    slow single core Sempron 2600+ with 128 kB of L2 cache, 1.5 GB DDR RAM and
    with slow ATA hard disks) it is fast to connect. Maximum (slowest)
    connection time is about 500 ms. I did some tuning in postgresql.conf.

    I'm very happy with PostgreSQL speed on my 'snail' computer. :)
  • Hubert depesz lubaczewski at Nov 30, 2010 at 1:21 pm

    On Mon, Nov 29, 2010 at 03:57:29PM -0500, Vick Khera wrote:
    On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane wrote:
    hubert depesz lubaczewski <depesz@depesz.com> writes:
    straced postmaster when the problem was happening, and I was opening new
    connections. strace looks like this:
    [ backend hangs on semop immediately after reading global/pg_database ]
    It looks like something had exclusive lock on the database that new
    connections wanted to connect to.  AFAICS the only action in 8.3 that
    would do that would be a DROP DATABASE or RENAME DATABASE.  What was
    that other session doing?
    Every once in a while when I connect to my big DB it feels like it
    stalls the connection for a few seconds. I have no idea what causes
    it, but it does feel like it has to do with the load. The load is not
    all that high relative to what my box can handle. It doesn't happen
    often enough for me to track it down, though. It instinct is that it
    is waiting on a lock, but clearly there is no rename/drop happening on
    my main db else i'd be out of business :-) I too run 8.3 as primary
    right now... testing 9.0 for deployment soon-ish.
    jkust so it will be clear - on some other occasion when we had the
    problem, i left it hng for a while. 5 minutes later i still didn't get
    the connection.

    Best regards,

    depesz


    --
    Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
    jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
  • Florian Weimer at Nov 30, 2010 at 10:12 am

    * hubert depesz lubaczewski:

    Now, the question is: why did it hang? Is there anything we can do to
    make it *not* hang?
    It might be some general system overload issue. Try running "echo w >
    /proc/sysrq-trigger" as root the next time it happens. This will dump
    kernel backtraces to dmesg, which might hint to what's going on with
    the system.

    --
    Florian Weimer <fweimer@bfk.de>
    BFK edv-consulting GmbH http://www.bfk.de/
    Kriegsstraße 100 tel: +49-721-96201-1
    D-76133 Karlsruhe fax: +49-721-96201-99
  • Hubert depesz lubaczewski at Nov 30, 2010 at 1:24 pm

    On Tue, Nov 30, 2010 at 10:12:47AM +0000, Florian Weimer wrote:
    * hubert depesz lubaczewski:
    Now, the question is: why did it hang? Is there anything we can do to
    make it *not* hang?
    It might be some general system overload issue. Try running "echo w >
    /proc/sysrq-trigger" as root the next time it happens. This will dump
    kernel backtraces to dmesg, which might hint to what's going on with
    the system.
    will check the idea, but the thing is that aside from postgresql -
    everything else on the server is working fine - ssh/top/*stat.
    and other postgreses seem to be locked too.

    Best regards,

    depesz

    --
    Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
    jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedNov 29, '10 at 5:18p
activeNov 30, '10 at 1:24p
posts13
users8
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase