I'm not sure if this is the write place to post this question or not, but I
hope someone can help me out.

I am using the copy_from command from the python psycopg2 library to do some
bulk data loading of a postgres database. This had been working OK, until
my script barfed because I was being careless, and it seemed to leave the
database in a strange state that I can't recover from. Here is what
happened:

I am connecting to postgres via TCP, but I happen to be running the script
on the same host ast the server right now. I had forgotten to clear the
contents of one table before loading it from a file and the copy_from
command exited with error because of conflicting primary keys. After this
any time I attempted to execute copy_from it would just hang (after
successfully connecting to the database and doing a series of DELETE FROM
SQL commands).

I just see something like this if I look through the output of ps:

postgres 30432 1.7 1.2 42884 12512 ? Ss 13:16 0:00 postgres:
biomart mgi_biomart_intermediate 127.0.1.1(41831) COPY

The script hangs forever, and I don't see anything in the postgres log.

If I switched my script to connect through a unix socket (remove
host='hostname' from the psycopg2.connect() call), then it works again.
I've restarted postgresql many times, the TCP connection always hangs on the
COPY from my script now.

I have fixed my script to be more careful and handle the copy_from() error
in a more sane way, but I'd like to fix whatever is now wrong with postgres.

Here are some entries from pg_hba.conf:

host all all 10.0.0.0/8 md5
host all all 127.0.0.1/32 md5
host all all 127.0.1.1/32 md5
local all biomart md5

Everything else in pg_hba.conf is default
The only change to postgresql.conf is to change the listen_address to '*'



Any insight would be appreciated.




--
Glen L. Beane
Software Engineer
The Jackson Laboratory
Phone (207) 288-6153

Search Discussions

  • Tom Lane at Nov 11, 2008 at 7:25 pm

    Glen Beane writes:
    I am using the copy_from command from the python psycopg2 library to do some
    bulk data loading of a postgres database. This had been working OK, until
    my script barfed because I was being careless, and it seemed to leave the
    database in a strange state that I can't recover from.
    What PG version is this exactly? What does pg_stat_activity show? Is
    there any indication of un-granted locks in pg_locks?

    The most direct evidence about why it's stuck would probably be had by
    attaching to the backend process with gdb and getting a stack trace.

    regards, tom lane
  • Glen Beane at Nov 11, 2008 at 10:29 pm

    On 11/11/08 2:25 PM, "Tom Lane" wrote:

    Glen Beane <Glen.Beane@jax.org> writes:
    I am using the copy_from command from the python psycopg2 library to do some
    bulk data loading of a postgres database. This had been working OK, until
    my script barfed because I was being careless, and it seemed to leave the
    database in a strange state that I can't recover from.
    What PG version is this exactly?
    8.3.3

    What does pg_stat_activity show?
    This is the only non-idle connection:

    16498 | mgi_biomart_intermediate | 31356 | 16386 | biomart | COPY
    markers FROM stdin USING DELIMITERS ' ' | f | 2008-11-11
    17:10:36.124919-05 | 2008-11-11 17:10:36.125522-05 | 2008-11-11
    17:10:32.213159-05 | 127.0.1.1 | 35021

    Is
    there any indication of un-granted locks in pg_locks?
    gbeane=# select * from pg_locks;
    locktype | database | relation | page | tuple | virtualxid |
    transactionid | classid | objid | objsubid | virtualtransaction | pid |
    mode | granted
    ---------------+----------+----------+------+-------+------------+----------
    -----+---------+-------+----------+--------------------+-------+------------
    ------+---------
    virtualxid | | | | | 5/3 |
    5/3 | 31361 | ExclusiveLock
    t
    relation | 16385 | 10969 | | | |
    5/3 | 31361 | AccessShareLock
    t
    relation | 16498 | 16507 | | | |
    3/1151 | 31356 | RowExclusiveLock
    t
    virtualxid | | | | | 3/1151 |
    3/1151 | 31356 | ExclusiveLock
    t
    transactionid | | | | | |
    637 | | | | 3/1151 | 31356 |
    ExclusiveLock | t
    relation | 16498 | 16510 | | | |
    3/1151 | 31356 | RowExclusiveLock
    t
    The most direct evidence about why it's stuck would probably be had by
    attaching to the backend process with gdb and getting a stack trace.
    I'll take a look with gdb when I have a moment.
    regards, tom lane
    --
    Glen L. Beane
    Software Engineer
    The Jackson Laboratory
    Phone (207) 288-6153
  • Glen Beane at Nov 12, 2008 at 2:07 am

    On 11/11/08 2:25 PM, "Tom Lane" wrote:

    The most direct evidence about why it's stuck would probably be had by
    attaching to the backend process with gdb and getting a stack trace.
    It wasn't built with debugging symbols so there is some missing info, but
    here is what I get if I attach with gdb:

    (gdb) where
    #0 0xb7f63410 in __kernel_vsyscall ()
    #1 0xb7ac9273 in read () from /lib/tls/i686/cmov/libc.so.6
    #2 0xb7d42d37 in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #3 0x081a1226 in ?? ()
    #4 0xb7d40db1 in BIO_read () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #5 0xb7e05362 in ssl3_read_n () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #6 0xb7e05b2e in ssl3_read_bytes () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #7 0xb7e03096 in ssl3_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #8 0xb7e13b78 in SSL_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #9 0x081a128a in secure_read ()
    #10 0x081a94c1 in ?? ()
    #11 0x081a95d9 in pq_getbytes ()
    #12 0x081a9744 in pq_getmessage ()
    #13 0x0813853c in ?? ()
    #14 0x0813883a in ?? ()
    #15 0x08138a05 in ?? ()
    #16 0x0813b456 in DoCopy ()
    #17 0x082302e4 in ProcessUtility ()
    #18 0x0822cc6b in ?? ()
    #19 0x0822d8e3 in ?? ()
    #20 0x0822e104 in PortalRun ()
    #21 0x08228eb5 in ?? ()
    #22 0x08229f6b in PostgresMain ()
    #23 0x081fa39d in ?? ()
    #24 0x081fb3cb in PostmasterMain ()
    #25 0x081ab902 in main ()

    --
    Glen L. Beane
    Software Engineer
    The Jackson Laboratory
    Phone (207) 288-6153
  • Tom Lane at Nov 12, 2008 at 2:09 am

    Glen Beane writes:
    On 11/11/08 2:25 PM, "Tom Lane" wrote:
    The most direct evidence about why it's stuck would probably be had by
    attaching to the backend process with gdb and getting a stack trace.
    It wasn't built with debugging symbols so there is some missing info, but
    here is what I get if I attach with gdb:
    (gdb) where
    #0 0xb7f63410 in __kernel_vsyscall ()
    #1 0xb7ac9273 in read () from /lib/tls/i686/cmov/libc.so.6
    #2 0xb7d42d37 in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #3 0x081a1226 in ?? ()
    #4 0xb7d40db1 in BIO_read () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #5 0xb7e05362 in ssl3_read_n () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #6 0xb7e05b2e in ssl3_read_bytes () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #7 0xb7e03096 in ssl3_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #8 0xb7e13b78 in SSL_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #9 0x081a128a in secure_read ()
    #10 0x081a94c1 in ?? ()
    #11 0x081a95d9 in pq_getbytes ()
    #12 0x081a9744 in pq_getmessage ()
    #13 0x0813853c in ?? ()
    #14 0x0813883a in ?? ()
    #15 0x08138a05 in ?? ()
    #16 0x0813b456 in DoCopy ()
    Well, this is sufficient to draw a conclusion: it's waiting for the
    client to send it some more COPY data. So next you need to look into
    what the client's state is. What software are you using on the client
    side, anyway?

    regards, tom lane
  • Glen Beane at Nov 12, 2008 at 2:20 am

    On 11/11/08 9:09 PM, "Tom Lane" wrote:

    Glen Beane <Glen.Beane@jax.org> writes:
    On 11/11/08 2:25 PM, "Tom Lane" wrote:
    The most direct evidence about why it's stuck would probably be had by
    attaching to the backend process with gdb and getting a stack trace.
    It wasn't built with debugging symbols so there is some missing info, but
    here is what I get if I attach with gdb:
    (gdb) where
    #0 0xb7f63410 in __kernel_vsyscall ()
    #1 0xb7ac9273 in read () from /lib/tls/i686/cmov/libc.so.6
    #2 0xb7d42d37 in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #3 0x081a1226 in ?? ()
    #4 0xb7d40db1 in BIO_read () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #5 0xb7e05362 in ssl3_read_n () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #6 0xb7e05b2e in ssl3_read_bytes () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #7 0xb7e03096 in ssl3_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #8 0xb7e13b78 in SSL_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
    #9 0x081a128a in secure_read ()
    #10 0x081a94c1 in ?? ()
    #11 0x081a95d9 in pq_getbytes ()
    #12 0x081a9744 in pq_getmessage ()
    #13 0x0813853c in ?? ()
    #14 0x0813883a in ?? ()
    #15 0x08138a05 in ?? ()
    #16 0x0813b456 in DoCopy ()
    Well, this is sufficient to draw a conclusion: it's waiting for the
    client to send it some more COPY data. So next you need to look into
    what the client's state is. What software are you using on the client
    side, anyway?
    Python with the psycopg2 library. I swear this was working earlier today.
    Maybe I am imagining things :)
    It does work with a unix socket, and I have a deadline to meet, so for now I
    just need to make sure I am running this on the same host as the postgresql
    server
  • Adrian Klaver at Nov 12, 2008 at 2:31 am

    On Tuesday 11 November 2008 6:20:09 pm Glen Beane wrote:
    Python with the psycopg2 library. I swear this was working earlier today.
    Maybe I am imagining things :)
    It does work with a unix socket, and I have a deadline to meet, so for now
    I just need to make sure I am running this on the same host as the
    postgresqlserver
    See these two posts for potential solution:
    http://lists.initd.org/pipermail/psycopg/2008-October/006224.html
    http://lists.initd.org/pipermail/psycopg/2008-October/006225.html

    They point to a problem with ssl and psycopg2 COPY. Also I know there where
    some problems with psycopg2 and copy that where fixed in psycopg2 version
    2.0.7.
    --
    Adrian Klaver
    aklaver@comcast.net
  • Adrian Klaver at Nov 12, 2008 at 2:42 am

    On Tuesday 11 November 2008 6:31:04 pm Adrian Klaver wrote:
    On Tuesday 11 November 2008 6:20:09 pm Glen Beane wrote:
    Python with the psycopg2 library. I swear this was working earlier
    today. Maybe I am imagining things :)
    It does work with a unix socket, and I have a deadline to meet, so for
    now I just need to make sure I am running this on the same host as the
    postgresqlserver
    See these two posts for potential solution:
    http://lists.initd.org/pipermail/psycopg/2008-October/006224.html
    http://lists.initd.org/pipermail/psycopg/2008-October/006225.html

    They point to a problem with ssl and psycopg2 COPY. Also I know there where
    some problems with psycopg2 and copy that where fixed in psycopg2 version
    2.0.7.
    --
    Adrian Klaver
    aklaver@comcast.net
    That should have been version 2.0.8.

    --
    Adrian Klaver
    aklaver@comcast.net

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedNov 11, '08 at 7:15p
activeNov 12, '08 at 2:42a
posts8
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase