Hi,

I am running PostgreSQL 8.4.7 with PostGIS 2.0 (for raster support).
The server mainly has one user doing spatial data processing. This involves
queries that can take hours.

This is running on an Ubuntu 10.10 server with a Core2Duo 6600 @ 2.4 GHz and 6 GB
of RAM.

My postgresql.conf:
# - Memory -
shared_buffers = 1024MB # min 128kB
# (change requires restart)
temp_buffers = 256MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 1024MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
max_stack_depth = 7MB # min 100kB
wal_buffers = 8MB
effective_cache_size = 3072MB

Everything else is default.

My Pgbench results:
/usr/lib/postgresql/8.4/bin/pgbench -T 60 test1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
duration: 60 s
number of transactions actually processed: 7004
tps = 116.728199 (including connections establishing)
tps = 116.733012 (excluding connections establishing)


My question is whether these are acceptable results, or whether someone can
recommend settings that will improve my server's performance.

Andreas


  • Kenneth Marshall at Mar 7, 2011 at 2:01 pm

    On Mon, Mar 07, 2011 at 02:45:03PM +0100, Andreas Forø Tollefsen wrote:
    Your results are I/O limited. Depending upon your requirements,
    you may be able to turn off synchronous_commit which can help.
    Your actual workload may be able to use batching to help as well.
    Your work_mem looks pretty darn high for a 6GB system.
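    A minimal sketch of both suggestions (PostgreSQL 8.4 syntax; both settings
    can be changed per session, no restart needed):

    ```sql
    -- Trade durability of the most recent commits for speed; after a crash
    -- the database itself stays consistent (unlike fsync = off).
    SET synchronous_commit = off;

    -- work_mem is allocated per sort/hash operation, not per server, so a
    -- lower default is safer on a 6GB machine; raise it only in the session
    -- that runs the big spatial query.
    SET work_mem = '256MB';
    ```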

    Cheers,
    Ken
  • Andreas Forø Tollefsen at Mar 7, 2011 at 2:17 pm
    Thanks, Ken.

    It seems like the tip to turn off synchronous_commit did the trick:

    /usr/lib/postgresql/8.4/bin/pgbench -T 60 test1
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    query mode: simple
    number of clients: 1
    duration: 60 s
    number of transactions actually processed: 86048
    tps = 1434.123199 (including connections establishing)
    tps = 1434.183362 (excluding connections establishing)

    Is this acceptable compared to others when considering my setup?

    Cheers,
    Andreas

  • Kenneth Marshall at Mar 7, 2011 at 2:22 pm

    On Mon, Mar 07, 2011 at 03:17:05PM +0100, Andreas Forø Tollefsen wrote:

    These are typical results for synchronous_commit off. The caveat is that
    you must be able to tolerate losing the most recent transactions after a
    database crash, although the database itself remains intact. This differs
    from turning fsync off, where a crash means you would need to restore
    from a backup.

    Cheers,
    Ken
  • Andreas Forø Tollefsen at Mar 7, 2011 at 2:29 pm
    Ok. Cheers. I will do some more testing on my heavy PostGIS queries, which
    often take hours to complete.

    Thanks.
    Andreas

  • Oleg Bartunov at Mar 7, 2011 at 3:34 pm

    On Mon, 7 Mar 2011, Andreas Forø Tollefsen wrote:

    I'd like to see these hours-long queries :) EXPLAIN ANALYZE
    Regards,
    Oleg
    _____________________________________________________________
    Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
    Sternberg Astronomical Institute, Moscow University, Russia
    Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
    phone: +007(495)939-16-83, +007(495)939-23-83
  • Andreas Forø Tollefsen at Mar 7, 2011 at 9:50 pm
    Turning synchronous_commit off increased the TPS, but not the speed of the
    query below.

    Oleg:
    This is a query I am working on now. It creates an intersection of two
    geometries. One is a grid of 0.5 x 0.5 decimal degree cells, while the
    other is the country geometries of all countries in the world for a certain
    year.

    priogrid=# EXPLAIN ANALYZE SELECT priogrid_land.gid, gwcode,
    ST_Intersection(priogrid_land.cell, cshapeswdate.geom)
    FROM priogrid_land, cshapeswdate
    WHERE ST_Intersects(priogrid_land.cell, cshapeswdate.geom);
                                  QUERY PLAN
    ------------------------------------------------------------------------------
    Nested Loop  (cost=0.00..12644.85 rows=43351 width=87704) (actual time=1.815..7074973.711 rows=130331 loops=1)
      Join Filter: _st_intersects(priogrid_land.cell, cshapeswdate.geom)
      ->  Seq Scan on cshapeswdate  (cost=0.00..14.42 rows=242 width=87248) (actual time=0.007..0.570 rows=242 loops=1)
      ->  Index Scan using idx_priogrid_land_cell on priogrid_land  (cost=0.00..7.15 rows=1 width=456) (actual time=0.069..5.604 rows=978 loops=242)
            Index Cond: (priogrid_land.cell && cshapeswdate.geom)
    Total runtime: 7075188.549 ms
    (6 rows)

  • David Kerr at Mar 7, 2011 at 11:29 pm
    On Mon, Mar 07, 2011 at 10:49:48PM +0100, Andreas Forø Tollefsen wrote:

    Your estimated and actual row counts are way off; have you analyzed those tables?
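    If the tables have not been analyzed since loading, a quick way to refresh
    the planner statistics for just these two tables (standard PostgreSQL;
    table names taken from the query above):

    ```sql
    ANALYZE priogrid_land;
    ANALYZE cshapeswdate;
    ```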

    Dave
  • Tom Lane at Mar 8, 2011 at 12:38 am

    Andreas Forø Tollefsen <andreasft@gmail.com> writes:
    Hm, are you sure your data is right? Because the actual row counts imply
    that each country intersects about half of the grid cells, which doesn't
    seem right.
    AFAICT, all of the runtime is going into calculating the ST_Intersects
    and/or ST_Intersection functions. The two scans are only accounting for
    perhaps 5.5 seconds, and the join infrastructure isn't going to be
    terribly expensive, so it's got to be those functions. Not knowing much
    about PostGIS, I don't know if the functions themselves can be expected
    to be really slow. If it's not them, it could be the cost of fetching
    their arguments --- in particular, I bet the country outlines are very
    large objects and are toasted out-of-line. There's been some past
    discussion of automatically avoiding repeated detoastings in scenarios
    like the above, but nothing's gotten to the point of acceptance yet.
    Possibly you could do something to force detoasting in a subquery.
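    One hypothetical way to experiment with that subquery idea (OFFSET 0 is a
    common planner fence; whether it actually avoids repeated detoasting on
    8.4 is an assumption that would need to be measured):

    ```sql
    -- Sketch: fence the country geometries in a subquery so they are
    -- fetched once, rather than re-detoasting the toasted values on
    -- every probe of the grid index.
    SELECT p.gid, c.gwcode, ST_Intersection(p.cell, c.geom)
    FROM priogrid_land p
    JOIN (SELECT gwcode, geom FROM cshapeswdate OFFSET 0) c
      ON ST_Intersects(p.cell, c.geom);
    ```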

    regards, tom lane
  • Andreas Forø Tollefsen at Mar 8, 2011 at 8:42 am
    Hi. Thanks for the comments. My data is right, and the result is exactly
    what I want, but as you say, I think what makes the query slow is the
    ST_Intersection, which creates the intersection between the vector grid
    (fishnet) and the country polygons.
    I will check with the PostGIS user list whether they have any ideas on how
    to speed up this query.

    Best,
    Andreas

  • Andy Colson at Mar 8, 2011 at 3:21 pm
    I have seen really complex geometries cause problems. If you have
    thousands of points when 10 would do, try ST_Simplify and see if it
    doesn't speed things up.

    -Andy

    On 3/8/2011 2:42 AM, Andreas Forø Tollefsen wrote:
  • Andreas Forø Tollefsen at Mar 8, 2011 at 4:59 pm
    Andy. Thanks. That is a great tip. I tried it, but I get the error:
    NOTICE: ptarray_simplify returned a <2 pts array.

    Query:
    SELECT ST_Intersection(priogrid_land.cell, ST_Simplify(cshapeswdate.geom, 0.1)) AS geom,
           priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate,
           capname, caplong, caplat, col, row, xcoord, ycoord
    FROM priogrid_land, cshapeswdate
    WHERE ST_Intersects(priogrid_land.cell, ST_Simplify(cshapeswdate.geom, 0.1))
      AND cshapeswdate.gwsyear <= 1946 AND cshapeswdate.gweyear >= 1946
      AND cshapeswdate.startdate <= '1946/1/1';



  • Andreas Forø Tollefsen at Mar 8, 2011 at 5:25 pm
    Forgot to mention that the query terminates the connection because of a
    server process crash.


  • Andy Colson at Mar 8, 2011 at 7:13 pm

    On 3/8/2011 10:58 AM, Andreas Forø Tollefsen wrote:
    ew... that's not good. Seems like it simplified it down to a single
    point? (Not 100% sure that's what the error means, just a guess.)

    Try getting some info about it:

    SELECT ST_NPoints(geom) AS before,
           ST_NPoints(ST_Simplify(geom, 0.1)) AS after
    FROM cshapeswdate;


    Also try things like ST_IsSimple and ST_IsValid. I seem to recall
    sometimes needing ST_Points or ST_NumPoints instead of ST_NPoints.
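    A sketch combining those checks to inspect the outlines, largest first
    (function names as in PostGIS; treat this as a starting point):

    ```sql
    -- Validity, simplicity, and vertex count before/after simplification
    -- for each country outline, biggest geometries first.
    SELECT gwcode,
           ST_IsValid(geom)  AS is_valid,
           ST_IsSimple(geom) AS is_simple,
           ST_NPoints(geom)  AS npoints_before,
           ST_NPoints(ST_Simplify(geom, 0.1)) AS npoints_after
    FROM cshapeswdate
    ORDER BY ST_NPoints(geom) DESC;
    ```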

    -Andy

Discussion Overview
group: pgsql-performance
categories: postgresql
posted: Mar 7, '11 at 1:47p
active: Mar 8, '11 at 7:13p
posts: 14
users: 6
website: postgresql.org
irc: #postgresql
