Hello,

I need to compare query execution: I have 2 tables partitioned by datex
(daily):

summary_daily (
counter | bigint
datasource_id | integer
application_id | integer
action | character(1)
srcreporter_id | integer
destreporter_id | integer
bytes | bigint
srcusergroup_id | integer
datex | timestamp with time zone
root_cause_id | integer
rule_id | integer
srcgeo_id | integer
destgeo_id | integer
mlapp_id | bigint
)

app (
counter | bigint
bytes | bigint
action | character(1)
datex | timestamp with time zone
datasource_id | integer
application_id | integer
mlapp_id | bigint
root_cause_id | integer
)


The second table has been created from the first by aggregation.
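(A hypothetical sketch of that aggregation; the GROUP BY list is an
assumption based on the columns the two tables share:)

-- all names come from the table definitions above; the grouping is assumed
INSERT INTO app (counter, bytes, action, datex,
                 datasource_id, application_id, mlapp_id, root_cause_id)
SELECT sum(counter), sum(bytes), action, datex,
       datasource_id, application_id, mlapp_id, root_cause_id
FROM summary_daily
GROUP BY action, datex, datasource_id, application_id, mlapp_id, root_cause_id;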

table Summary has 9 million records per partition,
table App has 7 million records per partition.

The execution plans look the same, but the actual times differ hugely.

With work_mem = 10MB:

days/partitions | query from Summary | query from App
--------------- | ------------------ | --------------
1               | 2.5 sec            | 1 sec
3               | 5.5 sec            | 1.5 sec
7               | 60 sec             | 8 sec

When I set session work_mem = 60MB, the 7-day query takes 8.5 sec instead of 60 sec.

How can I see where/when it is using disk or memory?
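(A minimal sketch of two ways to check this; the BUFFERS option assumes
PostgreSQL 9.0 or later, and log_temp_files exists since 8.3 but is
superuser-only:)

-- Log every temporary file a backend spills to disk; sizes land in the
-- server log. 0 = log all temp files (the value is a size limit in kB).
SET log_temp_files = 0;

-- Re-run the statement below with the BUFFERS option instead of plain
-- EXPLAIN ANALYZE; "temp read=N written=M" lines indicate disk use:
EXPLAIN (ANALYZE, BUFFERS)
SELECT action, sum(counter), mlapp_id, application_id,
       sum(bytes), root_cause_id
FROM summary_app
WHERE datasource_id = 10
  AND datex >= '2011-08-03 00:00:00+00'
  AND datex <  '2011-08-06 00:00:00+00'
GROUP BY mlapp_id, application_id, action, root_cause_id;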

explain analyze
SELECT summary_app.action, sum(summary_app.counter), summary_app.mlapp_id,
       summary_app.application_id, sum(summary_app.bytes), summary_app.root_cause_id
FROM summary_app
WHERE summary_app.datasource_id = 10
  AND summary_app.datex >= '2011-08-03 00:00:00+00'::timestamp with time zone
  AND summary_app.datex <  '2011-08-06 00:00:00+00'::timestamp with time zone
GROUP BY mlapp_id, application_id, action, root_cause_id;



HashAggregate  (cost=8223.97..8226.97 rows=200 width=37) (actual time=4505.607..4506.806 rows=3126 loops=1)
  ->  Append  (cost=0.00..8213.42 rows=703 width=37) (actual time=1071.043..4046.780 rows=283968 loops=1)
        ->  Seq Scan on summary_daily_data summary_app  (cost=0.00..23.83 rows=1 width=37) (actual time=0.001..0.001 rows=0 loops=1)
              Filter: ((datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone) AND (datasource_id = 10))
        ->  Bitmap Heap Scan on summ_daily_15191 summary_app  (cost=1854.89..2764.60 rows=234 width=37) (actual time=1071.041..1343.235 rows=94656 loops=1)
              Recheck Cond: ((datasource_id = 10) AND (datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))
              ->  BitmapAnd  (cost=1854.89..1854.89 rows=234 width=0) (actual time=1054.310..1054.310 rows=0 loops=1)
                    ->  Bitmap Index Scan on ind_fw_15191  (cost=0.00..868.69 rows=46855 width=0) (actual time=17.896..17.896 rows=94656 loops=1)
                          Index Cond: (datasource_id = 10)
                    ->  Bitmap Index Scan on ind_datex_15191  (cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834 rows=9370944 loops=1)
                          Index Cond: ((datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))


The same query against the smaller table:


HashAggregate  (cost=252859.36..253209.94 rows=23372 width=34) (actual time=371.164..372.153 rows=3126 loops=1)
  ->  Append  (cost=0.00..249353.62 rows=233716 width=34) (actual time=11.028..115.915 rows=225072 loops=1)
        ->  Seq Scan on summary_app  (cost=0.00..28.03 rows=1 width=37) (actual time=0.001..0.001 rows=0 loops=1)
              Filter: ((datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone) AND (datasource_id = 10))
        ->  Bitmap Heap Scan on summ_app_15191 summary_app  (cost=2299.40..82014.85 rows=72293 width=34) (actual time=11.027..31.341 rows=75024 loops=1)
              Recheck Cond: ((datasource_id = 10) AND (datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))
              ->  Bitmap Index Scan on summ_app_fw_datex_15191  (cost=0.00..2281.32 rows=72293 width=0) (actual time=10.910..10.910 rows=75024 loops=1)
                    Index Cond: ((datasource_id = 10) AND (datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))


Why is the difference so large? How can I tune this query?

thank you.

Helen






  • Igor Neyman at Aug 15, 2011 at 6:19 pm

    -----Original Message-----
    From: hyelluas
    Sent: Friday, August 12, 2011 5:30 PM
    To: pgsql-performance@postgresql.org
    Subject: How to see memory usage using explain analyze ?

    [quoted original message snipped]
    Helen,

    I'm probably a bit late answering your question.
    But, just in case...

    It looks like one table has a "combined" index, summ_app_fw_datex_15191, on
    both datasource_id and datex, which works better than the 2 separate
    indexes ind_datex_15191 (datex) and ind_fw_15191 (datasource_id) that you
    have on the other table.
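    (A sketch of the equivalent combined index on the summary partition; the
    index name below is made up for illustration, and the column order mirrors
    summ_app_fw_datex_15191:)

    -- hypothetical name; such an index would be needed on each daily partition
    CREATE INDEX ind_fw_datex_15191
        ON summ_daily_15191 (datasource_id, datex);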
    Besides, this:

    ->  Bitmap Index Scan on ind_datex_15191  (cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834 rows=9370944 loops=1)

    shows that the statistics behind ind_datex_15191 are completely "out of
    whack" (expected rows=46855, actual rows=9370944).

    HTH,
    Igor Neyman
  • Hyelluas at Aug 15, 2011 at 6:32 pm
    Igor,

    Thank you. My tests showed better performance against the larger summary
    tables when I split the index for datasource_id & datex; I used to have a
    composite index.

    Regarding the index statistics - should I analyze the tables? I thought
    autovacuum takes care of it.

    helen

  • Igor Neyman at Aug 17, 2011 at 3:27 pm

    -----Original Message-----
    From: hyelluas
    Sent: Monday, August 15, 2011 2:33 PM
    To: pgsql-performance@postgresql.org
    Subject: Re: How to see memory usage using explain analyze ?

    [quoted message snipped]

    But with different sets of indexes, you can't compare the execution
    plans.
    In regards to statistics, you could try to ANALYZE the tables manually,
    maybe after increasing "default_statistics_target".
    From the docs:

    "default_statistics_target (integer)

    Sets the default statistics target for table columns that have not
    had a column-specific target set via ALTER TABLE SET STATISTICS. Larger
    values increase the time needed to do ANALYZE, but might improve the
    quality of the planner's estimates. The default is 10. For more
    information on the use of statistics by the PostgreSQL query planner,
    refer to Section 14.2."
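
    For example, a minimal sketch (the table/column names are taken from the
    plans earlier in the thread; 100 is an arbitrary target value):

    -- raise the target for the badly estimated column, then refresh its stats
    ALTER TABLE summ_daily_15191 ALTER COLUMN datex SET STATISTICS 100;
    ANALYZE summ_daily_15191;

    -- or raise the session default before a manual ANALYZE
    SET default_statistics_target = 100;
    ANALYZE summ_daily_15191;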

    HTH,
    Igor
  • Hyelluas at Aug 17, 2011 at 6:52 pm
    Igor,

    Thank you for the hint. I read about the planner and added "VACUUM ANALYZE"
    to my procedures.

    There is no join in my query, but the GROUP BY is taking all the time and I
    don't know how to tune it.
    It gets executed by a procedure; the execution time requirement is < 4 sec,
    but it takes 8-11 sec against 3 partitions of 9 million records each, and
    it goes to 22 sec for 5 partitions.
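    (Given the earlier result that session work_mem = 60MB cut the 7-day query
    from 60 sec to 8.5 sec, one sketch, assuming the statement can run in an
    explicit transaction, is to raise work_mem only for that call:)

    BEGIN;
    SET LOCAL work_mem = '60MB';   -- reverts automatically at COMMIT/ROLLBACK
    SELECT action, sum(counter), mlapp_id, application_id,
           sum(bytes), root_cause_id
    FROM summary_app
    WHERE datasource_id = 10
      AND datex >= '2011-08-03 00:00:00+00'
      AND datex <  '2011-08-06 00:00:00+00'
    GROUP BY mlapp_id, application_id, action, root_cause_id;
    COMMIT;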


    I've been testing PostgreSQL performance for the last 2 months, comparing
    it with MySQL; PostgreSQL performance with 5+ million records in a table
    with 14 columns is worse.
    Is 14 columns a big table for Postgres, or is 5 million records a big table?

    The whole picture is that there are 2 databases, OLTP & "OLAP", that used
    to be on different machines and in different databases.
    The new project requires putting them on one database & machine.

    I preferred Postgres (the poorly designed OLTP would not suffer even more
    on MySQL), and now I'm trying to tune the OLAP db.

    thank you.

    Helen


