I have PostgreSQL 9.0.1 on Windows 2003 Enterprise with 6GB RAM and a
4-disk SATA RAID 10.
I am running SymmetricDS for replication over a WAN. Yesterday there was a
big problem: I updated a lot of rows, and the SymmetricDS query that
selects gap data ran very, very slowly.

Here are the tuning settings from my postgresql.conf:
effective_cache_size = 4GB
work_mem = 2097151
shared_buffers = 1GB

Here is the query:
explain analyze select d.data_id, d.table_name, d.event_type, d.row_data,
d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id,
d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d
inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id
and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;

And here is the result:
Nested Loop  (cost=0.00..1517515125.95 rows=26367212590 width=1403) (actual time=14646.390..7745828.163 rows=2764140 loops=1)
  ->  Index Scan using sym_data_pkey on sym_data d  (cost=0.00..637148.72 rows=3129103 width=1403) (actual time=71.989..55643.665 rows=3124631 loops=1)
        Filter: ((channel_id)::text = 'sale_transaction'::text)
  ->  Index Scan using sym_data_gap_pkey on sym_data_gap g  (cost=0.00..358.37 rows=8426 width=8) (actual time=2.459..2.459 rows=1 loops=3124631)
        Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
        Filter: (g.status = 'GP'::bpchar)
Total runtime: 7746577.478 ms

Here is the sym_data table; it has 437319 rows whose data_id falls between
the start_id and end_id of a sym_data_gap row with status = 'GP':

CREATE TABLE sym_data
(
data_id serial NOT NULL,
table_name character varying(50) NOT NULL,
event_type character(1) NOT NULL,
row_data text,
pk_data text,
old_data text,
trigger_hist_id integer NOT NULL,
channel_id character varying(20),
transaction_id character varying(255),
source_node_id character varying(50),
external_data character varying(50),
create_time timestamp without time zone,
CONSTRAINT sym_data_pkey PRIMARY KEY (data_id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE sym_data OWNER TO postgres;

-- Index: idx_d_channel_id

-- DROP INDEX idx_d_channel_id;

CREATE INDEX idx_d_channel_id
ON sym_data
USING btree
(data_id, channel_id);
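
The 437319 figure above corresponds to a query along these lines (an
illustrative restatement, not part of the original post):

select count(*)
from sym_data d
where exists (select 1
              from sym_data_gap g
              where g.status = 'GP'
                and d.data_id between g.start_id and g.end_id);
-- expected result: 437319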

And here is the sym_data_gap table; it has 57838 rows with status = 'GP':

CREATE TABLE sym_data_gap
(
start_id integer NOT NULL,
end_id integer NOT NULL,
status character(2),
create_time timestamp without time zone NOT NULL,
last_update_hostname character varying(255),
last_update_time timestamp without time zone NOT NULL,
CONSTRAINT sym_data_gap_pkey PRIMARY KEY (start_id, end_id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE sym_data_gap OWNER TO postgres;

-- Index: idx_dg_status

-- DROP INDEX idx_dg_status;

CREATE INDEX idx_dg_status
ON sym_data_gap
USING btree
(status);

Because the query runs so slowly, data is not being replicated between the
two sites. Please help me.

Sorry for my English
Tuan Hoang ANh


  • Kevin Grittner at Aug 5, 2011 at 5:21 pm

    tuanhoanganh wrote:

    > I have PostgreSQL 9.0.1

    http://www.postgresql.org/support/versioning
    (that is, you should at least be on the current 9.0.x minor release)

    > 6GB RAM
    > work_mem = 2097151

    I think that has the potential to push you into swapping:

    cc=> set work_mem = 2097151;
    SET
    cc=> show work_mem;
    work_mem
    -----------
    2097151kB
    (1 row)

    That's 2GB, and that much can be allocated, potentially several
    times, per connection.
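
    As a rough illustration only (these numbers are assumptions, not from
    the thread), the usual approach is a modest global value sized so that
    connections times a few allocations stays well under RAM, with
    per-session overrides for queries that genuinely need more:

    # postgresql.conf: ~100 connections x a few sorts each stays far below 6GB
    work_mem = 32MB

    -- per session, just before a known-heavy query:
    SET work_mem = '256MB';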
    > -> Index Scan using sym_data_pkey on sym_data d
    > (cost=0.00..637148.72 rows=3129103 width=1403)
    > (actual time=71.989..55643.665 rows=3124631 loops=1)
    > Filter: ((channel_id)::text = 'sale_transaction'::text)
    This index scan is going to randomly access all tuples in the
    table's heap. That is probably going to be much slower than a
    sequential scan. It is apparently choosing this index to avoid a
    sort, because of the mis-estimation on the number of rows. Is it
    critical that the rows be returned in that order? If not, you might
    see much faster performance by leaving off the ORDER BY clause so
    that it can use the seqscan.

    You could potentially make queries like this much faster by indexing
    on channel_id, or by indexing on data_id WHERE channel_id =
    'sale_transaction'.

    You could also set up optimization barriers with clever use of a CTE
    or an OFFSET 0 to force it to use a seqscan followed by a sort, but
    I would look at the other options first.
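
    For illustration, an OFFSET 0 version of the posted query would look
    like this (a sketch, not from the thread; the fence stops the planner
    from pulling the subquery up, so the join is planned without the outer
    ORDER BY and the result is sorted afterwards):

    select *
    from (select d.data_id, d.table_name, d.event_type, d.row_data,
                 d.pk_data, d.old_data, d.create_time, d.trigger_hist_id,
                 d.channel_id, d.transaction_id, d.source_node_id,
                 d.external_data, ''
          from sym_data d
          inner join sym_data_gap g on g.status = 'GP'
            and d.data_id between g.start_id and g.end_id
          where d.channel_id = 'sale_transaction'
          offset 0) sub  -- optimization fence
    order by sub.data_id asc;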

    -Kevin
  • Tuanhoanganh at Aug 6, 2011 at 2:16 am
    Thanks for your help.
    I created indexes on channel_id and on data_id as you suggested.

    -- Index: idx_d_channel_id2

    -- DROP INDEX idx_d_channel_id2;

    CREATE INDEX idx_d_channel_id2
    ON sym_data
    USING btree
    (channel_id);

    -- Index: idx_d_channel_id3

    -- DROP INDEX idx_d_channel_id3;

    CREATE INDEX idx_d_channel_id3
    ON sym_data
    USING btree
    (data_id)
    WHERE channel_id::text = 'sale_transaction'::text;

    -- Index: idx_d_channel_id4

    -- DROP INDEX idx_d_channel_id4;

    CREATE INDEX idx_d_channel_id4
    ON sym_data
    USING btree
    (data_id)
    WHERE channel_id::text = 'item'::text;

    Here is the new EXPLAIN ANALYZE:

    explain analyze select d.data_id, d.table_name, d.event_type, d.row_data,
    d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id,
    d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d
    inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id
    and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;

    Nested Loop  (cost=0.00..1512979014.35 rows=26268463088 width=1401) (actual time=25741.704..7650979.311 rows=2764140 loops=1)
      ->  Index Scan using idx_d_channel_id3 on sym_data d  (cost=0.00..1781979.40 rows=3117384 width=1401) (actual time=83.718..55126.002 rows=3124631 loops=1)
      ->  Index Scan using sym_data_gap_pkey on sym_data_gap g  (cost=0.00..358.37 rows=8426 width=8) (actual time=2.428..2.429 rows=1 loops=3124631)
            Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
            Filter: (g.status = 'GP'::bpchar)
    Total runtime: 7651803.073 ms

    But the query performance didn't change.
    Please help me.

    Tuan Hoang ANh
  • Greg Williamson at Aug 6, 2011 at 3:09 am
    Tuan --
    > Thanks for your help.
    > I created indexes on channel_id and on data_id as you suggested.
    > <...>
    > explain analyze select d.data_id, d.table_name, d.event_type, d.row_data,
    > d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id,
    > d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d
    > inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id
    > and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;
    >
    > Nested Loop  (cost=0.00..1512979014.35 rows=26268463088 width=1401) (actual time=25741.704..7650979.311 rows=2764140 loops=1)
    >   ->  Index Scan using idx_d_channel_id3 on sym_data d  (cost=0.00..1781979.40 rows=3117384 width=1401) (actual time=83.718..55126.002 rows=3124631 loops=1)
    >   ->  Index Scan using sym_data_gap_pkey on sym_data_gap g  (cost=0.00..358.37 rows=8426 width=8) (actual time=2.428..2.429 rows=1 loops=3124631)
    >         Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
    >         Filter: (g.status = 'GP'::bpchar)
    > Total runtime: 7651803.073 ms
    >
    > But the query performance didn't change.
    > Please help me.
    Did you run an ANALYZE on the tables after building the new indexes? The
    row estimates seem to be wildly off; although that may be a symptom of
    something else and not related, it is worth ruling out the things that
    are easy to try.

    HTH,

    Greg Williamson
  • Tuanhoanganh at Aug 6, 2011 at 3:43 am
    Yes, I ran
    VACUUM VERBOSE ANALYZE sym_data;
    VACUUM VERBOSE ANALYZE sym_data_gap;
    after creating the indexes.

    If I remove the ORDER BY, the query runs much faster:

    explain analyze select d.data_id, d.table_name, d.event_type, d.row_data,
    d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id,
    d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d
    inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id
    and g.end_id where d.channel_id='sale_transaction';

    Nested Loop  (cost=0.00..1384889042.54 rows=26266634550 width=1400) (actual time=63.546..36699.188 rows=2764140 loops=1)
      ->  Index Scan using idx_dg_status on sym_data_gap g  (cost=0.00..2802.42 rows=75838 width=8) (actual time=63.348..122.565 rows=75838 loops=1)
            Index Cond: (status = 'GP'::bpchar)
      ->  Index Scan using idx_d_channel_id3 on sym_data d  (cost=0.00..13065.83 rows=346352 width=1400) (actual time=0.027..0.450 rows=36 loops=75838)
            Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
    Total runtime: 37226.543 ms
  • Kevin Grittner at Aug 6, 2011 at 4:07 pm
    [Please don't top-post; it makes the discussion hard to follow.]


    tuanhoanganh wrote:
    > Greg Williamson wrote:
    >> Did you run an ANALYZE on the tables after building the new
    >> indexes? The row estimates seem to be wildly off; although that
    >> may be a symptom of something else
    I think that's because the optimizer doesn't know how to estimate the
    range test properly, and resorts to "magic numbers" based on
    percentages of the rows in the table.
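
    (For scale, arithmetic from the posted plan rather than from the thread:
    the join estimate of 26,268,463,088 rows is essentially the product of
    the two scan estimates, 3,117,384 x 8,426, i.e. about 26.3 billion. The
    planner assumes each sym_data row matches roughly 8,426 gaps, while in
    reality each matches about one.)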
    > If I remove the ORDER BY, the query runs much faster.
    Yeah, it thinks there will be 26 billion rows, and that sorting that
    many would be very expensive. You really have only about 2.8 million
    rows, which wouldn't be so bad. I'm not sure whether this would work, but
    if you need the ordering, you might try:

    WITH x AS SELECT * FROM x ORDER BY d.data_id;

    -Kevin
  • Kevin Grittner at Aug 6, 2011 at 4:26 pm

    "Kevin Grittner" wrote:

    > WITH x AS SELECT * FROM x ORDER BY d.data_id;
    It ate part of what I had on that line. (Note to self: don't use
    angle-bracketing in posts.)

    Trying again with different punctuation:

    WITH x AS [original query] SELECT * FROM x ORDER BY d.data_id;

    -Kevin
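
    Expanded against the posted query, that suggestion would read as follows
    (an illustrative expansion, not from the thread; note that outside the
    CTE the alias d is out of scope, so the outer ORDER BY uses the bare
    column name, and on 9.0 a CTE is an optimization fence, so the join
    inside is planned without the ORDER BY):

    WITH x AS (
      select d.data_id, d.table_name, d.event_type, d.row_data,
             d.pk_data, d.old_data, d.create_time, d.trigger_hist_id,
             d.channel_id, d.transaction_id, d.source_node_id,
             d.external_data, ''
      from sym_data d
      inner join sym_data_gap g on g.status = 'GP'
        and d.data_id between g.start_id and g.end_id
      where d.channel_id = 'sale_transaction'
    )
    SELECT * FROM x ORDER BY data_id ASC;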
  • Tuanhoanganh at Aug 8, 2011 at 6:19 pm
    Thanks for your help, but I cannot change the query because it is
    generated by the SymmetricDS program. I can only create indexes on the
    tables and change the PostgreSQL configuration to tune the query. Is
    there any way to do that?

    Sorry for my English

    Tuan Hoang ANh
  • Kevin Grittner at Aug 8, 2011 at 6:35 pm

    tuanhoanganh wrote:

    > I cannot change the query because it is generated by the SymmetricDS
    > program. I can only create indexes on the tables and change the
    > PostgreSQL configuration to tune the query. Is there any way to do
    > that?

    I'm not familiar with SymmetricDS. Would it be possible for you to
    create a VIEW which uses the faster technique and have SymmetricDS
    use the view?

    -Kevin
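
    A minimal sketch of that idea (entirely illustrative: the view name is
    invented here, the channel is hard-coded, and whether SymmetricDS can be
    pointed at a view is exactly the open question):

    CREATE VIEW sym_data_sale_gaps AS
      WITH x AS (
        select d.*
        from sym_data d
        inner join sym_data_gap g on g.status = 'GP'
          and d.data_id between g.start_id and g.end_id
        where d.channel_id = 'sale_transaction'
      )
      -- the CTE fences the join plan; the sort runs on its result
      SELECT * FROM x ORDER BY data_id;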
