Hello I'm tuning a postgresql (7.4.2) server for best performance .
I have a question about the planner .
I have two identical tables : one stores short data (about 2.000.000
record now) and
the other historycal data ( about 8.000.000 record now and growing ...)


A simple test query : select tag_id,valore_tag,data_tag from
storico_misure where (data_tag>'2004-05-03' and data_tag <'2004-05-12')
and tag_id=37423 ;

Takes 57,637 ms on the short table and 1321,448 ms (!!) on the
historycal table .Tables are vacuumed and reindexed .



Tables and query plans :

\d storico_misure
Table "tenore.storico_misure"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------
data_tag | timestamp without time zone | not null
tag_id | integer | not null
unita_misura | character varying(6) | not null
valore_tag | numeric(20,3) | not null
qualita | integer | not null
numero_campioni | numeric(5,0) |
frequenza_campionamento | numeric(3,0) |
Indexes:
"pk_storico_misure_2" primary key, btree (data_tag, tag_id)
"pk_anagtstorico_misuree_idx_2" btree (tag_id)
"storico_misure_data_tag_idx_2" btree (data_tag)

storico=# \d storico_misure_short
Table "tenore.storico_misure_short"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------
data_tag | timestamp without time zone | not null
tag_id | integer | not null
unita_misura | character varying(6) | not null
valore_tag | numeric(20,3) | not null
qualita | integer | not null
numero_campioni | numeric(5,0) |
frequenza_campionamento | numeric(3,0) |
Indexes:
"storico_misure_short_pkey_2" primary key, btree (data_tag, tag_id)
"pk_anagtstorico_misuree_short_idx_2" btree (tag_id)
"storico_misure_short_data_tag_idx_2" btree (data_tag)

storico=#
storico=#
storico=# explain select tag_id,valore_tag,data_tag from storico_misure
where (data_tag>'2004-05-03' and data_tag <'2004-05-12') and
tag_id=37423 ;

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------------
Index Scan using pk_storico_misure_2 on storico_misure
(cost=0.00..1984.64 rows=658 width=21)
Index Cond: ((data_tag > '2004-05-03 00:00:00'::timestamp without
time zone) AND (data_tag < '2004-05-12 00:00:00'::timestamp without time
zone) AND (tag_id = 37423))
(2 rows)

Time: 1,667 ms
storico=# explain select tag_id,valore_tag,data_tag from
storico_misure_short where (data_tag>'2004-05-03' and data_tag
<'2004-05-12') and tag_id=37423 ;
QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-
Index Scan using pk_anagtstorico_misuree_short_idx_2 on
storico_misure_short (cost=0.00..1784.04 rows=629 width=20)
Index Cond: (tag_id = 37423)
Filter: ((data_tag > '2004-05-03 00:00:00'::timestamp without time
zone) AND (data_tag < '2004-05-12 00:00:00'::timestamp without time
zone))


How can i force the planner to use the same query plan ? I'd like to
test if using the same query plan i've better performace .

Thanks in advance




this is my posgresql.conf

#-----------------------------------------------------------------------
----
# CONNECTIONS AND AUTHENTICATION
#-----------------------------------------------------------------------
----

# - Connection Settings -

tcpip_socket = true
max_connections = 100
# note: increasing max_connections costs about 500 bytes of
shared
# memory per connection slot, in addition to costs from
shared_buffers
# and max_locks_per_transaction.
#superuser_reserved_connections = 2
port = 5432
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#virtual_host = '' # what interface to listen on; defaults
to any
#rendezvous_name = '' # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60 # 1-600, in seconds
#ssl = false
#password_encryption = true
#krb_server_keyfile = ''
#db_user_namespace = false


#-----------------------------------------------------------------------
----
# RESOURCE USAGE (except WAL)
#-----------------------------------------------------------------------
----

# - Memory -

shared_buffers = 3000 # min 16, at least max_connections*2,
8KB each
sort_mem = 4096 # min 64, size in KB
vacuum_mem = 32768 # min 1024, size in KB

# - Free Space Map -

#max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000 # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000 # min 25
#preload_libraries = ''


#-----------------------------------------------------------------------
----
# WRITE AHEAD LOG
#-----------------------------------------------------------------------
----

# - Settings -

fsync = false # turns forced synchronization on or off
#wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or
open_datasync
#wal_buffers = 8 # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 12 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds
#checkpoint_warning = 30 # 0 is off, in seconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000


#-----------------------------------------------------------------------
----
# QUERY TUNING
#-----------------------------------------------------------------------
----

# - Planner Method Enabling -

enable_hashagg = false
enable_hashjoin = false
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = false
enable_seqscan = true
enable_sort = false
enable_tidscan = false

# - Planner Cost Constants -

#effective_cache_size = 1000 # typically 8KB each
#random_page_cost = 4 # units are one sequential page fetch
cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 11
#geqo_effort = 1
#geqo_generations = 0
#geqo_pool_size = 0 # default based on tables in statement,
# range 128-1024
#geqo_selection_bias = 2.0 # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit
JOINs


#-----------------------------------------------------------------------
----
# ERROR REPORTING AND LOGGING
#-----------------------------------------------------------------------
----

# - Syslog -

#syslog = 0 # range 0-2; 0=stdout; 1=both; 2=syslog
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# - When to Log -

#client_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2,
debug1,
# log, info, notice, warning, error

#log_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2,
debug1,
# info, notice, warning, error, log,
fatal,
# panic

#log_error_verbosity = default # terse, default, or verbose messages

#log_min_error_statement = panic # Values in order of increasing
severity:
# debug5, debug4, debug3, debug2,
debug1,
# info, notice, warning, error,
panic(off)

#log_min_duration_statement = -1 # Log all statements whose
# execution time exceeds the value, in
# milliseconds. Zero prints all
queries.
# Minus-one disables.

#silent_mode = false # DO NOT USE without Syslog!

# - What to Log -

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#log_connections = false
#log_duration = false
#log_pid = false
#log_statement = false
#log_timestamp = false
#log_hostname = false
#log_source_port = false


#-----------------------------------------------------------------------
----
# RUNTIME STATISTICS
#-----------------------------------------------------------------------
----

# - Statistics Monitoring -

#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
#log_statement_stats = false

# - Query/Index Statistics Collector -

stats_start_collector = true
stats_command_string = true
#stats_block_level = false
stats_row_level = true
#stats_reset_on_server_start = true


#-----------------------------------------------------------------------
----
# CLIENT CONNECTION DEFAULTS
#-----------------------------------------------------------------------
----

# - Statement Behavior -

#search_path = '$user,public' # schema names
#check_function_bodies = true
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = false
statement_timeout = 360000 # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown # actually, defaults to TZ environment
setting
#australian_timezones = false
#extra_float_digits = 0 # min -15, max 2
#client_encoding = sql_ascii # actually, defaults to database
encoding

# These settings are initialized by initdb -- they may be changed
lc_messages = 'it_IT.UTF-8' # locale for system error
message strings
lc_monetary = 'it_IT.UTF-8' # locale for monetary formatting
lc_numeric = 'it_IT.UTF-8' # locale for number formatting
lc_time = 'it_IT.UTF-8' # locale for time formatting

# - Other Defaults -

#explain_pretty_print = true
#dynamic_library_path = '$libdir'
#max_expr_depth = 10000 # min 10


#-----------------------------------------------------------------------
----
# LOCK MANAGEMENT
#-----------------------------------------------------------------------
----

#deadlock_timeout = 1000 # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes
each


#-----------------------------------------------------------------------
----
# VERSION/PLATFORM COMPATIBILITY
#-----------------------------------------------------------------------
----

# - Previous Postgres Versions -

#add_missing_from = true
#regex_flavor = advanced # advanced, extended, or basic
#sql_inheritance = true

# - Other Platforms & Clients -

#transform_null_equals = false

Search Discussions

  • Shridhar Daithankar at May 13, 2004 at 1:05 pm

    Fabio Panizzutti wrote:
    storico=# explain select tag_id,valore_tag,data_tag from storico_misure
    where (data_tag>'2004-05-03' and data_tag <'2004-05-12') and
    tag_id=37423 ;
    Can you please post explain analyze? That includes actual timings.

    Looking at the schema, can you try "and tag_id=37423::integer" instead?
    enable_hashagg = false
    enable_hashjoin = false
    enable_indexscan = true
    enable_mergejoin = true
    enable_nestloop = false
    enable_seqscan = true
    enable_sort = false
    enable_tidscan = false
    Why do you have these off? AFAIK, 7.4 improved hash aggregates a lot. So you
    might miss on these in this case.
    # - Planner Cost Constants -

    #effective_cache_size = 1000 # typically 8KB each
    You might set it to something realistic.

    And what is your hardware setup? Disks/CPU/RAM?

    Just to be sure, you went thr.
    http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html and
    http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html?

    HTH

    Regards
    Shridhar
  • Fabio Panizzutti at May 13, 2004 at 1:56 pm

    -----Messaggio originale-----
    Da: [email protected]
    Per conto di
    Shridhar Daithankar
    Inviato: giovedì 13 maggio 2004 15.05
    A: Fabio Panizzutti
    Cc: [email protected]
    Oggetto: Re: [PERFORM] Query plan on identical tables differs . Why ?


    Fabio Panizzutti wrote:
    storico=# explain select tag_id,valore_tag,data_tag from
    storico_misure where (data_tag>'2004-05-03' and data_tag
    <'2004-05-12') and tag_id=37423 ;
    Can you please post explain analyze? That includes actual timings.
    storico=# explain analyze select tag_id,valore_tag,data_tag from
    storico_misure where (data_tag>'2004-05-03' and data_tag <'2004-05-12')
    and tag_id=37423 ;

    QUERY PLAN
    ------------------------------------------------------------------------
    ------------------------------------------------------------------------
    --------------------------
    Index Scan using pk_storico_misure_2 on storico_misure
    (cost=0.00..1984.64 rows=658 width=21) (actual time=723.441..1858.107
    rows=835 loops=1)
    Index Cond: ((data_tag > '2004-05-03 00:00:00'::timestamp without
    time zone) AND (data_tag < '2004-05-12 00:00:00'::timestamp without time
    zone) AND (tag_id = 37423))
    Total runtime: 1860.641 ms
    (3 rows)

    storico=# explain analyze select tag_id,valore_tag,data_tag from
    storico_misure_short where (data_tag>'2004-05-03' and data_tag
    <'2004-05-12') and tag_id=37423 ;

    QUERY PLAN
    ------------------------------------------------------------------------
    ------------------------------------------------------------------------
    -------------------
    Index Scan using pk_anagtstorico_misuree_short_idx_2 on
    storico_misure_short (cost=0.00..1783.04 rows=629 width=20) (actual
    time=0.323..42.186 rows=864 loops=1)
    Index Cond: (tag_id = 37423)
    Filter: ((data_tag > '2004-05-03 00:00:00'::timestamp without time
    zone) AND (data_tag < '2004-05-12 00:00:00'::timestamp without time
    zone))
    Total runtime: 43.166 ms



    Looking at the schema, can you try "and
    tag_id=37423::integer" instead?
    I try :
    explain analyze select tag_id,valore_tag,data_tag from storico_misure
    where (data_tag>'2004-05-03' and data_tag <'2004-05-12') and
    tag_id=37423::integer;
    Index Scan using pk_storico_misure_2 on storico_misure
    (cost=0.00..1984.64 rows=658 width=21) (actual time=393.337..1303.998
    rows=835 loops=1)
    Index Cond: ((data_tag > '2004-05-03 00:00:00'::timestamp without
    time zone) AND (data_tag < '2004-05-12 00:00:00'::timestamp without time
    zone) AND (tag_id = 37423))
    Total runtime: 1306.484 ms
    enable_hashagg = false
    enable_hashjoin = false
    enable_indexscan = true
    enable_mergejoin = true
    enable_nestloop = false
    enable_seqscan = true
    enable_sort = false
    enable_tidscan = false
    Why do you have these off? AFAIK, 7.4 improved hash
    aggregates a lot. So you
    might miss on these in this case.
    I try for debug purpose , now i reset all 'enable' to default :

    select * from pg_settings where name like 'enable%';
    name | setting | context | vartype | source |
    min_val | max_val
    ------------------+---------+---------+---------+--------------------+--
    -------+---------
    enable_hashagg | on | user | bool | configuration file |

    enable_hashjoin | on | user | bool | configuration file |

    enable_indexscan | on | user | bool | configuration file |

    enable_mergejoin | on | user | bool | configuration file |

    enable_nestloop | on | user | bool | configuration file |

    enable_seqscan | on | user | bool | configuration file |

    enable_sort | on | user | bool | configuration file |

    enable_tidscan | on | user | bool | configuration file |
    (8 rows)

    The query plan are the same ....
    # - Planner Cost Constants -

    #effective_cache_size = 1000 # typically 8KB each
    You might set it to something realistic.
    I try 10000 and 100000 but nothing change .


    And what is your hardware setup? Disks/CPU/RAM?
    32GB SCSI/DUAL Intel(R) Pentium(R) III CPU family 1133MHz/ 1GB RAM
    and linux red-hat 9


    I don't understand why the planner chose a different query plan on
    identical tables with same indexes .

    Thanks a lot for help!.

    Fabio
  • Tom Lane at May 13, 2004 at 3:01 pm

    "Fabio Panizzutti" <[email protected]> writes:
    I don't understand why the planner chose a different query plan on
    identical tables with same indexes .
    Different data statistics; not to mention different table sizes
    (the cost equations are not linear).

    Have you ANALYZEd (or VACUUM ANALYZEd) both tables recently?

    If the stats are up to date but still not doing the right thing,
    you might try increasing the statistics target for the larger
    table's tag_id column. See ALTER TABLE SET STATISTICS.

    regards, tom lane
  • Stephan Szabo at May 13, 2004 at 3:16 pm

    On Thu, 13 May 2004, Fabio Panizzutti wrote:


    I don't understand why the planner chose a different query plan on
    identical tables with same indexes .
    Because it's more than table structure that affects the choice made by the
    planner. In addition the statistics about the values that are there as
    well as the estimated size of the table have effects. One way to see is
    to see what it thinks is best is to remove the indexes it is using and see
    what plan it gives then, how long it takes and the estimated costs for
    those plans.

    In other suggestions, I think having a (tag_id, data_tag) index rather
    than (data_tag, tag_id) may be a win for queries like this. Also, unless
    you're doing many select queries by only the first field of the composite
    index and you're not doing very many insert/update/deletes, you may want
    to drop the other index on just that field.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedMay 13, '04 at 12:33p
activeMay 13, '04 at 3:16p
posts5
users4
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2023 Grokbase