Would some people please run the attached test procedure and report back
the results? I basically need to know the patch is an improvement on
more platforms than just my own. Thanks


Run this script and record the time reported:


Modify postgresql.conf:

stats_command_string = on

and reload the server. Do "SELECT * FROM pg_stat_activity;" to verify
the command string is enabled. You should see your query in the
"current query" column.

Rerun the stat.script again and record the time.

Apply this patch to CVS HEAD:


Run the stat.script again and record the time.

Report via email your three times and your platform.

If the patch worked, the first and third times will be similar, and
the second time will be high.


Bruce Momjian wrote:
Tom Lane wrote:
Michael Fuhr <mike@fuhr.org> writes:
Further tests show that for this application
the killer is stats_command_string, not stats_block_level or
I tried it with pgbench -c 10, and got these results:
41% reduction in TPS rate for stats_command_string
9% reduction in TPS rate for stats_block/row_level (any combination)

strace'ing a backend confirms my belief that stats_block/row_level send
just one stats message per transaction (at least for the relatively
small number of tables touched per transaction by pgbench). However
stats_command_string sends 14(!) --- there are seven commands per
pgbench transaction and each results in sending a <command> message and
later an <IDLE> message.

Given the rather lackadaisical way in which the stats collector makes
the data available, it seems like the backends are being much too
enthusiastic about posting their stats_command_string status
immediately. Might be worth thinking about how to cut back the
overhead by suppressing some of these messages.
I did some research on this because the numbers Tom quotes indicate there
is something wrong in the way we process stats_command_string

I made a small test script:

if [ ! -f /tmp/pgstat.sql ]
then i=0
while [ $i -lt 10000 ]
i=`expr $i + 1`
echo "SELECT 1;"
done > /tmp/pgstat.sql

time psql test </tmp/pgstat.sql >/dev/null

This sends 10,000 "SELECT 1" queries to the backend, and reports the
execution time. I found that without stats_command_string defined, it
ran in 3.5 seconds. With stats_command_string defined, it took 5.5
seconds, meaning the command string is causing a 57% slowdown. That is
way too much considering that the SELECT 1 has to be send from psql to
the backend, parsed, optimized, and executed, and the result returned to
the psql, while stats_command_string only has to send a string to a
backend collector. There is _no_ way that collector should take 57% of
the time it takes to run the actual query.

With the test program, I tried various options. The basic code we have
sends a UDP packet to a statistics buffer process, which recv()'s the
packet, puts it into a memory queue buffer, and writes it to a pipe()
that is read by the statistics collector process which processes the

I tried various ways of speeding up the buffer and collector processes.
I found if I put a pg_usleep(100) in the buffer process the backend
speed was good, but packets were lost. What I found worked well was to
do multiple recv() calls in a loop. The previous code did a select(),
then perhaps a recv() and pipe write() based on the results of the
select(). This caused many small packets to be written to the pipe and
the pipe write overhead seems fairly large. The best fix I found was to
loop over the recv() call at most 25 times, collecting a group of
packets that can then be sent to the collector in one pipe write. The
recv() socket is non-blocking, so a zero return indicates there are no
more packets available. Patch attached.

This change reduced the stats_command_string time from 5.5 to 3.9, which
is closer to the 3.5 seconds with stats_command_string off.

A second improvement I discovered is that the statistics collector is
calling gettimeofday() for every packet received, so it can determine
the timeout for the select() call to write the flat file. I removed
that behavior and instead used setitimer() to issue a SIGINT every
500ms, which was the original behavior. This eliminates the
gettimeofday() call and makes the code cleaner. Second patch attached.
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Search Discussions

Discussion Posts

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 1 of 21 | next ›
Discussion Overview
grouppgsql-hackers @
postedJun 15, '06 at 4:05a
activeJun 17, '06 at 5:43p



site design / logo © 2018 Grokbase