Attached is a patch to add getrusage output to EXPLAIN.
This is the patch I mentioned previously in
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00684.php and
it raises the same issues we were talking about then. Should the
resource usage stats displayed be per-iteration values, or totals for
the whole query execution of that node? I can't see dividing by nloops
making things clearer, but the discrepancy is becoming more noticeable.
I wonder if there's anything we can do to make this less confusing.

Incidentally, this is a first cut at the patch written in a single
sitting. I need to go through it again to make sure I didn't do
anything silly. One thing I know is broken is Windows support. I
fleshed out our getrusage stub a bit but I can't test it at all, and
the INSTR_TIME_* macros are actually defined differently on Windows, so
I can't use them quite so simply on struct timeval. The simplest option
is to always define the struct timeval versions, even if INSTR_TIME_*
doesn't use them, so that this code can use them directly.
Another would be to modify the struct rusage definition on Windows so
we use the native Windows time datatype -- which would be tempting
since it would avoid the loss of precision in "only" having
microsecond precision.
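
Roughly what that simplest option might look like, as an untested
sketch (the macro names here are invented for illustration, not taken
from the patch):

/*
 * Always-available struct timeval helpers, defined regardless of
 * whether INSTR_TIME_* is implemented on top of struct timeval
 * (it isn't on Windows, where QueryPerformanceCounter is used).
 */
#include <sys/time.h>

#define TIMEVAL_SET_ZERO(tv) \
    ((tv).tv_sec = 0, (tv).tv_usec = 0)

#define TIMEVAL_ADD(x, y) \
    do { \
        (x).tv_sec += (y).tv_sec; \
        (x).tv_usec += (y).tv_usec; \
        if ((x).tv_usec >= 1000000) \
        { \
            (x).tv_usec -= 1000000; \
            (x).tv_sec++; \
        } \
    } while (0)

#define TIMEVAL_SUBTRACT(x, y) \
    do { \
        (x).tv_sec -= (y).tv_sec; \
        (x).tv_usec -= (y).tv_usec; \
        if ((x).tv_usec < 0) \
        { \
            (x).tv_usec += 1000000; \
            (x).tv_sec--; \
        } \
    } while (0)

/* milliseconds, for EXPLAIN's "sys=...ms user=...ms" output */
#define TIMEVAL_GET_MILLISEC(tv) \
    (((double) (tv).tv_sec * 1000.0) + ((double) (tv).tv_usec) / 1000.0)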


postgres=# explain (analyze, resource) select * from x;
                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on x  (cost=0.00..11540.00 rows=800000 width=4) (actual time=69.851..1287.025 rows=800000 loops=1)
   Resources: sys=240.000ms user=940.000ms read=27.7MB
 Total runtime: 2431.237 ms
(3 rows)

[...flush buffers and drop caches between tests...]

postgres=# explain (analyze, verbose, buffers, resource) select * from x;
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.x  (cost=0.00..11540.00 rows=800000 width=4) (actual time=28.739..1290.786 rows=800000 loops=1)
   Output: generate_series
   Resources: sys=280.000ms user=890.000ms read=27.7MB minflt=65 nvcsw=9 nivcsw=633
   Buffers: shared read=3540
 Total runtime: 2487.629 ms
(5 rows)


--
greg

  • Itagaki Takahiro at Oct 1, 2010 at 1:51 am

    On Fri, Oct 1, 2010 at 9:16 AM, Greg Stark wrote:
    Attached is a patch to add getrusage output to EXPLAIN.
    This is the patch I mentioned previously in
    http://archives.postgresql.org/pgsql-hackers/2010-02/msg00684.php and
    it raises the same issues we were talking about then.
    How much overhead do you see with the "resource" option?
    getrusage() calls for each tuple might have considerable overhead.
    How much difference is there between the (analyze) and (analyze, resource) options?

    Auto_explain and pg_stat_statements should also be adjusted to the
    change once the patch is accepted. I've been asked for "queries
    ordered by CPU time" in pg_stat_statements several times; the
    getrusage infrastructure would make that a real possibility.

    --
    Itagaki Takahiro
  • Greg Stark at Oct 1, 2010 at 6:14 pm

    On Thu, Sep 30, 2010 at 6:51 PM, Itagaki Takahiro wrote:
    How much overhead do you see with the "resource" option?
    getrusage() calls for each tuple might have considerable overhead.
    How much difference is there between the (analyze) and (analyze, resource) options?
    Here's strace -c for a select count(*) from x where x is a table with
    800,000 narrow rows entirely in shared buffers.

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     99.56    0.025299           0   6400024           getrusage
      0.44    0.000112           0      9872           read
      0.00    0.000000           0         1           open
      0.00    0.000000           0        13           lseek
      0.00    0.000000           0        10           sendto
      0.00    0.000000           0         5           recvfrom
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.025411               6409925           total

    That's under 40ns per getrusage call. My test C program seems to take
    about 300ns per getrusage call, however, so I'm not sure I trust the
    strace results. Either way that's pretty negligible: it's basically
    just copying a struct from kernel space to user space.
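
    Such a test program is essentially just a tight loop around
    getrusage(); a minimal sketch (a reconstruction, not the exact
    program mentioned above):

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    #define NCALLS 1000000

    int
    main(void)
    {
        struct rusage  ru;
        struct timeval start, stop;
        double         elapsed_ns;

        gettimeofday(&start, NULL);
        for (long i = 0; i < NCALLS; i++)
            getrusage(RUSAGE_SELF, &ru);
        gettimeofday(&stop, NULL);

        /* tv_sec is seconds, tv_usec microseconds; convert both to ns */
        elapsed_ns = (stop.tv_sec - start.tv_sec) * 1e9 +
                     (stop.tv_usec - start.tv_usec) * 1e3;
        printf("%.1f ns per getrusage() call\n", elapsed_ns / NCALLS);
        return 0;
    }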

    Of course it depends on what query you run. In my very simple example
    of doing select count(*) from x, where x has 800k rows, the whole
    query takes about 100ms. Since there are two nodes (the seq scan and
    the aggregate) it has to do 3.2 million getrusage calls. That's about
    960ms total. So it imposes nearly a 10x slowdown, but only because
    the base measurement was so low. For what it's worth, that's less
    than the overhead from gettimeofday, which seems to impose about
    1100ms of overhead.

    The observed timings are:
    query only                  100.5ms
    explain (analyze)          1234.9ms
    explain (analyze,resource) 2210.2ms

    --
    greg
  • Itagaki Takahiro at Oct 4, 2010 at 10:29 am

    On Sat, Oct 2, 2010 at 3:14 AM, Greg Stark wrote:

    The observed timings are
    query only                 100.5ms
    explain (analyze)         1234.9ms
    explain (analyze,resource) 2210.2ms
    Those are very large overheads, but I think the feature is still
    useful as long as there is no overhead when the resource option is
    disabled.

    Some random ideas and comments:
    * Top-level-only rusage might be useful as a replacement for the
    ancient log_xxx_stats options, and should have smaller overhead than
    full per-node counters.
    * There is some overlap between this feature and the DTrace hooks.
    If we need such an extension even though we have DTrace hooks,
    it might mean the DTrace hooks are hard to use for average users,
    and maybe also for Postgres hackers...
    * Some kind of statistical sampling could address the excessive
    overhead of the stop-watch based approach. Sampling is not always
    accurate, but it might strike a better balance between overhead and
    resolution; a rough sketch of the idea follows below.
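
    A hypothetical illustration of the sampling idea (invented code, not
    from any patch; SAMPLE_INTERVAL and the helper names are made up):

    #include <sys/time.h>
    #include <sys/resource.h>

    #define SAMPLE_INTERVAL 128    /* measure 1 tuple out of every 128 */

    typedef struct SampledInstr
    {
        long    tuples;        /* total tuples seen */
        long    sampled;       /* tuples actually measured */
        double  sampled_usec;  /* user CPU accumulated over samples */
    } SampledInstr;

    static void
    sampled_tuple(SampledInstr *si, void (*process_tuple)(void))
    {
        if (si->tuples++ % SAMPLE_INTERVAL == 0)
        {
            struct rusage before, after;

            getrusage(RUSAGE_SELF, &before);
            process_tuple();
            getrusage(RUSAGE_SELF, &after);

            si->sampled_usec +=
                (after.ru_utime.tv_sec - before.ru_utime.tv_sec) * 1e6 +
                (after.ru_utime.tv_usec - before.ru_utime.tv_usec);
            si->sampled++;
        }
        else
            process_tuple();
    }

    /* Scale the measured samples up to an estimate for all tuples. */
    static double
    estimated_total_usec(const SampledInstr *si)
    {
        return si->sampled > 0 ?
            si->sampled_usec * ((double) si->tuples / si->sampled) : 0.0;
    }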

    --
    Itagaki Takahiro
  • Greg Stark at Oct 4, 2010 at 4:20 pm

    On Mon, Oct 4, 2010 at 3:29 AM, Itagaki Takahiro wrote:
    * There is some overlap between this feature and the DTrace hooks.
    If we need such an extension even though we have DTrace hooks,
    it might mean the DTrace hooks are hard to use for average users,
    and maybe also for Postgres hackers...
    I've had it on my radar for a long time to make use of DTrace
    directly in EXPLAIN output. The DTrace folks are really not
    interested in programmatic access to the counter interfaces because
    they're afraid it would nail down internal interfaces too early. But
    I do think there's lots of potential there.

    * Some kind of statistical sampling could address the excessive
    overhead of the stop-watch based approach. Sampling is not always
    accurate, but it might strike a better balance between overhead and
    resolution.
    Tom experimented with doing statistical sampling and found the
    results were even worse than the overhead from stop-watch profiling.
    I think we should be able to subtract the overhead from the
    measurements if it's a problem. But actually, in the case of
    getrusage stats, I don't think it's an issue. The main interesting
    bits in getrusage are inblock and oublock, which don't have any
    profiling overhead: getrusage doesn't do any I/O itself. If you're
    only concerned with the CPU run time of an in-memory query, you
    don't really need to look at getrusage at all.
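
    For illustration, a figure like the read=27.7MB shown in the EXPLAIN
    output above could be derived from those counters roughly as follows
    (this assumes the historical 512-byte unit that Linux and BSD use
    for ru_inblock/ru_oublock; POSIX doesn't pin down the unit):

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    static void
    print_block_io(const struct rusage *start, const struct rusage *end)
    {
        long    inblocks = end->ru_inblock - start->ru_inblock;
        long    oublocks = end->ru_oublock - start->ru_oublock;

        /* ru_inblock/ru_oublock counted in 512-byte units (assumed) */
        printf("read=%.1fMB written=%.1fMB\n",
               inblocks * 512.0 / (1024.0 * 1024.0),
               oublocks * 512.0 / (1024.0 * 1024.0));
    }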

    --
    greg
