Hi there,

I've got a database query that sounded easy at first, but I'm having a
hard time wrapping my head around how to get it to work.

Here's the situation. I have a table that lists about 20-30 server
paths. A program goes through and uses another tool to generate
information (as contents change) for all filespecs that start with
these paths. For example, one entry might be:
//depot/program/src/trunk/ and the maintenance program runs hourly and
generates data about everything under that (like
//depot/program/src/trunk/file.c,
//depot/program/src/trunk/tool/otherfile.cpp). As a result, I have
another table that's been populated with about 300,000 entries.

With this in mind, I want to write a query that will list the entries
in the first table (easy) along with a count() of how many entries in
the other table start with that path (the hard part).

The table with the 20-30 entrie list of paths:
CREATE TABLE trackedpaths
(
path_id int8 NOT NULL,
pathspec varchar(512),
path_name varchar(512),
lastupdated timestamp,
lastchangelist int8
)

The 300K+ table. Each 'filespec' below begins with a 'pathspec' from
the table above:
CREATE TABLE changehistory
(
linesadded int8,
linesdeleted int8,
lineschanged int8,
datestamp timestamp,
change int8 NOT NULL,
filespec varchar(512) NOT NULL,
changeauthor varchar(128),
"comment" varchar(32)
)

I tried handling this programmaticaly by having a loop that queries
each path, then does another query below of "SELECT COUNT(*) FROM
changehistory WHERE UPPER(filespec) LIKE UPPER('$pathspec%')". Each
count query works fine, but the performance is crippling.

Any ideas on how to make Postgres do the heavy lifting?

Thanks!

Ben Hallert

Search Discussions

  • Michael Fuhr at Sep 11, 2005 at 4:45 am

    On Thu, Sep 08, 2005 at 10:02:44AM -0700, Ben Hallert wrote:
    With this in mind, I want to write a query that will list the entries
    in the first table (easy) along with a count() of how many entries in
    the other table start with that path (the hard part). [...]
    I tried handling this programmaticaly by having a loop that queries
    each path, then does another query below of "SELECT COUNT(*) FROM
    changehistory WHERE UPPER(filespec) LIKE UPPER('$pathspec%')". Each
    count query works fine, but the performance is crippling.
    Do you have an expression index on upper(filespec)? That should
    speed up queries such as the above. Another possibility might
    involve using contrib/ltree. And instead of looping through each
    path, you could use an inner or outer join.

    CREATE TABLE trackedpaths (pathname ltree);
    CREATE TABLE changehistory (filespec ltree);

    INSERT INTO trackedpaths (pathname) VALUES ('abc.def');
    INSERT INTO trackedpaths (pathname) VALUES ('ghi.jkl');
    INSERT INTO trackedpaths (pathname) VALUES ('mno.pqr');

    INSERT INTO changehistory (filespec) VALUES ('abc.def.123');
    INSERT INTO changehistory (filespec) VALUES ('abc.def.123.456');
    INSERT INTO changehistory (filespec) VALUES ('ghi.jkl.789');

    SELECT t.pathname, count(c.*)
    FROM trackedpaths AS t
    LEFT OUTER JOIN changehistory AS c ON c.filespec <@ t.pathname
    GROUP BY t.pathname
    ORDER BY t.pathname;

    pathname | count
    ----------+-------
    abc.def | 2
    ghi.jkl | 1
    mno.pqr | 0
    (3 rows)

    --
    Michael Fuhr

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedSep 8, '05 at 5:02p
activeSep 11, '05 at 4:45a
posts2
users2
websitepostgresql.org
irc#postgresql

2 users in discussion

Ben Hallert: 1 post Michael Fuhr: 1 post

People

Translate

site design / logo © 2022 Grokbase