As was discussed in several threads, I'd handed over the
responsibility of hierarchical queries to Greg Stark several weeks
ago. He posted a preliminary patch which I don't believe anyone
looked at. For 8.3's sake, I wanted to make sure we get the status of
this out on the table so there won't be any surprises like those
related to 8.2.

Where are we at? Has anyone reviewed the preliminary work? Any
comments, suggestions, etc?

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/

Search Discussions

  • Gregory Stark at Feb 21, 2007 at 8:56 pm

    "Jonah H. Harris" <jonah.harris@gmail.com> writes:

    As was discussed in several threads, I'd handed over the
    responsibility of hierarchical queries to Greg Stark several weeks
    ago. He posted a preliminary patch which I don't believe anyone
    looked at. For 8.3's sake, I wanted to make sure we get the status of
    this out on the table so there won't be any surprises like those
    related to 8.2.

    Where are we at?
    The preliminary patch didn't actually do anything recursive. It handled
    non-recursive WITH clauses by directly inlining the subquery as if it were a
    subquery RangeTable.

    Now that's not entirely useless, it's a handy syntactic sugar for having to
    write the same query multiple times.
    Has anyone reviewed the preliminary work? Any comments, suggestions, etc?
    I had asked questions about whether people thought the places where I was
    storing the state were appropriate. I'm not entirely clear on what types of
    state should live in the pstate versus in the parse tree versus elsewhere.

    Specifically I asked about a problem where I thought using the pstate to store
    the scope of the cte names would give the right semantics where they get
    inherited by subqueries but pass out of scope for outer queries. However for
    some reason I wasn't getting the behaviour I was expecting and subqueries
    didn't seem to have them in scope.

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Gavin Sherry at Feb 21, 2007 at 9:07 pm

    On Wed, 21 Feb 2007, Jonah H. Harris wrote:

    As was discussed in several threads, I'd handed over the
    responsibility of hierarchical queries to Greg Stark several weeks
    ago. He posted a preliminary patch which I don't believe anyone
    looked at. For 8.3's sake, I wanted to make sure we get the status of
    this out on the table so there won't be any surprises like those
    related to 8.2.

    Where are we at? Has anyone reviewed the preliminary work? Any
    comments, suggestions, etc?
    Yes, I looked at it.

    The WITH support seems okay. I guess I'd thought it might be represented
    different internally (not a sub query) but the approach Greg has taken is
    probably more straight forward (in that you get a lot of proven code for
    free). It should work fine for recursive queries too, if you just re-seed
    the param keys for every scan of the 'sub-query'.

    I wonder if anyone can think of a good way to cost the recursive side of
    the query. I'm still pre-coffee and it hurts my head :).

    Gavin
  • Gregory Stark at Feb 21, 2007 at 10:52 pm

    "Gavin Sherry" <swm@alcove.com.au> writes:

    The WITH support seems okay. I guess I'd thought it might be represented
    different internally (not a sub query) but the approach Greg has taken is
    probably more straight forward (in that you get a lot of proven code for
    free). It should work fine for recursive queries too, if you just re-seed
    the param keys for every scan of the 'sub-query'.
    I don't think it works for recursive queries. Since you can't have the same
    executor plan in motion for two different sets of parameters simultaneously.
    That's why I was talking about a Memoize node.

    It is sufficient for the non-recursive case which might make it worthwhile
    putting it in 8.3. But even there user's expectations are probably that the
    reason they're writing it as a cte is precisely to avoid duplicate execution.

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Gavin Sherry at Feb 22, 2007 at 1:09 am

    On Wed, 21 Feb 2007, Gregory Stark wrote:

    "Gavin Sherry" <swm@alcove.com.au> writes:
    The WITH support seems okay. I guess I'd thought it might be represented
    different internally (not a sub query) but the approach Greg has taken is
    probably more straight forward (in that you get a lot of proven code for
    free). It should work fine for recursive queries too, if you just re-seed
    the param keys for every scan of the 'sub-query'.
    I don't think it works for recursive queries. Since you can't have the same
    executor plan in motion for two different sets of parameters simultaneously.
    That's why I was talking about a Memoize node.
    Can you elaborate on the 'two different sets of parameters' bit? I'm still
    without coffee.
    It is sufficient for the non-recursive case which might make it worthwhile
    putting it in 8.3. But even there user's expectations are probably that the
    reason they're writing it as a cte is precisely to avoid duplicate execution.
    I wonder if the planner should decide that?

    Thanks,

    Gavin
  • Gregory Stark at Feb 22, 2007 at 1:29 am

    "Gavin Sherry" <swm@alcove.com.au> writes:

    Can you elaborate on the 'two different sets of parameters' bit? I'm still
    without coffee.
    The spec allows for arbitrarily complex recursive query structures. Including
    mutually recursive queries, and even non-linearly recursive queries. I found
    grokking them requires far stronger brews than coffee :)

    But in a simple recursive tree search you have a node which wants to do a join
    between the output of tree level n against some table to produce tree level
    n+1. It can't simply execute the plan to produce tree level n since that's the
    same tree it's executing itself. If it calls the Init method on itself it'll
    lose all its state.

    There's another reason it can't just execute the previous node. You really
    don't want to recompute all the results for level n when you go to produce
    level n+1. You want to keep them around from the previous iteration. Otherwise
    you have an n^2 algorithm.

    Think of the fibonacci sequence algorithm: if you implement it recursively the
    naive way you have to return all the way back to the beginning to calculate
    each number. But if you remember the last two you can calculate it directly
    without recalculating all the previous number repeatedly.

    It is sufficient for the non-recursive case which might make it worthwhile
    putting it in 8.3. But even there user's expectations are probably that the
    reason they're writing it as a cte is precisely to avoid duplicate execution.
    I wonder if the planner should decide that?
    That's one option. For the non-recursive case we could inline the cte subquery
    everywhere it's referenced and then add smarts to the planner to find
    identical subqueries and have a heuristic to determine when it would be
    advantageous to calculate the result once.

    The alternative is to retain them as references to a single plan. Then have a
    heuristic for when to inline them.

    In neither case is a heuristic going to be particularly good. The problem is
    that for any reasonably complex plan it'll be cheaper to execute it only once
    than multiple times. Unless there's an advantage to be gained by inlining it
    such as being able to push conditions down into it. But the only way to find
    out if that will be possible would be to try planning it and see.

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Gavin Sherry at Feb 22, 2007 at 1:39 am

    On Thu, 22 Feb 2007, Gregory Stark wrote:

    "Gavin Sherry" <swm@alcove.com.au> writes:
    Can you elaborate on the 'two different sets of parameters' bit? I'm still
    without coffee.
    The spec allows for arbitrarily complex recursive query structures. Including
    mutually recursive queries, and even non-linearly recursive queries. I found
    grokking them requires far stronger brews than coffee :) Hehe.
    But in a simple recursive tree search you have a node which wants to do a join
    between the output of tree level n against some table to produce tree level
    n+1. It can't simply execute the plan to produce tree level n since that's the
    same tree it's executing itself. If it calls the Init method on itself it'll
    lose all its state.

    There's another reason it can't just execute the previous node. You really
    don't want to recompute all the results for level n when you go to produce
    level n+1. You want to keep them around from the previous iteration. Otherwise
    you have an n^2 algorithm.
    Right. When I've spent some idle cycles thinking through this in the past
    I figured that in a non-trivial query, we'd end up with a bunch of
    materialisations, one for each level of recursion. That sounds very ugly.
    It is sufficient for the non-recursive case which might make it worthwhile
    putting it in 8.3. But even there user's expectations are probably that the
    reason they're writing it as a cte is precisely to avoid duplicate execution.
    I wonder if the planner should decide that?
    That's one option. For the non-recursive case we could inline the cte subquery
    everywhere it's referenced and then add smarts to the planner to find
    identical subqueries and have a heuristic to determine when it would be
    advantageous to calculate the result once.

    The alternative is to retain them as references to a single plan. Then have a
    heuristic for when to inline them.

    In neither case is a heuristic going to be particularly good. The problem is
    that for any reasonably complex plan it'll be cheaper to execute it only once
    than multiple times. Unless there's an advantage to be gained by inlining it
    such as being able to push conditions down into it. But the only way to find
    out if that will be possible would be to try planning it and see.
    Pushing down predicates was the exact idea I had in mind.

    Thanks,

    Gavin
  • Gregory Stark at Feb 22, 2007 at 7:59 am

    "Gavin Sherry" <swm@alcove.com.au> writes:
    On Thu, 22 Feb 2007, Gregory Stark wrote:

    But in a simple recursive tree search you have a node which wants to do a join
    between the output of tree level n against some table to produce tree level
    n+1. It can't simply execute the plan to produce tree level n since that's the
    same tree it's executing itself. If it calls the Init method on itself it'll
    lose all its state.

    There's another reason it can't just execute the previous node. You really
    don't want to recompute all the results for level n when you go to produce
    level n+1. You want to keep them around from the previous iteration. Otherwise
    you have an n^2 algorithm.
    Right. When I've spent some idle cycles thinking through this in the past
    I figured that in a non-trivial query, we'd end up with a bunch of
    materialisations, one for each level of recursion. That sounds very ugly.
    Well as long as you have precisely one for each level of recursion I think
    you're doing ok. The problem is if you do it the naive way you calculate the
    first level, then for the second level you recalculate the first level again,
    then for the third level you recalculate both of the previous two, ... So you
    end up with n copies of the first level, n-1 copies of the second level, ...

    If you reuse the result sets for subsequent recursive calls then you actually
    only need to keep then nth level around until you're done generating the n+1
    level.

    The trick is being able to have two different call sites in the plan tree
    pulling records out of the Materialize node at different points in the result
    set. That currently isn't possible.

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Jim C. Nasby at Feb 23, 2007 at 4:31 pm

    On Thu, Feb 22, 2007 at 07:59:35AM +0000, Gregory Stark wrote:
    "Gavin Sherry" <swm@alcove.com.au> writes:
    On Thu, 22 Feb 2007, Gregory Stark wrote:

    But in a simple recursive tree search you have a node which wants to do a join
    between the output of tree level n against some table to produce tree level
    n+1. It can't simply execute the plan to produce tree level n since that's the
    same tree it's executing itself. If it calls the Init method on itself it'll
    lose all its state.

    There's another reason it can't just execute the previous node. You really
    don't want to recompute all the results for level n when you go to produce
    level n+1. You want to keep them around from the previous iteration. Otherwise
    you have an n^2 algorithm.
    Right. When I've spent some idle cycles thinking through this in the past
    I figured that in a non-trivial query, we'd end up with a bunch of
    materialisations, one for each level of recursion. That sounds very ugly.
    Well as long as you have precisely one for each level of recursion I think
    you're doing ok. The problem is if you do it the naive way you calculate the
    first level, then for the second level you recalculate the first level again,
    then for the third level you recalculate both of the previous two, ... So you
    end up with n copies of the first level, n-1 copies of the second level, ...

    If you reuse the result sets for subsequent recursive calls then you actually
    only need to keep then nth level around until you're done generating the n+1
    level.

    The trick is being able to have two different call sites in the plan tree
    pulling records out of the Materialize node at different points in the result
    set. That currently isn't possible.
    So it's sounding like the best we can get in 8.3 is WITH doing
    single-level subquery replacement?
    --
    Jim Nasby jim@nasby.net
    EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedFeb 21, '07 at 3:24p
activeFeb 23, '07 at 4:31p
posts9
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase