FAQ
I am using the Python SQLite3 interface, but the question is probably
general to python and SQL.

I want to run a query like

select * from table a, table b where a.foo IN foobar(b.bar)

where foobar is a user function (registered by create_function in
pysqlite3) returning a list of integers. However such functions can
only return basic data types so the above is invalid. I am wondering
what the best way around this is.

I could fetch rows from table b, compute foobar(b.bar) and create a
new query for each result, but that seems very inefficient.
I could create a new table matching each row in b to all values of
b.bar and use that to join but that would be inefficient and very
redundant.

Rewriting the query to say
select * from table a, table b where foobar_predicate(a.foo, b.bar)
would work (foobar_predicate checks if a.foo is in foobar(b.bar). But
it does not allow to use an index on a.foo

If I knew the maximum length of foobar(b.bar) I could say
select * from table a, table b where a.foo in (foobar(b.bar,0), foobar
(b.bar,1), ..., foobar(b.bar,n))
where the second parameter to foobar chooses which element to return.
This is clearly not optimal.

Am I missing some obvious elegant way to do this or is it just not
possible given that the SQL IN statement does not really deal with
lists in the python sense of the word?

Thanks

Search Discussions

  • Peter Otten at Oct 16, 2009 at 5:04 pm

    Felix wrote:

    I am using the Python SQLite3 interface, but the question is probably
    general to python and SQL.

    I want to run a query like

    select * from table a, table b where a.foo IN foobar(b.bar)

    where foobar is a user function (registered by create_function in
    pysqlite3) returning a list of integers. However such functions can
    only return basic data types so the above is invalid. I am wondering
    what the best way around this is.

    I could fetch rows from table b, compute foobar(b.bar) and create a
    new query for each result, but that seems very inefficient.
    I could create a new table matching each row in b to all values of
    b.bar and use that to join but that would be inefficient and very
    redundant.

    Rewriting the query to say
    select * from table a, table b where foobar_predicate(a.foo, b.bar)
    would work (foobar_predicate checks if a.foo is in foobar(b.bar). But
    it does not allow to use an index on a.foo

    If I knew the maximum length of foobar(b.bar) I could say
    select * from table a, table b where a.foo in (foobar(b.bar,0), foobar
    (b.bar,1), ..., foobar(b.bar,n))
    where the second parameter to foobar chooses which element to return.
    This is clearly not optimal.

    Am I missing some obvious elegant way to do this or is it just not
    possible given that the SQL IN statement does not really deal with
    lists in the python sense of the word?
    Define a function foobar_contains() as follows:

    def foobar_contains(foo, bar):
    return foo in foobar(bar)

    and change the query to

    select * from table a, table b where foobar_contains(a.foo, b.bar)

    Peter
  • Felix at Oct 16, 2009 at 5:15 pm

    Rewriting the query to say
    select * from table a, table b where foobar_predicate(a.foo, b.bar)
    would work (foobar_predicate checks if a.foo is in foobar(b.bar). But
    it does not allow to use an index on a.foo
    Define a function foobar_contains() as follows:

    def foobar_contains(foo, bar):
    ? ? return foo in foobar(bar)

    and change the query to

    select * from table a, table b where foobar_contains(a.foo, b.bar)
    I thought about that (see above), but it would not use an index on
    a.foo which a regular a.foo IN (x,y,z) does.

    Felix
  • Peter Otten at Oct 16, 2009 at 5:27 pm

    Felix wrote:

    Define a function foobar_contains() as follows:

    def foobar_contains(foo, bar):
    return foo in foobar(bar)

    and change the query to

    select * from table a, table b where foobar_contains(a.foo, b.bar)
    I thought about that (see above), but it would not use an index on
    a.foo which a regular a.foo IN (x,y,z) does.
    Sorry for not reading your post carefully.

    Peter
  • Pjcoup at Oct 17, 2009 at 2:40 am
    On Oct 16, 9:50?am, Felix wrote:
    [snip]
    I could create a new table matching each row in b to all values of
    b.bar and use that to join but that would be inefficient and very
    redundant.
    [snip]

    Is foobar(b.bar) essentially static? (I'm guessing so if you
    considered this as an option).
    If so, then this actually sounds like the best option to me.
    Indexing on the foobar return values, this new table and joining
    as described has got to be faster than the other alternatives
    (even the foobar(b.bar,k) options).
    Without this new table, it seems you have to calculate foobar(b.bar)
    for every row of b (even if it is SQLite doing it, and not you
    directly), unless I'm missing something. I'm assuming that the
    overhead to store these function values won't kill you.
    Good luck!

    Pete
  • Lawrence D'Oliveiro at Oct 27, 2009 at 10:48 am
    In message <c601fad6-8126-4f43-
    b768-62ad6e7ecca1 at r5g2000yqb.googlegroups.com>, Felix wrote:
    I want to run a query like

    select * from table a, table b where a.foo IN foobar(b.bar)

    where foobar is a user function (registered by create_function in
    pysqlite3) returning a list of integers. However such functions can
    only return basic data types so the above is invalid. I am wondering
    what the best way around this is.
    Is it feasible to iterate over the entire inner join of both tables?

    for entry in iterator("select a.foo, b.bar ... from a, b") :
    if foobarp(entry["a.foo"], entry["b.bar"]) :
    .... this is a combination I want ...

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedOct 16, '09 at 4:50p
activeOct 27, '09 at 10:48a
posts6
users4
websitepython.org

People

Translate

site design / logo © 2022 Grokbase