Grokbase
x

the IN clause saga

View TopicPrint | Flat  Thread  Threaded | Page 1 of 2: 1 2 > >>
1) Oliver Jowett Some of the threads on this are getting a bit bogged down, I thought I'd summarize the viable...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Some of the threads on this are getting a bit bogged down, I thought I'd
summarize the viable options I've seen so far (well, from my point of view
anyway)

setObject() currently allows the user to bypass parameter escaping via
setObject(n, string, Types.NUMERIC) (and variants on that). This needs to be
plugged as it's a potential security hole.

However the same functionality lets you do the (nonstandard) trick of
providing an IN clause to a PreparedStatement like "SELECT * FROM table
WHERE pk IN ?". It'd be good to still allow this functionality somehow after
setObject is fixed. This is going to be a postgresql-specific extension
however we do it.

Here are the permutations I can remember:

Option 1: add a method to PGStatement that explicitly sets an IN clause,
  taking either a java.sql.Array, java.util.Collection + component type,
  array + component type, or a custom postgresql object

  + there's no confusion as to what it means
  + using a custom object allows access via setObject(..., Types.OTHER) 
    consistently, as well as via the extension method.
  - java.sql.Array and java.util.Collection have problems as PGStatement is
    compiled for all JDKs and JDBC versions and those types may not be present
    (we could do a PGJDBC2Statement or something, but that's getting messy)
  - have to downcast to a PGStatement to use it

Option 2: make setArray() expand to an IN clause when the parameter follows " IN".

  + no new methods or types needed
  - setArray() behaves differently depending on query context
  - user has to wrap the underlying array in a java.sql.Array

Option 3: make setObject(n, Collection [, type]) expand to an IN clause.

  + no new methods or types needed
- must assume that the contents of the collection use the default type mapping
    if a type is not provided
  - if a type is provided and we apply it to the *components* of the
    collection, this breaks the general getObject() interface of "bind this
    object interpreting it as this particular type".
  - not obvious what to do with objects that are both Collections and some
    other SQL-relevant type; solutions make setObject's behaviour complex
    and/or query-context-dependent

Option 4: as 3, but use java arrays (native arrays, not java.sql.Array) instead of
   java.util.Collection

  + as 3, but the ambiguity of "object is both Collection and SQL type X"
    goes away.

Option 5: don't provide an extension at all i.e. do away with setting IN clauses
  in this way.

  + no issues with server-side prepare
  - obviously, you can't set IN clauses in one go any more.

1-4 all need to disable server-side prepare when used.

Did I miss anything? My personal order of preference is 1-2-4-5-3. I have a
partial implementation of 2 written but it's easy to adapt that to whatever
external interface.

setArray() needs fixing regardless of what happens here. I hope to have a
patch for that ready later today.

-O
2) Felipe Schnack I also prefer number one. Maybe we should do a poll? :-) > (send "unregister YourEmailAddressHere"...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
  I also prefer number one.
Maybe we should do a poll? :-)


On Tue, 22 Jul 2003 16:11:19 +1200
Oliver Jowett <oliver@opencloud.com> wrote:

> Some of the threads on this are getting a bit bogged down, I thought I'd
> summarize the viable options I've seen so far (well, from my point of view
> anyway)
>
> setObject() currently allows the user to bypass parameter escaping via
> setObject(n, string, Types.NUMERIC) (and variants on that). This needs to be
> plugged as it's a potential security hole.
>
> However the same functionality lets you do the (nonstandard) trick of
> providing an IN clause to a PreparedStatement like "SELECT * FROM table
> WHERE pk IN ?". It'd be good to still allow this functionality somehow after
> setObject is fixed. This is going to be a postgresql-specific extension
> however we do it.
>
> Here are the permutations I can remember:
>
> Option 1: add a method to PGStatement that explicitly sets an IN clause,
> taking either a java.sql.Array, java.util.Collection + component type,
> array + component type, or a custom postgresql object
>
>   + there's no confusion as to what it means
> + using a custom object allows access via setObject(..., Types.OTHER)
> consistently, as well as via the extension method.
> - java.sql.Array and java.util.Collection have problems as PGStatement is
> compiled for all JDKs and JDBC versions and those types may not be present
> (we could do a PGJDBC2Statement or something, but that's getting messy)
>   - have to downcast to a PGStatement to use it
>
> Option 2: make setArray() expand to an IN clause when the parameter follows " IN".
>
>   + no new methods or types needed
> - setArray() behaves differently depending on query context
> - user has to wrap the underlying array in a java.sql.Array
>
> Option 3: make setObject(n, Collection [, type]) expand to an IN clause.
>
>   + no new methods or types needed
> - must assume that the contents of the collection use the default type mapping
>     if a type is not provided
> - if a type is provided and we apply it to the *components* of the
> collection, this breaks the general getObject() interface of "bind this
> object interpreting it as this particular type".
> - not obvious what to do with objects that are both Collections and some
> other SQL-relevant type; solutions make setObject's behaviour complex
>     and/or query-context-dependent
>
> Option 4: as 3, but use java arrays (native arrays, not java.sql.Array) instead of
>    java.util.Collection
>
> + as 3, but the ambiguity of "object is both Collection and SQL type X"
>     goes away.
>
> Option 5: don't provide an extension at all i.e. do away with setting IN clauses
>   in this way.
>
>   + no issues with server-side prepare
> - obviously, you can't set IN clauses in one go any more.
>
> 1-4 all need to disable server-side prepare when used.
>
> Did I miss anything? My personal order of preference is 1-2-4-5-3. I have a
> partial implementation of 2 written but it's easy to adapt that to whatever
> external interface.
>
> setArray() needs fixing regardless of what happens here. I hope to have a
> patch for that ready later today.
>
> -O
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [email protected: majo...@postgresql.org])


--

/~\ The ASCII Felipe Schnack ([email protected: fe...@ritterdosreis.br])
 \ / Ribbon Campaign  Analista de Sistemas
  X  Against HTML     Cel.: 51-91287530
 / \ Email!           Linux Counter #281893

Centro Universitário Ritter dos Reis
http://www.ritterdosreis.br
[email protected: r...@ritterdosreis.br]
Fone: 51-32303341
3) Fernando Nasser Thanks for summarizing it Oliver. I've asked Tom Lane about the backend behavior and he informed me...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Thanks for summarizing it Oliver.

I've asked Tom Lane about the backend behavior and he informed me that:

1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
$3) (i.e., our (?, ?, ?) syntax).

2) 7.4 backends have a PostgreSQL specific extension that allows you to
fill the IN predicate with a list: ($1)  (i.e., our (?) ).  One has to 
pass a PostgreSQL array, like integer[] to fill the list.  Note that the 
parenthesis is already in place, it is not generated by the ? expansion.

The feature 2 in 7.4 backends is of limited use as the planner does not
know about the list, so the generated plan will not be as good as if you
pass the list with fixed values since the beginning.  But an improvement 
for this can be attempted for 7.5.

Regards,
Fernando





Oliver Jowett wrote:
> Some of the threads on this are getting a bit bogged down, I thought I'd
> summarize the viable options I've seen so far (well, from my point of view
> anyway)
>
> setObject() currently allows the user to bypass parameter escaping via
> setObject(n, string, Types.NUMERIC) (and variants on that). This needs to be
> plugged as it's a potential security hole.
>
> However the same functionality lets you do the (nonstandard) trick of
> providing an IN clause to a PreparedStatement like "SELECT * FROM table
> WHERE pk IN ?". It'd be good to still allow this functionality somehow after
> setObject is fixed. This is going to be a postgresql-specific extension
> however we do it.
>
> Here are the permutations I can remember:
>
> Option 1: add a method to PGStatement that explicitly sets an IN clause,
> taking either a java.sql.Array, java.util.Collection + component type,
> array + component type, or a custom postgresql object
>
>   + there's no confusion as to what it means
> + using a custom object allows access via setObject(..., Types.OTHER)
> consistently, as well as via the extension method.
> - java.sql.Array and java.util.Collection have problems as PGStatement is
> compiled for all JDKs and JDBC versions and those types may not be present
> (we could do a PGJDBC2Statement or something, but that's getting messy)
>   - have to downcast to a PGStatement to use it
>
> Option 2: make setArray() expand to an IN clause when the parameter follows " IN".
>
>   + no new methods or types needed
> - setArray() behaves differently depending on query context
> - user has to wrap the underlying array in a java.sql.Array
>
> Option 3: make setObject(n, Collection [, type]) expand to an IN clause.
>
>   + no new methods or types needed
> - must assume that the contents of the collection use the default type mapping
>     if a type is not provided
> - if a type is provided and we apply it to the *components* of the
> collection, this breaks the general getObject() interface of "bind this
> object interpreting it as this particular type".
> - not obvious what to do with objects that are both Collections and some
> other SQL-relevant type; solutions make setObject's behaviour complex
>     and/or query-context-dependent
>
> Option 4: as 3, but use java arrays (native arrays, not java.sql.Array) instead of
>    java.util.Collection
>
> + as 3, but the ambiguity of "object is both Collection and SQL type X"
>     goes away.
>
> Option 5: don't provide an extension at all i.e. do away with setting IN clauses
>   in this way.
>
>   + no issues with server-side prepare
> - obviously, you can't set IN clauses in one go any more.
>
> 1-4 all need to disable server-side prepare when used.
>
> Did I miss anything? My personal order of preference is 1-2-4-5-3. I have a
> partial implementation of 2 written but it's easy to adapt that to whatever
> external interface.
>
> setArray() needs fixing regardless of what happens here. I hope to have a
> patch for that ready later today.
>
> -O
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [email protected: majo...@postgresql.org])
>


--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9
4) Dmitry Tkach it doesn't (at least, not in the current implementation) - Types.OTHER ends up calling setString(),...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
>
>
>Here are the permutations I can remember:
>
>Option 1: add a method to PGStatement that explicitly sets an IN clause,
> taking either a java.sql.Array, java.util.Collection + component type,
> array + component type, or a custom postgresql object
>
>  + there's no confusion as to what it means
> + using a custom object allows access via setObject(..., Types.OTHER)
> consistently, as well as via the extension method.
>

it doesn't (at least, not in the current implementation) - Types.OTHER
ends up calling setString(), that makes it useless for IN parameters

> - java.sql.Array and java.util.Collection have problems as PGStatement is
> compiled for all JDKs and JDBC versions and those types may not be present
> (we could do a PGJDBC2Statement or something, but that's getting messy)
>
you could declare it to take Object, I suppose (that would be the only
way anyway if you wanted to support arrays of primitive types anyway)

>  - have to downcast to a PGStatement to use it
>
>  
>
>
>Option 2: make setArray() expand to an IN clause when the parameter follows " IN".
>
>  + no new methods or types needed
> - setArray() behaves differently depending on query context
> - user has to wrap the underlying array in a java.sql.Array
>
>Option 3: make setObject(n, Collection [, type]) expand to an IN clause.
>
>  + no new methods or types needed
> - must assume that the contents of the collection use the default type mapping
>    if a type is not provided
>
You can require the type to be provided.

> - if a type is provided and we apply it to the *components* of the
> collection, this breaks the general getObject() interface of "bind this
> object interpreting it as this particular type".
> - not obvious what to do with objects that are both Collections and some
> other SQL-relevant type; solutions make setObject's behaviour complex
>    and/or query-context-dependent
>  
>

>Option 4: as 3, but use java arrays (native arrays, not java.sql.Array) instead of
>   java.util.Collection
>
> + as 3, but the ambiguity of "object is both Collection and SQL type X"
>    goes away.
>
>Option 5: don't provide an extension at all i.e. do away with setting IN clauses
>  in this way.
>
>  + no issues with server-side prepare
> - obviously, you can't set IN clauses in one go any more.
>
>1-4 all need to disable server-side prepare when used.
>
>Did I miss anything? My personal order of preference is 1-2-4-5-3.
>
For what it's worth, mine is 3-4-1,2,5  (commas meaning that the last 
three seem equally useless).

Dima


> I have a
>partial implementation of 2 written but it's easy to adapt that to whatever
>external interface.
>
>setArray() needs fixing regardless of what happens here. I hope to have a
>patch for that ready later today.
>
>-O
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [email protected: majo...@postgresql.org])
>  
>
5) Fernando Nasser Oliver has to update his summary first. There are some new info from the backend side. Note that...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Felipe Schnack wrote:
>   I also prefer number one.
>   Maybe we should do a poll? :-)
>

Oliver has to update his summary first.  There are some new info from 
the backend side.

Note that option 2 now should read "when inside the parenthesis that
define an <in value list> of the IN <predicate>. (I am using the SQL
standard clause names here).

I would go with number 2 because that is exactly what the backend does
in its PREPARE statement.

Cheers,
Fernando




> On Tue, 22 Jul 2003 16:11:19 +1200
> Oliver Jowett <oliver@opencloud.com> wrote:
>
>
>>Some of the threads on this are getting a bit bogged down, I thought I'd
>>summarize the viable options I've seen so far (well, from my point of view
>>anyway)
>>
>>setObject() currently allows the user to bypass parameter escaping via
>>setObject(n, string, Types.NUMERIC) (and variants on that). This needs to be
>>plugged as it's a potential security hole.
>>
>>However the same functionality lets you do the (nonstandard) trick of
>>providing an IN clause to a PreparedStatement like "SELECT * FROM table
>>WHERE pk IN ?". It'd be good to still allow this functionality somehow after
>>setObject is fixed. This is going to be a postgresql-specific extension
>>however we do it.
>>
>>Here are the permutations I can remember:
>>
>>Option 1: add a method to PGStatement that explicitly sets an IN clause,
>> taking either a java.sql.Array, java.util.Collection + component type,
>> array + component type, or a custom postgresql object
>>
>>  + there's no confusion as to what it means
>> + using a custom object allows access via setObject(..., Types.OTHER)
>> consistently, as well as via the extension method.
>> - java.sql.Array and java.util.Collection have problems as PGStatement is
>> compiled for all JDKs and JDBC versions and those types may not be present
>> (we could do a PGJDBC2Statement or something, but that's getting messy)
>> - have to downcast to a PGStatement to use it
>>
>>Option 2: make setArray() expand to an IN clause when the parameter follows " IN".
>>
>>  + no new methods or types needed
>> - setArray() behaves differently depending on query context
>> - user has to wrap the underlying array in a java.sql.Array
>>
>>Option 3: make setObject(n, Collection [, type]) expand to an IN clause.
>>
>>  + no new methods or types needed
>> - must assume that the contents of the collection use the default type mapping
>>    if a type is not provided
>> - if a type is provided and we apply it to the *components* of the
>> collection, this breaks the general getObject() interface of "bind this
>> object interpreting it as this particular type".
>> - not obvious what to do with objects that are both Collections and some
>> other SQL-relevant type; solutions make setObject's behaviour complex
>>    and/or query-context-dependent
>>
>>Option 4: as 3, but use java arrays (native arrays, not java.sql.Array) instead of
>>   java.util.Collection
>>
>> + as 3, but the ambiguity of "object is both Collection and SQL type X"
>>    goes away.
>>
>>Option 5: don't provide an extension at all i.e. do away with setting IN clauses
>>  in this way.
>>
>>  + no issues with server-side prepare
>> - obviously, you can't set IN clauses in one go any more.
>>
>>1-4 all need to disable server-side prepare when used.
>>
>>Did I miss anything? My personal order of preference is 1-2-4-5-3. I have a
>>partial implementation of 2 written but it's easy to adapt that to whatever
>>external interface.
>>
>>setArray() needs fixing regardless of what happens here. I hope to have a
>>patch for that ready later today.
>>
>>-O
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to [email protected: majo...@postgresql.org])
>
>
>


--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9
6) Dmitry Tkach This is the same problem, as it generally exists with x=? - the query plan is generally not as good...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Fernando Nasser wrote:

> Thanks for summarizing it Oliver.
>
> I've asked Tom Lane about the backend behavior and he informed me that:
>
> 1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
> $3) (i.e., our (?, ?, ?) syntax).
>
> 2) 7.4 backends have a PostgreSQL specific extension that allows you
> to fill the IN predicate with a list: ($1) (i.e., our (?) ). One has
> to pass a PostgreSQL array, like integer[] to fill the list. Note
> that the parenthesis is already in place, it is not generated by the ?
> expansion.
>
> The feature 2 in 7.4 backends is of limited use as the planner does
> not know about the list, so the generated plan will not be as good as
> if you pass the list with fixed values since the beginning.

This is the same problem, as it generally exists with x=? - the query
plan is generally not as good as x=1, because the planner doesn't know
the value to use with statistics.

Are you saying that #2 only works with integers? Or can you give it any
array?

Dima
7) Dmitry Tkach If I read this correctly, there is no need for any special handling from the driver side - just...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Fernando Nasser wrote:

> Thanks for summarizing it Oliver.
>
> I've asked Tom Lane about the backend behavior and he informed me that:
>
> 1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
> $3) (i.e., our (?, ?, ?) syntax).
>
> 2) 7.4 backends have a PostgreSQL specific extension that allows you
> to fill the IN predicate with a list: ($1) (i.e., our (?) ). One has
> to pass a PostgreSQL array, like integer[] to fill the list. Note
> that the parenthesis is already in place, it is not generated by the ?
> expansion.

If I read this correctly, there is no need for any special handling
from  the driver side - just setArray() should work. Or am I missing 
something?

Dima
8) Oliver Jowett Well, certainly, we'd need to change setObject to understand this new type. Good point. Hmm, so...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
On Tue, Jul 22, 2003 at 10:27:17AM -0400, Dmitry Tkach wrote:
> >
> >
> >Here are the permutations I can remember:
> >
> >Option 1: add a method to PGStatement that explicitly sets an IN clause,
> > taking either a java.sql.Array, java.util.Collection + component type,
> > array + component type, or a custom postgresql object
> >
> > + there's no confusion as to what it means
> > + using a custom object allows access via setObject(..., Types.OTHER)
> > consistently, as well as via the extension method.
> >
> it doesn't (at least, not in the current implementation) - Types.OTHER
> ends up calling setString(), that makes it useless for IN parameters

Well, certainly, we'd need to change setObject to understand this new type.

> > - java.sql.Array and java.util.Collection have problems as PGStatement is
> > compiled for all JDKs and JDBC versions and those types may not be
> >   present
> > (we could do a PGJDBC2Statement or something, but that's getting messy)
> >
> you could declare it to take Object, I suppose (that would be the only
> way anyway if you wanted to support arrays of primitive types anyway)

Good point.

> >Option 3: make setObject(n, Collection [, type]) expand to an IN clause.
> >
> > + no new methods or types needed
> > - must assume that the contents of the collection use the default type
> > mapping
> >   if a type is not provided
> >
> You can require the type to be provided.

Hmm, so what does setObject with no type do in that case? Also see the next
point.

> > - if a type is provided and we apply it to the *components* of the
> > collection, this breaks the general getObject() interface of "bind this
> > object interpreting it as this particular type".

-O
9) Fernando Nasser With the new V3 protocol this is probably true (7.4 will support V3).
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Dmitry Tkach wrote:
> Fernando Nasser wrote:
>
>> Thanks for summarizing it Oliver.
>>
>> I've asked Tom Lane about the backend behavior and he informed me that:
>>
>> 1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
>> $3) (i.e., our (?, ?, ?) syntax).
>>
>> 2) 7.4 backends have a PostgreSQL specific extension that allows you
>> to fill the IN predicate with a list: ($1) (i.e., our (?) ). One has
>> to pass a PostgreSQL array, like integer[] to fill the list. Note
>> that the parenthesis is already in place, it is not generated by the ?
>> expansion.
>
>
> If I read this correctly, there is no need for any special handling
> from the driver side - just setArray() should work. Or am I missing
> something?
>

With the new V3 protocol this is probably true (7.4 will support V3).

--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9
10) Dmitry Tkach It will throw an exception - "Unrecognized parameter type: " + Object.getClass().getName () Well......
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
>
>
>>>Option 3: make setObject(n, Collection [, type]) expand to an IN clause.
>>>
>>>+ no new methods or types needed
>>>- must assume that the contents of the collection use the default type
>>>mapping
>>>  if a type is not provided
>>>
>>>      
>>>
>>You can require the type to be provided.
>>    
>>
>
>Hmm, so what does setObject with no type do in that case? Also see the next
>point.
>  
>

It will throw an exception - "Unrecognized parameter type: " +
Object.getClass().getName ()

>  
>
>>>- if a type is provided and we apply it to the *components* of the
>>> collection, this breaks the general getObject() interface of "bind this
>>> object interpreting it as this particular type".
>>>      
>>>
Well... this "general interface" is *by implication* only. It is not
defined this way in the spec, it is not documented to always work this way.
So, you just *assume*, that this is the general interface... It doesn't
have to be like that... Certainly not at the cost of valuable
functionality...

Dima.

P.S. Actually, in light of that previous message about 7.4 support for
arrays in the in clause, this whole discussion seems to be moot :-)
It seems to me that just setArray () should then work, without any
special handling by the driver... Isn't it the case?
11) Oliver Jowett I assume this is only when you're doing a PREPARE/EXECUTE? Hm, then it sounds like the right...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
On Tue, Jul 22, 2003 at 09:05:45AM -0400, Fernando Nasser wrote:
> Thanks for summarizing it Oliver.
>
> I've asked Tom Lane about the backend behavior and he informed me that:
>
> 1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
> $3) (i.e., our (?, ?, ?) syntax).
>
> 2) 7.4 backends have a PostgreSQL specific extension that allows you to
> fill the IN predicate with a list: ($1) (i.e., our (?) ). One has to
> pass a PostgreSQL array, like integer[] to fill the list. Note that the
> parenthesis is already in place, it is not generated by the ? expansion.

I assume this is only when you're doing a PREPARE/EXECUTE?

> The feature 2 in 7.4 backends is of limited use as the planner does not
> know about the list, so the generated plan will not be as good as if you
> pass the list with fixed values since the beginning. But an improvement
> for this can be attempted for 7.5.

Hm, then it sounds like the right solution is to have setArray() expand as
the guts of an IN clause when the backend is <7.4 or server prepares are
off, and the parameter is in a query of the form "... IN (?)", and as a
normal array otherwise.

-O
12) Fernando Nasser yes. That is _exactly_ what I am proposing (option 2 of your summary)...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Oliver Jowett wrote:
> On Tue, Jul 22, 2003 at 09:05:45AM -0400, Fernando Nasser wrote:
>
>>Thanks for summarizing it Oliver.
>>
>>I've asked Tom Lane about the backend behavior and he informed me that:
>>
>>1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
>>$3) (i.e., our (?, ?, ?) syntax).
>>
>>2) 7.4 backends have a PostgreSQL specific extension that allows you to
>>fill the IN predicate with a list: ($1) (i.e., our (?) ). One has to
>>pass a PostgreSQL array, like integer[] to fill the list. Note that the
>>parenthesis is already in place, it is not generated by the ? expansion.
>
>
> I assume this is only when you're doing a PREPARE/EXECUTE?
>

yes.

>
>>The feature 2 in 7.4 backends is of limited use as the planner does not
>>know about the list, so the generated plan will not be as good as if you
>>pass the list with fixed values since the beginning. But an improvement
>>for this can be attempted for 7.5.
>
>
> Hm, then it sounds like the right solution is to have setArray() expand as
> the guts of an IN clause when the backend is <7.4 or server prepares are
> off, and the parameter is in a query of the form "... IN (?)", and as a
> normal array otherwise.
>

That is _exactly_ what I am proposing (option 2 of your summary)



--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9
13) Darin Ohashi I'm not sure if this makes sense, but could you have a conflict between a set containing a single...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
> Oliver has to update his summary first. There are some new info from
> the backend side.
>
> Note that option 2 now should read "when inside the parenthesis that
> define an <in value list> of the IN <predicate>. (I am using the SQL
> standard clause names here).
>
> I would go with number 2 because that is exactly what the
> backend does
> in its PREPARE statement.

I'm not sure if this makes sense, but could you have a conflict between a set
containing a single element that is the array and a set containing the elements
of the array.

Darin
14) Tom Lane No, it's not the same thing --- the planner can generate an indexscan plan when scalar params are...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Dmitry Tkach <dmitry@openratings.com> writes:
> Fernando Nasser wrote:
>> The feature 2 in 7.4 backends is of limited use as the planner does
>> not know about the list, so the generated plan will not be as good as
>> if you pass the list with fixed values since the beginning.

> This is the same problem, as it generally exists with x=? - the query
> plan is generally not as good as x=1, because the planner doesn't know
> the value to use with statistics.

No, it's not the same thing --- the planner can generate an indexscan
plan when scalar params are involved, although it might choose not to.
The planner is simply not aware that any comparable optimization might
be possible when using the new array syntax.  Let me attach the example
I sent Fernando last night ...


Fernando Nasser <fnasser@redhat.com> writes:
> PREPARE tststmt (integer[]) AS SELECT * from testmetadata where id IN (?);
> PREPARE tststmt (integer, integer) AS SELECT * from testmetadata where id IN (?, ?);
> all give parsing errors.

The second case works fine from the command line:

regression=# prepare z(int,int) as select * from tenk1 where unique1 in ($1,$2);
PREPARE
regression=# execute z(42,66);
unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4
---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------
42 | 5530 | 0 | 2 | 2 | 2 | 42 | 42 | 42 | 42 | 42 | 84 | 85 | QBAAAA | SEIAAA | OOOOxx
66 | 6723 | 0 | 2 | 6 | 6 | 66 | 66 | 66 | 66 | 66 | 132 | 133 | OCAAAA | PYJAAA | VVVVxx
(2 rows)

Perhaps JDBC has some problem with it?

I would not expect the first case to work, since it violates the plain
meaning of IN.  But Joe Conway has implemented some non-SQL syntax to
handle that in 7.4:

regression=# prepare zz(int[]) as select * from tenk1 where unique1 = ANY ($1);
PREPARE
regression=# execute zz(ARRAY[42,66]);
unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4
---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------
42 | 5530 | 0 | 2 | 2 | 2 | 42 | 42 | 42 | 42 | 42 | 84 | 85 | QBAAAA | SEIAAA | OOOOxx
66 | 6723 | 0 | 2 | 6 | 6 | 66 | 66 | 66 | 66 | 66 | 132 | 133 | OCAAAA | PYJAAA | VVVVxx
(2 rows)

I should warn you though that this is not yet executed efficiently; the
planner has no idea about reducing it to a set of indexscans.  Compare

regression=# explain analyze execute z(42,66);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Scan using tenk1_unique1, tenk1_unique1 on tenk1 (cost=0.00..12.02 rows=2 width=244) (actual time=0.28..0.48 rows=2 loops=1)
   Index Cond: ((unique1 = $1) OR (unique1 = $2))
Total runtime: 1.35 msec
(3 rows)

regression=# explain analyze execute zz(ARRAY[42,66]);
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..708.00 rows=5000 width=244) (actual time=70.03..126.16 rows=2 loops=1)
   Filter: (unique1 = ANY ($1))
Total runtime: 126.78 msec
(3 rows)


Perhaps we can make it work better in 7.5.

   regards, tom lane
15) Dmitry Tkach Oops :-( I see... that is a world of a differnce :-( Dima ...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Oops :-(

I see... that is a world of a differnce :-(

Dima


Tom Lane wrote:

>Dmitry Tkach <dmitry@openratings.com> writes:
>  
>
>>Fernando Nasser wrote:
>>    
>>
>>>The feature 2 in 7.4 backends is of limited use as the planner does
>>>not know about the list, so the generated plan will not be as good as
>>>if you pass the list with fixed values since the beginning.
>>>      
>>>
>
>  
>
>>This is the same problem, as it generally exists with x=? - the query
>>plan is generally not as good as x=1, because the planner doesn't know
>>the value to use with statistics.
>>    
>>
>
>No, it's not the same thing --- the planner can generate an indexscan
>plan when scalar params are involved, although it might choose not to.
>The planner is simply not aware that any comparable optimization might
>be possible when using the new array syntax. Let me attach the example
>I sent Fernando last night ...
>
>
>Fernando Nasser <fnasser@redhat.com> writes:
>  
>
>>PREPARE tststmt (integer[]) AS SELECT * from testmetadata where id IN (?);
>>PREPARE tststmt (integer, integer) AS SELECT * from testmetadata where id IN (?, ?);
>>all give parsing errors.
>>    
>>
>
>The second case works fine from the command line:
>
>regression=# prepare z(int,int) as select * from tenk1 where unique1 in ($1,$2);
>PREPARE
>regression=# execute z(42,66);
> unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4
>---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------
> 42 | 5530 | 0 | 2 | 2 | 2 | 42 | 42 | 42 | 42 | 42 | 84 | 85 | QBAAAA | SEIAAA | OOOOxx
> 66 | 6723 | 0 | 2 | 6 | 6 | 66 | 66 | 66 | 66 | 66 | 132 | 133 | OCAAAA | PYJAAA | VVVVxx
>(2 rows)
>
>Perhaps JDBC has some problem with it?
>
>I would not expect the first case to work, since it violates the plain
>meaning of IN. But Joe Conway has implemented some non-SQL syntax to
>handle that in 7.4:
>
>regression=# prepare zz(int[]) as select * from tenk1 where unique1 = ANY ($1);
>PREPARE
>regression=# execute zz(ARRAY[42,66]);
> unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4
>---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------
> 42 | 5530 | 0 | 2 | 2 | 2 | 42 | 42 | 42 | 42 | 42 | 84 | 85 | QBAAAA | SEIAAA | OOOOxx
> 66 | 6723 | 0 | 2 | 6 | 6 | 66 | 66 | 66 | 66 | 66 | 132 | 133 | OCAAAA | PYJAAA | VVVVxx
>(2 rows)
>
>I should warn you though that this is not yet executed efficiently; the
>planner has no idea about reducing it to a set of indexscans. Compare
>
>regression=# explain analyze execute z(42,66);
> QUERY PLAN
>-------------------------------------------------------------------------------------------------------------------------------------
> Index Scan using tenk1_unique1, tenk1_unique1 on tenk1 (cost=0.00..12.02 rows=2 width=244) (actual time=0.28..0.48 rows=2 loops=1)
>   Index Cond: ((unique1 = $1) OR (unique1 = $2))
> Total runtime: 1.35 msec
>(3 rows)
>
>regression=# explain analyze execute zz(ARRAY[42,66]);
> QUERY PLAN
>-------------------------------------------------------------------------------------------------------
> Seq Scan on tenk1 (cost=0.00..708.00 rows=5000 width=244) (actual time=70.03..126.16 rows=2 loops=1)
>   Filter: (unique1 = ANY ($1))
> Total runtime: 126.78 msec
>(3 rows)
>
>
>Perhaps we can make it work better in 7.5.
>
>   regards, tom lane
>  
>
16) Fernando Nasser You will need to have an Array of Arrays in that case.
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Darin Ohashi wrote:
>>Oliver has to update his summary first. There are some new info from
>>the backend side.
>>
>>Note that option 2 now should read "when inside the parenthesis that
>>define an <in value list> of the IN <predicate>. (I am using the SQL
>>standard clause names here).
>>
>>I would go with number 2 because that is exactly what the
>>backend does
>>in its PREPARE statement.
>
>
> I'm not sure if this makes sense, but could you have a conflict between a set
> containing a single element that is the array and a set containing the elements
> of the array.
>

You will need to have an Array of Arrays in that case.


--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9
17) Oliver Jowett Ouch. That syntax is going to be messy to transform into an IN clause for <7.4 backends. -O
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
On Tue, Jul 22, 2003 at 10:51:51AM -0400, Tom Lane wrote:

> Fernando Nasser <fnasser@redhat.com> writes:
> > PREPARE tststmt (integer[]) AS SELECT * from testmetadata where id IN (?);
> > PREPARE tststmt (integer, integer) AS SELECT * from testmetadata where id IN (?, ?);
> > all give parsing errors.

> I would not expect the first case to work, since it violates the plain
> meaning of IN. But Joe Conway has implemented some non-SQL syntax to
> handle that in 7.4:
>
> regression=# prepare zz(int[]) as select * from tenk1 where unique1 = ANY ($1);
> PREPARE
> regression=# execute zz(ARRAY[42,66]);

Ouch. That syntax is going to be messy to transform into an IN clause for
<7.4 backends.

-O
18) Felipe Schnack > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Am I the only the only one who doesn't like the idea of the driver parsing SQL statements (to check if there is a IN clause)

On Tue, 22 Jul 2003 10:41:22 -0400
Fernando Nasser <fnasser@redhat.com> wrote:

> Oliver Jowett wrote:
> > On Tue, Jul 22, 2003 at 09:05:45AM -0400, Fernando Nasser wrote:
> >
> >>Thanks for summarizing it Oliver.
> >>
> >>I've asked Tom Lane about the backend behavior and he informed me that:
> >>
> >>1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
> >>$3) (i.e., our (?, ?, ?) syntax).
> >>
> >>2) 7.4 backends have a PostgreSQL specific extension that allows you to
> >>fill the IN predicate with a list: ($1) (i.e., our (?) ). One has to
> >>pass a PostgreSQL array, like integer[] to fill the list. Note that the
> >>parenthesis is already in place, it is not generated by the ? expansion.
> >
> >
> > I assume this is only when you're doing a PREPARE/EXECUTE?
> >
>
> yes.
>
> >
> >>The feature 2 in 7.4 backends is of limited use as the planner does not
> >>know about the list, so the generated plan will not be as good as if you
> >>pass the list with fixed values since the beginning. But an improvement
> >>for this can be attempted for 7.5.
> >
> >
> > Hm, then it sounds like the right solution is to have setArray() expand as
> > the guts of an IN clause when the backend is <7.4 or server prepares are
> > off, and the parameter is in a query of the form "... IN (?)", and as a
> > normal array otherwise.
> >
>
> That is _exactly_ what I am proposing (option 2 of your summary)
>
>
>
> --
> Fernando Nasser
> Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
> 2323 Yonge Street, Suite #300
> Toronto, Ontario   M4P 2C9
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [email protected: majo...@postgresql.org])


--

/~\ The ASCII Felipe Schnack ([email protected: fe...@ritterdosreis.br])
 \ / Ribbon Campaign  Analista de Sistemas
  X  Against HTML     Cel.: 51-91287530
 / \ Email!           Linux Counter #281893

Centro Universitário Ritter dos Reis
http://www.ritterdosreis.br
[email protected: r...@ritterdosreis.br]
Fone: 51-32303341
19) Fernando Nasser Mind that this is only necessary for backward compatibility. With 7.4 and the V3 protocol you just...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Felipe Schnack wrote:
> Am I the only the only one who doesn't like the idea of the driver parsing SQL statements (to check if there is a IN clause)
>

Mind that this is only necessary for backward compatibility.  With 7.4 
and the V3 protocol you just send an array and the backend sorts it out.






> On Tue, 22 Jul 2003 10:41:22 -0400
> Fernando Nasser <fnasser@redhat.com> wrote:
>
>
>>Oliver Jowett wrote:
>>
>>>On Tue, Jul 22, 2003 at 09:05:45AM -0400, Fernando Nasser wrote:
>>>
>>>
>>>>Thanks for summarizing it Oliver.
>>>>
>>>>I've asked Tom Lane about the backend behavior and he informed me that:
>>>>
>>>>1) 7.4 backends do support parameters in the IN predicate, as ($1, $2,
>>>>$3) (i.e., our (?, ?, ?) syntax).
>>>>
>>>>2) 7.4 backends have a PostgreSQL specific extension that allows you to
>>>>fill the IN predicate with a list: ($1) (i.e., our (?) ). One has to
>>>>pass a PostgreSQL array, like integer[] to fill the list. Note that the
>>>>parenthesis is already in place, it is not generated by the ? expansion.
>>>
>>>
>>>I assume this is only when you're doing a PREPARE/EXECUTE?
>>>
>>
>>yes.
>>
>>
>>>>The feature 2 in 7.4 backends is of limited use as the planner does not
>>>>know about the list, so the generated plan will not be as good as if you
>>>>pass the list with fixed values since the beginning. But an improvement
>>>>for this can be attempted for 7.5.
>>>
>>>
>>>Hm, then it sounds like the right solution is to have setArray() expand as
>>>the guts of an IN clause when the backend is <7.4 or server prepares are
>>>off, and the parameter is in a query of the form "... IN (?)", and as a
>>>normal array otherwise.
>>>
>>
>>That is _exactly_ what I am proposing (option 2 of your summary)
>>
>>
>>
>>--
>>Fernando Nasser
>>Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
>>2323 Yonge Street, Suite #300
>>Toronto, Ontario   M4P 2C9
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to [email protected: majo...@postgresql.org])
>
>
>


--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9
20) Fernando Nasser Remember we will already have to know that we are handling the <in values list> clause (i.e. it is...
| +1 vote (Anchor)
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Oliver Jowett wrote:
> On Tue, Jul 22, 2003 at 10:51:51AM -0400, Tom Lane wrote:
>
>
>>Fernando Nasser <fnasser@redhat.com> writes:
>>
>>>PREPARE tststmt (integer[]) AS SELECT * from testmetadata where id IN (?);
>>>PREPARE tststmt (integer, integer) AS SELECT * from testmetadata where id IN (?, ?);
>>>all give parsing errors.
>>
>
>>I would not expect the first case to work, since it violates the plain
>>meaning of IN. But Joe Conway has implemented some non-SQL syntax to
>>handle that in 7.4:
>>
>>regression=# prepare zz(int[]) as select * from tenk1 where unique1 = ANY ($1);
>>PREPARE
>>regression=# execute zz(ARRAY[42,66]);
>
>
> Ouch. That syntax is going to be messy to transform into an IN clause for
> <7.4 backends.
>

Remember we will already have to know that we are handling the <in
values list> clause (i.e. it is a " IN (?)'), so we can very well
special case the expansion of the array.

But it will be much better on 7.4 and V3, I agree.

--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: [email protected: fn...@redhat.com]
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9
spacer
View TopicPrint | Flat  Thread  Threaded | Page 1 of 2: 1 2 > >>
Home > Groups > PostgreSQL - JDBC > the IN clause saga (31 posts)