FAQ
This works fine...
COPY pgbench_accounts TO '/tmp/acc' BINARY;

This new format does not
COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY);
ERROR: syntax error at or near "BINARY" at character 47

which looks like I've mistyped something. Until you realise that this
statement gives a completely different error message.

COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname);
ERROR: COPY format "anyname" not recognized

and we also note that there are no examples in the docs, nor
regression tests to cover this situation.

So I conclude that this hasn't ever worked since it was introduced in 9.0.

The cause is that there is an overlap between the old and the new COPY
syntax, relating to the word BINARY. It's the grammar generating the
error, not post parse analysis.

My attempts to fix that look pretty ugly, so I'm not even going to
post them. I can stop the error on binary by causing errors on csv and
text, obviously not a fix. Any grammar based fix looks like it would
restrict the list of formats, which breaks the orginal intention of
the syntax change.

I'm advised that

COPY pgbench_accounts FROM '/tmp/acc' (FORMAT 'BINARY');

works fine. Though that is not documented and I doubt anyone much uses that.

My conclusion is that we should *not* fix this bug, but just alter the
manual slightly to explain what the correct usage is (use quotes
around 'binary'). Reason for that suggestion is that nobody ever
reported this bug, so either few people use binary mode or they use
the old syntax. Of course, that is not a normal suggestion, so feel
free to walk up and down my spine with boots on for suggesting it.

Thoughts?

--
  Simon Riggs http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Search Discussions

  • Heikki Linnakangas at May 26, 2013 at 3:35 pm

    On 26.05.2013 04:31, Simon Riggs wrote:
    This works fine...
    COPY pgbench_accounts TO '/tmp/acc' BINARY;

    This new format does not
    COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY);
    ERROR: syntax error at or near "BINARY" at character 47

    which looks like I've mistyped something. Until you realise that this
    statement gives a completely different error message.

    COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname);
    ERROR: COPY format "anyname" not recognized

    and we also note that there are no examples in the docs, nor
    regression tests to cover this situation.

    So I conclude that this hasn't ever worked since it was introduced in 9.0.

    The cause is that there is an overlap between the old and the new COPY
    syntax, relating to the word BINARY. It's the grammar generating the
    error, not post parse analysis.
    Hmm, the problem is that BINARY is a type_func_keyword, so it doesn't
    match the ColId rule used to capture the format argument.
    My attempts to fix that look pretty ugly, so I'm not even going to
    post them. I can stop the error on binary by causing errors on csv and
    text, obviously not a fix. Any grammar based fix looks like it would
    restrict the list of formats, which breaks the orginal intention of
    the syntax change.
    This seems to work:

    --- a/src/backend/parser/gram.y
    +++ b/src/backend/parser/gram.y
    @@ -2528,3 +2528,7 @@ copy_generic_opt_elem:
           {
            $$ = makeDefElem($1, $2);
           }
    + | ColLabel BINARY
    + {
    + $$ = makeDefElem($1, (Node *) makeString("binary"));
    + }


    Am I missing something?

    - Heikki
  • Tom Lane at May 26, 2013 at 4:11 pm

    Heikki Linnakangas writes:
    On 26.05.2013 04:31, Simon Riggs wrote:
    This new format does not [work:]
    COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY);
    ERROR: syntax error at or near "BINARY" at character 47
    This seems to work:
    --- a/src/backend/parser/gram.y
    +++ b/src/backend/parser/gram.y
    @@ -2528,3 +2528,7 @@ copy_generic_opt_elem:
    {
    $$ = makeDefElem($1, $2);
    }
    + | ColLabel BINARY
    + {
    + $$ = makeDefElem($1, (Node *) makeString("binary"));
    + }
    That only fixes things for the specific case of FORMAT BINARY. I think
    it would be better to generalize ColId_or_Sconst. This one-liner works,
    but it's pretty ugly:

    *** gram.y~ Sun May 26 11:58:42 2013
    --- gram.y Sun May 26 12:00:03 2013
    *************** opt_encoding:
    *** 1548,1553 ****
    --- 1548,1554 ----

       ColId_or_Sconst:
          ColId { $$ = $1; }
    + | type_func_name_keyword { $$ = pstrdup($1); }
    Sconst { $$ = $1; }
         ;

    More readable would be to invent an intermediate nonterminal falling
    between ColId and ColLabel, whose expansion would be "IDENT |
    unreserved_keyword | col_name_keyword | type_func_name_keyword", and
    then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
    Any thoughts about a name for that new nonterminal?

    BTW, I tried just replacing ColId with ColLabel here, and that *almost*
    works, but there are some places where we can see either ColId_or_Sconst
    or DEFAULT. I don't know of any sane way to express "all reserved
    keywords except DEFAULT" to bison, so the best we can realistically do
    is to accept all not-fully-reserved keywords in these places.

        regards, tom lane
  • Simon Riggs at May 27, 2013 at 10:37 am

    On 26 May 2013 17:10, Tom Lane wrote:
    Heikki Linnakangas <hlinnakangas@vmware.com> writes:
    More readable would be to invent an intermediate nonterminal falling
    between ColId and ColLabel, whose expansion would be "IDENT |
    unreserved_keyword | col_name_keyword | type_func_name_keyword", and
    then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
    Any thoughts about a name for that new nonterminal?
    Do you think complicating the parser in that way is worth the trouble
    for this case? Could that slow down parsing?

    We don't actually have to fix it; clearly not too many people are
    bothered, since no complaints in 3 years. Documenting 'binary' seems
    better.

    --
      Simon Riggs http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Tom Lane at May 27, 2013 at 2:31 pm

    Simon Riggs writes:
    On 26 May 2013 17:10, Tom Lane wrote:
    More readable would be to invent an intermediate nonterminal falling
    between ColId and ColLabel, whose expansion would be "IDENT |
    unreserved_keyword | col_name_keyword | type_func_name_keyword", and
    then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
    Any thoughts about a name for that new nonterminal?
    Do you think complicating the parser in that way is worth the trouble
    for this case? Could that slow down parsing?
    It makes the grammar tables a bit larger (1% or so IIRC). There would
    be some distributed penalty for that, but probably not much. Of course
    there's always the slippery-slope argument about that.
    We don't actually have to fix it; clearly not too many people are
    bothered, since no complaints in 3 years. Documenting 'binary' seems
    better.
    Well, my thought is there are other cases. For instance:

    regression=# create role binary;
    ERROR: syntax error at or near "binary"
    LINE 1: create role binary;
                         ^
    regression=# create user cross;
    ERROR: syntax error at or near "cross"
    LINE 1: create user cross;
                         ^

    If we don't have to treat type_func_name_keywords as reserved in these
    situations, shouldn't we avoid doing so?

        regards, tom lane
  • Robert Haas at May 28, 2013 at 2:06 pm

    On Mon, May 27, 2013 at 10:31 AM, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On 26 May 2013 17:10, Tom Lane wrote:
    More readable would be to invent an intermediate nonterminal falling
    between ColId and ColLabel, whose expansion would be "IDENT |
    unreserved_keyword | col_name_keyword | type_func_name_keyword", and
    then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
    Any thoughts about a name for that new nonterminal?
    Do you think complicating the parser in that way is worth the trouble
    for this case? Could that slow down parsing?
    It makes the grammar tables a bit larger (1% or so IIRC). There would
    be some distributed penalty for that, but probably not much. Of course
    there's always the slippery-slope argument about that.
    We don't actually have to fix it; clearly not too many people are
    bothered, since no complaints in 3 years. Documenting 'binary' seems
    better.
    Well, my thought is there are other cases. For instance:

    regression=# create role binary;
    ERROR: syntax error at or near "binary"
    LINE 1: create role binary;
    ^
    regression=# create user cross;
    ERROR: syntax error at or near "cross"
    LINE 1: create user cross;
    ^

    If we don't have to treat type_func_name_keywords as reserved in these
    situations, shouldn't we avoid doing so?
    I am almost always in favor of making more things less reserved, so +1 from me.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Jun 1, 2013 at 6:36 pm

    On 27 May 2013 15:31, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On 26 May 2013 17:10, Tom Lane wrote:
    More readable would be to invent an intermediate nonterminal falling
    between ColId and ColLabel, whose expansion would be "IDENT |
    unreserved_keyword | col_name_keyword | type_func_name_keyword", and
    then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
    Any thoughts about a name for that new nonterminal?
    Do you think complicating the parser in that way is worth the trouble
    for this case? Could that slow down parsing?
    It makes the grammar tables a bit larger (1% or so IIRC). There would
    be some distributed penalty for that, but probably not much. Of course
    there's always the slippery-slope argument about that.
    We don't actually have to fix it; clearly not too many people are
    bothered, since no complaints in 3 years. Documenting 'binary' seems
    better.
    Well, my thought is there are other cases. For instance:

    regression=# create role binary;
    ERROR: syntax error at or near "binary"
    LINE 1: create role binary;
    ^
    regression=# create user cross;
    ERROR: syntax error at or near "cross"
    LINE 1: create user cross;
    ^

    If we don't have to treat type_func_name_keywords as reserved in these
    situations, shouldn't we avoid doing so?

    Seems reasonable argument, so +1. Sorry for delayed reply.

    --
      Simon Riggs http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Simon Riggs at May 27, 2013 at 10:31 am

    On 26 May 2013 16:35, Heikki Linnakangas wrote:

    My attempts to fix that look pretty ugly, so I'm not even going to
    post them. I can stop the error on binary by causing errors on csv and
    text, obviously not a fix. Any grammar based fix looks like it would
    restrict the list of formats, which breaks the orginal intention of
    the syntax change.

    This seems to work:
    This was almost exactly the fix I described above that only fixes that
    specific case and then breaks others.
    --- a/src/backend/parser/gram.y
    +++ b/src/backend/parser/gram.y
    @@ -2528,3 +2528,7 @@ copy_generic_opt_elem:
    {
    $$ = makeDefElem($1, $2);
    }
    + | ColLabel BINARY
    + {
    + $$ = makeDefElem($1, (Node *)
    makeString("binary"));
    + }
    So, no that doesn't work.

    --
      Simon Riggs http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedMay 26, '13 at 8:32a
activeJun 1, '13 at 6:36p
posts8
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase