Hive multi group by single reducer optimization causes invalid column reference error
-------------------------------------------------------------------------------------

Key: HIVE-2750
URL: https://issues.apache.org/jira/browse/HIVE-2750
Project: Hive
Issue Type: Bug
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong


After the optimization, if two query blocks have the same distinct clause and the same group by keys, but the first query block does not reference all the columns the second query block does, an invalid column reference error is raised for the columns that the first query block leaves unreferenced.

E.g.
FROM src
INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1)
INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY substr(src.key,1,1);

This results in an invalid column reference error on src.value.
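
The failure can be modeled in a few lines. The sketch below is a simplified, hypothetical model and not Hive's SemanticAnalyzer code: it assumes that the single shared reducer receives one list of value columns for both group-bys, and it compares building that list from the first query block only with building it from every query block. The names QUERY_BLOCKS, value_columns_first_block_only, value_columns_all_blocks and unresolved_columns are invented for illustration.

```python
# Hypothetical, simplified model of the failure; not Hive's actual code.
# Each query block lists the non-distinct columns its aggregations read.
QUERY_BLOCKS = [
    {"dest": "dest_g2", "value_columns": []},             # count(DISTINCT src.key) only
    {"dest": "dest_g3", "value_columns": ["src.value"]},  # also count(src.value)
]


def value_columns_first_block_only(blocks):
    """Assumed buggy behavior: only the first block's columns are forwarded."""
    return list(blocks[0]["value_columns"])


def value_columns_all_blocks(blocks):
    """Assumed intended behavior: ordered union of every block's columns."""
    seen = []
    for block in blocks:
        for column in block["value_columns"]:
            if column not in seen:
                seen.append(column)
    return seen


def unresolved_columns(blocks, forwarded):
    """Columns that some block needs but the shared reducer never receives."""
    return [
        column
        for block in blocks
        for column in block["value_columns"]
        if column not in forwarded
    ]


buggy = value_columns_first_block_only(QUERY_BLOCKS)
fixed = value_columns_all_blocks(QUERY_BLOCKS)
print(unresolved_columns(QUERY_BLOCKS, buggy))  # ['src.value']
print(unresolved_columns(QUERY_BLOCKS, fixed))  # []
```

In this model the second block's count(src.value) has nothing to resolve against when only the first block is consulted, which mirrors the error reported on src.value.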

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • Kevin Wilfong (Updated) (JIRA) at Jan 26, 2012 at 2:44 am
    [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Kevin Wilfong updated HIVE-2750:
    --------------------------------

    Status: Patch Available (was: Open)
  • Phabricator (Updated) (JIRA) at Jan 26, 2012 at 2:44 am
    [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Phabricator updated HIVE-2750:
    ------------------------------

    Attachment: HIVE-2750.D1455.1.patch

    kevinwilfong requested code review of "HIVE-2750 [jira] Hive multi group by single reducer optimization causes invalid column reference error".
    Reviewers: JIRA

    When generating the list of value columns for the reduce sink operator, in the case of multiple group bys occurring in the same reducer, only the columns used by the first query block were being considered, due to a typo. This patch fixes the typo and adds a testcase to ensure the error does not recur.

    TEST PLAN
    EMPTY

    REVISION DETAIL
    https://reviews.facebook.net/D1455

    AFFECTED FILES
    ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
    ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
    ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

  • Namit Jain (Commented) (JIRA) at Jan 26, 2012 at 5:22 am
    [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193555#comment-13193555 ]

    Namit Jain commented on HIVE-2750:
    ----------------------------------

    +1
  • Hudson (Commented) (JIRA) at Jan 26, 2012 at 4:35 pm
    [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193949#comment-13193949 ]

    Hudson commented on HIVE-2750:
    ------------------------------

    Integrated in Hive-trunk-h0.21 #1222 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1222/])
    HIVE-2750 Hive multi group by single reducer optimization causes invalid column
    reference error (Kevin Wilfong via namit)

    namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236150
    Files :
    * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
    * /hive/trunk/ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
    * /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out

  • Amareshwari Sriramadasu (Updated) (JIRA) at Jan 30, 2012 at 9:53 am
    [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amareshwari Sriramadasu updated HIVE-2750:
    ------------------------------------------

    Resolution: Fixed
    Fix Version/s: 0.9.0
    Status: Resolved (was: Patch Available)

    Seems the issue missed resolution. Resolving.


Discussion Overview
group: dev @
categories: hive, hadoop
posted: Jan 26, '12 at 2:10a
active: Jan 30, '12 at 9:53a
posts: 6
users: 1
website: hive.apache.org
