Grokbase Groups Hive user July 2010
Hi,

I have a question about the JOIN operation in Hive.

For example, I have a query like this:

select tmp7.* from tmp7 join tmp2 on (tmp7.c2 = tmp2.c1);

Clearly, there is a JOIN involved in the statement.
1. tmp2 and tmp7 are two tables.
2. c2 and c1 are columns belonging to tmp7 and tmp2 respectively.

I found that Hive executes this query as a MapReduce job.
Therefore, I am wondering whether tmp2 and tmp7 are both assumed to share the
same InputFormat class.

What if tmp2 and tmp7 are using different InputFormat classes to read
records?


Thanks,

WS


  • Namit Jain at Jul 1, 2010 at 5:00 pm
    That's fine. The two tables can have different InputFormats.

  • Yan qi at Jul 1, 2010 at 5:17 pm
    Hi Namit,

    Thanks a lot for your reply!

    I checked the source code. For the query (select tmp7.* from tmp7 join
    tmp2 on (tmp7.c2 = tmp2.c1)), only one MapReduce job is generated. As
    far as I know, setInputFormat is called in ExecDriver.java to set the
    job's InputFormat class.

    So I don't see how two different InputFormat classes could be set in
    one job. Or did I miss something here?

    Thanks,

  • John Sichi at Jul 1, 2010 at 8:23 pm
    Take a look at HiveInputFormat and CombineHiveInputFormat; these are the classes we wrap around your input formats so that Hive can read data from multiple input formats in the same job.

    JVS
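
To make the wrapping idea concrete, here is a minimal, self-contained Java sketch. It is not Hive's code and does not use the Hadoop InputFormat API: RecordFormat, TabSeparatedFormat, CommaSeparatedFormat, DispatchingFormat and the /warehouse/... paths are all hypothetical names invented for illustration. It only shows the general pattern: the job is configured with one wrapper, and the wrapper chooses the real format for each input path, so tmp7 and tmp2 can be stored differently and still be read by the same job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for an input format: parses raw file contents into rows.
// This is NOT Hadoop's org.apache.hadoop.mapred.InputFormat interface.
interface RecordFormat {
    List<String[]> read(String rawContents);
}

// Hypothetical format for tab-separated files.
class TabSeparatedFormat implements RecordFormat {
    public List<String[]> read(String rawContents) {
        List<String[]> rows = new ArrayList<>();
        for (String line : rawContents.split("\n")) {
            rows.add(line.split("\t"));
        }
        return rows;
    }
}

// Hypothetical format for comma-separated files.
class CommaSeparatedFormat implements RecordFormat {
    public List<String[]> read(String rawContents) {
        List<String[]> rows = new ArrayList<>();
        for (String line : rawContents.split("\n")) {
            rows.add(line.split(","));
        }
        return rows;
    }
}

// The single "format" the job is configured with. It looks up the real
// format by input path and delegates to it (assumed design, for illustration).
class DispatchingFormat {
    private final Map<String, RecordFormat> formatByPathPrefix = new LinkedHashMap<>();

    void register(String pathPrefix, RecordFormat format) {
        formatByPathPrefix.put(pathPrefix, format);
    }

    List<String[]> read(String path, String rawContents) {
        for (Map.Entry<String, RecordFormat> entry : formatByPathPrefix.entrySet()) {
            if (path.startsWith(entry.getKey())) {
                return entry.getValue().read(rawContents);
            }
        }
        throw new IllegalArgumentException("No input format registered for " + path);
    }
}

class InputFormatDispatchSketch {
    // One wrapper for the whole job; each table directory keeps its own format.
    static DispatchingFormat buildJobFormat() {
        DispatchingFormat jobFormat = new DispatchingFormat();
        jobFormat.register("/warehouse/tmp7/", new TabSeparatedFormat());
        jobFormat.register("/warehouse/tmp2/", new CommaSeparatedFormat());
        return jobFormat;
    }

    public static void main(String[] args) {
        DispatchingFormat jobFormat = buildJobFormat();
        List<String[]> tmp7Rows = jobFormat.read("/warehouse/tmp7/part-00000", "a\t1\nb\t2");
        List<String[]> tmp2Rows = jobFormat.read("/warehouse/tmp2/part-00000", "1,x\n3,y");

        // Naive nested-loop version of: tmp7 join tmp2 on (tmp7.c2 = tmp2.c1)
        for (String[] left : tmp7Rows) {
            for (String[] right : tmp2Rows) {
                if (left[1].equals(right[0])) {
                    System.out.println(String.join("\t", left));
                }
            }
        }
    }
}
```

In this sketch the caller only ever holds a DispatchingFormat, which mirrors the point raised earlier in the thread: the job has a single InputFormat class set on it, and the per-table formats sit behind that one class.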


Discussion Overview
group: user
categories: hive, hadoop
posted: Jul 1, '10 at 4:51p
active: Jul 1, '10 at 8:23p
posts: 4
users: 3
website: hive.apache.org
