Grokbase Groups Hive user May 2009
I have some mappers already coded in Java, so I want to reuse them
as much as possible in the Hive environment.
How can I call a Java mapper from SELECT TRANSFORM in Hive?
For example, what is wrong with the query below, and why?

INSERT OVERWRITE TABLE u_data_new
SELECT
TRANSFORM (userid, movieid, rating, unixtime)
USING 'java WeekdayMapper'
AS (userid, movieid, rating, weekday)
FROM u_data;

Thank you.


Regards,
Manhee

  • Zheng Shao at May 25, 2009 at 3:10 am
    How does your Java map function receive the 4 columns?
    I assume your Java map function takes a WritableComparable key and a
    Writable value.

    Zheng


    --
    Yours,
    Zheng
  • Manhee Jo at May 25, 2009 at 5:50 am
    Thank you, Zheng.
    Here is my WeekdayMapper.java, which is just a test that does almost the same
    thing as "weekday_mapper.py" does.
    As you can see below, it takes neither a WritableComparable nor a Writable
    class. It receives the 4 columns as plain tab-separated strings on stdin.
    Any advice would be much appreciated.

    /**
     * WeekdayMapper.java
     */

    import java.io.*;
    import java.util.*;

    class WeekdayMapper {
        public static void main(String[] args) throws IOException {
            Scanner stdIn = new Scanner(System.in);
            GregorianCalendar cal1 = new GregorianCalendar();

            // Hive streams each input row to stdin as one tab-separated line.
            while (stdIn.hasNextLine()) {
                String line = stdIn.nextLine();
                String[] column = line.split("\t");
                long unixTime = Long.parseLong(column[3]);
                Date d = new Date(unixTime * 1000L);
                cal1.setTime(d);
                int dow = cal1.get(Calendar.DAY_OF_WEEK);
                // Write the transformed row back to stdout, tab-separated,
                // with the unixtime column replaced by the day of week.
                System.out.println(column[0] + "\t" + column[1] + "\t"
                        + column[2] + "\t" + dow);
            }
        }
    }

    Thanks,
    Manhee
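    One side note on the mapper above: GregorianCalendar uses the JVM's default
    timezone, so the computed weekday can vary between machines with different
    zone settings. A minimal sketch of the same conversion pinned to UTC
    (WeekdayDemo is a hypothetical class name, not part of the thread):

    ```java
    import java.util.Calendar;
    import java.util.Date;
    import java.util.GregorianCalendar;
    import java.util.TimeZone;

    // Hypothetical demo class, not part of the original mapper.
    class WeekdayDemo {
        // Same unixtime -> day-of-week conversion as WeekdayMapper, but pinned
        // to UTC so the result does not depend on the JVM's default timezone.
        static int dayOfWeek(long unixTime) {
            GregorianCalendar cal =
                    new GregorianCalendar(TimeZone.getTimeZone("UTC"));
            cal.setTime(new Date(unixTime * 1000L));
            return cal.get(Calendar.DAY_OF_WEEK); // 1 = Sunday ... 7 = Saturday
        }

        public static void main(String[] args) {
            // 1970-01-01 (epoch 0) was a Thursday, so this prints 5.
            System.out.println(dayOfWeek(0L));
        }
    }
    ```

    In a cluster, every node running the transform script should then produce
    the same weekday for the same timestamp.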

  • Raghu Murthy at May 25, 2009 at 7:32 am
    You might want to add the jar file that contains your class to the
    classpath in the USING clause. Something like:

    USING 'java -cp /path/to/your/jar WeekdayMapper'

  • Zheng Shao at May 25, 2009 at 7:33 am
    In this case, you just need to compile your .java into a jar file, and do:

    add jar fullpath/to/myprogram.jar;
    SELECT TRANSFORM (col1, col2, col3, col4)
    USING "java -cp myprogram.jar WeekdayMapper"
    AS (outcol1, outcol2, outcol3, outcol4)

    Let us know if it works out or not.

    Zheng
    --
    Yours,
    Zheng
  • Manhee Jo at May 25, 2009 at 7:59 am
    Thank you so much!!!

  • Min Zhou at May 25, 2009 at 8:11 am
    Hey Zheng,

    I don't think Hive supports the 'add jar' command right now, because the
    code for this issue has not been committed yet.
    Check it out at:
    https://issues.apache.org/jira/browse/HIVE-338

    --
    My research interests are distributed systems, parallel computing, and
    bytecode-based virtual machines.

    My profile:
    http://www.linkedin.com/in/coderplay
    My blog:
    http://coderplay.javaeye.com
  • Ashish Thusoo at May 26, 2009 at 9:54 pm
    Actually, 'add file' is the correct command.

    Ashish
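
    Putting the thread's suggestions together, a corrected version of the
    original query would look something like the sketch below. The jar name
    and path are illustrative; 'add file' ships the jar into each task's
    working directory, which is why a relative path works in -cp:

    ```sql
    add file /full/path/to/myprogram.jar;

    INSERT OVERWRITE TABLE u_data_new
    SELECT
      TRANSFORM (userid, movieid, rating, unixtime)
      USING 'java -cp myprogram.jar WeekdayMapper'
      AS (userid, movieid, rating, weekday)
    FROM u_data;
    ```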


Discussion Overview
group: user
categories: hive, hadoop
posted: May 25, '09 at 1:11a
active: May 26, '09 at 9:54p
posts: 8
users: 5
website: hive.apache.org
