Hello all,

I'm very new to Hive (I haven't even installed it yet!), but I have a use case
that I haven't seen demonstrated in any of the tutorials or documentation I've
read so far.

Let's say that I have Apache logs that I want to process with Hadoop/Hive.
Of course, there may be different types of log records, all tying back to the
same user, IP address, or other log attribute. Is there a way to submit a
SINGLE Hive query that returns results like:

IP Action1Count Action2Count Action3Count

...where the different actions correspond to different log events for that
IP address.

Do I have to submit 3 different Hive queries here, or can I submit a single
Hive query? In a regular Java-based map/reduce job, I would write a
custom Writable that records counts for each of the different actions,
and send it to the reducer using output.collect(IP, customWritable). That way
I wouldn't have to submit multiple map/reduce jobs, just 1.

Thanks
Ryan


  • Zheng Shao at Oct 10, 2009 at 6:49 pm
    Yes, we can do this:

    SELECT ip,
           SUM(IF(action = 'action1', 1, 0)) AS action1count,
           SUM(IF(action = 'action2', 1, 0)) AS action2count,
           SUM(IF(action = 'action3', 1, 0)) AS action3count
    FROM mytable
    GROUP BY ip;

    For more details on IF, please refer to:
    http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html#function_if

    Zheng
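    In map/reduce terms, a plain GROUP BY like this should compile to a single job: each row adds 1 to at most one of the three sums for its IP, much like the counts Ryan would have packed into a custom Writable. A rough local equivalent is below, written as one awk pass inside a POSIX shell function so it can be tried without a cluster. The function name count_actions and the tab-separated `ip<TAB>action` input format are assumptions for illustration only.

```shell
#!/bin/sh
# Local, single-pass sketch of what the SUM(IF(...)) query computes.
# Hypothetical input: one event per line, "ip<TAB>action".
count_actions() {
  awk '
    BEGIN { FS = "\t" }
    {
      seen[$1] = 1                     # every IP gets a row, like GROUP BY ip
      if ($2 == "action1") a1[$1]++    # SUM(IF(action = "action1", 1, 0))
      if ($2 == "action2") a2[$1]++    # SUM(IF(action = "action2", 1, 0))
      if ($2 == "action3") a3[$1]++    # SUM(IF(action = "action3", 1, 0))
    }
    END {
      # Counters that were never incremented are empty; printf %d prints 0.
      for (ip in seen)
        printf "%s\t%d\t%d\t%d\n", ip, a1[ip], a2[ip], a3[ip]
    }
  ' | LC_ALL=C sort                    # awk array order is unspecified
}
```

    For example, `count_actions < actions.tsv` (a hypothetical file) prints one tab-separated line per IP with its three counts, sorted by IP. IPs whose events match none of the three actions still appear, with zeros, just as they would in the GROUP BY result.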
  • Ryan LeCompte at Oct 11, 2009 at 12:15 am
    Thank you!

    Very helpful.

    Another problem:

    I am trying to install Hive 0.4, and I'm coming across the following error
    when I try to start bin/hive after building:


    java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    ... 7 more


    Any ideas?

    Thanks,
    Ryan
  • Zheng Shao at Oct 11, 2009 at 12:43 am
    Try modifying bin/hive to print out the last line in that file.
    It should display the classpath settings; make sure those classpaths are
    valid.

    Zheng
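    One way to carry out the "make sure those classpaths are valid" step is a small helper like the one below. It is a sketch in POSIX shell, not part of Hive or Hadoop, and check_classpath is a hypothetical name. It splits a colon-separated classpath and reports entries that do not exist on disk; it cannot detect a jar that is missing from the list altogether.

```shell
#!/bin/sh
# Report classpath entries that do not exist; return non-zero if any are missing.
check_classpath() (
  # The body runs in a subshell, so the IFS and set -f changes do not leak out.
  set -f                                # do not glob-expand entries such as lib/*
  IFS=:                                 # split the argument on colons only
  status=0
  for entry in $1; do
    [ -n "$entry" ] || continue         # skip empty segments, e.g. from "a::b"
    case $entry in
      */\*) entry=${entry%/\*} ;;       # Java-style "dir/*": check the directory
    esac
    if [ ! -e "$entry" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  exit "$status"
)
```

    For example, `check_classpath "$HADOOP_CLASSPATH"` prints one `missing:` line per bad entry and returns a non-zero status if there are any.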
  • Ryan LeCompte at Oct 11, 2009 at 12:50 am
    I printed out the classpath environment variables that I saw in the file,
    and the paths were valid... hmmm... is there something else I could try?
  • Ryan LeCompte at Oct 11, 2009 at 1:12 am
    I was able to get this working -- just needed to adjust classpaths. Thanks!

    Ryan

  • Ryan LeCompte at Oct 11, 2009 at 1:30 am
    Ah, this time I'm running into a different issue.

    So I've created my Hive table and I'm now at the point where I want to load
    data into it from HDFS. However, I get the following error on the load data
    command:

    Loading data to table actions
    Failed with exception Wrong file format. Please check the file's format.
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask


    Any ideas how to get more info on what's wrong? The file is a SequenceFile.

  • Ryan LeCompte at Oct 11, 2009 at 3:36 am
    Again, this is now working.

    Thanks,
    Ryan

  • Schubert Zhang at Oct 19, 2009 at 6:00 pm
    Hi Ryan,

    How did you resolve the classpath issue?
    I have the same problem.
  • Ryan LeCompte at Oct 19, 2009 at 6:05 pm
    Hello,

    I solved this by making sure that the Hive libraries in /build/dist/lib were
    included in the CLASSPATH set in my hadoop/conf/hadoop-env.sh config file.
    That way they were available when starting up the Hive CLI.

    Thanks,
    Ryan
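    A sketch of Ryan's fix as a hadoop-env.sh fragment, under stated assumptions: HIVE_SRC is a hypothetical variable for the directory where Hive was built, and every jar under its build/dist/lib is added. The loop appends to HADOOP_CLASSPATH rather than overwriting it, so any value already set when the file is sourced is kept.

```shell
# hadoop/conf/hadoop-env.sh (fragment)
# HIVE_SRC is hypothetical: point it at the directory where Hive was built.
HIVE_SRC=${HIVE_SRC:-/path/to/hive}

for jar in "$HIVE_SRC"/build/dist/lib/*.jar; do
  [ -e "$jar" ] || continue    # no match: the loop sees the literal pattern, skip it
  # Append (never overwrite) so earlier HADOOP_CLASSPATH entries survive.
  HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$jar"
done
export HADOOP_CLASSPATH
```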

  • Schubert Zhang at Oct 19, 2009 at 6:10 pm
    Yes, I just found this too. Thanks.
    Previously, my hadoop-env.sh set HADOOP_CLASSPATH to my own classpaths only.
    Now I define HADOOP_CLASSPATH as follows, appending to the existing value:
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:my defined classpaths....

Discussion Overview
group: user @
categories: hive, hadoop
posted: Oct 10, '09 at 6:43p
active: Oct 19, '09 at 6:10p
posts: 11
users: 3
website: hive.apache.org
