Grokbase Groups Hive user July 2011
Hi,
I have a table with close to a billion rows. I am trying to create an index
on it, but when I run the alter command, the map-reduce jobs always end with
errors. The same command runs fine on small tables. I also notice that the
number of reducers is set to 24 even when I set it manually to 1, and the
reduce percentage behaves strangely: it increases, then decreases, and
finally reaches 100 with a message that the job ended with errors. Any help
on this would be appreciated.

Thanks,
Siddharth
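
The post does not include the actual statements. For readers following along, the commands described above probably look something like the sketch below. The table name `big_table`, the column `user_id`, and the index name `big_table_idx` are hypothetical placeholders, not taken from the thread.

```sql
-- Hypothetical names: big_table, user_id and big_table_idx are placeholders.

-- Ask for a single reducer in this session.
SET mapred.reduce.tasks=1;

-- Define a compact index. WITH DEFERRED REBUILD means no index data is built yet.
CREATE INDEX big_table_idx
ON TABLE big_table (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- The "alter command": this launches the map-reduce job that builds the index.
ALTER INDEX big_table_idx ON big_table REBUILD;
```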


  • Siddharth Ramanan at Jul 28, 2011 at 8:38 pm
    Hi,
    Here is the log output from one of the reduce tasks. I am running Hadoop
    in standalone mode.

    2011-07-28 19:16:42,621 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during JDBC connection to jdbc:derby:;databaseName=TempStatsStore;create=true.
    java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:169)
        at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:55)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:781)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:649)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
    2011-07-28 19:16:42,622 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator: StatsPublishing error: cannot connect to database
    2011-07-28 19:16:42,622 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:126
    2011-07-28 19:16:42,622 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
    2011-07-28 19:16:42,622 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 4 Close done
    2011-07-28 19:16:42,625 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201107271749_0029_r_000019_0 is done. And is in the process of commiting
    2011-07-28 19:16:42,627 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201107271749_0029_r_000019_0' done.
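
    The stack trace above originates in the statistics publisher (FileSinkOperator.publishStats calling JDBCStatsPublisher.connect), and the same task attempt is reported as done afterwards, so this error may be separate from whatever fails the job. One way to rule it out, assuming the Hive build in use supports the `hive.stats.autogather` setting, is to switch automatic statistics gathering off for the session before rebuilding the index:

    ```sql
    -- Assumption: this Hive build supports hive.stats.autogather.
    -- Disabling it skips the stats publishing step that tries to load the Derby driver.
    SET hive.stats.autogather=false;
    ```

    If the Derby error disappears but the job still ends with errors, the cause is likely elsewhere in the task logs.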

  • Siddharth Ramanan at Aug 1, 2011 at 7:56 pm
    The reduce percentage keeps fluctuating while the alter index command is
    running. After I tweaked some properties, the earlier exceptions no longer
    appear, and the logs now show only an "out of memory" error. I have
    increased the heap space up to 4 GB, but I still get the same exception.
    Can anyone guide me here?
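
    The post does not say which heap setting was raised. In Hadoop releases of this era, map and reduce tasks run in child JVMs whose heap is set through `mapred.child.java.opts`, so raising only the client or daemon heap would likely leave the reduce tasks unchanged. A minimal sketch, with a purely illustrative value:

    ```sql
    -- Assumption: tasks run in child JVMs (the earlier log shows a Child.main frame).
    -- The -Xmx value below is illustrative, not a recommendation.
    SET mapred.child.java.opts=-Xmx1024m;
    ```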
  • Siddharth Ramanan at Aug 3, 2011 at 5:52 pm
    Hi all,
    I have created a compact index on my table, but the response time for a
    query is now the same with and without the index. Previously, it showed an
    improvement. I only changed some parameters to increase the heap size, and
    since then it has been behaving strangely. How can I make sure that my
    query is using the index?

    Thanks,
    Siddharth
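
    One way to check this is to look at the query plan. The sketch below assumes a Hive release that can rewrite queries to use indexes automatically, which is controlled by `hive.optimize.index.filter`; older releases require querying the index table by hand. The table and column names are hypothetical.

    ```sql
    -- Hypothetical names: big_table and user_id are placeholders.
    -- Assumption: this Hive release supports automatic index use.
    SET hive.optimize.index.filter=true;

    -- Consider the index even for small inputs (the default minimum size is large).
    SET hive.optimize.index.filter.compact.minsize=0;

    -- Print the plan without running the query.
    EXPLAIN SELECT * FROM big_table WHERE user_id = 42;
    ```

    If the index is being used, the plan should contain an extra stage that scans the index table before the stage that reads `big_table`. If the plan only scans `big_table`, the query is not using the index.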
  • Shouguo Li at Aug 5, 2011 at 10:02 pm
    On a side note, I'm looking at adding indexes to our Hive tables as well.
    Is there a comparison or any metrics on the performance/space trade-off?

    thx!
