Grokbase Groups Hive user April 2009
FAQ
Hi - I'm having a problem with a query below. When I try to run any
aggregate function on a column from the sub-query, the job fails.
The queries and output messages are below.

Suggestions?

thanks in advance.

-- works: 2 map-reduces jobs.
select m.description, o_buyers.num as buyers
from clickstream_output o
join merchandise m on (o.merchandise_id = m.merchandise_id)
left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
as num from clickstream_output o1
where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

Successful output: PROD7362, 3

-- fails: 3 map-reduce jobs - 2nd reduce fails.
select m.description, o_buyers.num as buyers*, count(1) as total* -- sum
or max(o_buyers.num) and removing from group by also fails.
from clickstream_output o
join merchandise m on (o.merchandise_id = m.merchandise_id)
left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
as num from clickstream_output o1
where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217'
*group by m.description, o_buyers.num* limit 40 ;


java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)

Search Discussions

  • Amr Awadallah at Apr 22, 2009 at 10:41 pm
    in the group by, try this instead:

    *group by m.description, buyers* limit 40 ;

    Matt Pestritto wrote:
    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <=
    '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers*, count(1) as total* --
    sum or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <=
    '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;


    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
  • Matt Pestritto at Apr 22, 2009 at 10:44 pm
    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
    buyers
    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah wrote:


    in the group by, try this instead:

    *group by m.description, buyers* limit 40 ;

    Matt Pestritto wrote:

    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers*, count(1) as total* -- sum
    or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;


    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)


  • Matt Pestritto at Apr 22, 2009 at 10:47 pm
    Taking a look at the explain, the first stage of both queries are
    identical. The only differences are in the reduce in the 2nd stage.
    I'm not sure if this helps or not.
    Thanks.

    Working Reduce:
    Reduce Operator Tree:
    Join Operator
    condition map:
    Inner Join 0 to 1
    Left Outer Join0 to 2
    condition expressions:
    0 {VALUE.0} {VALUE.1}
    1 {VALUE.0} {VALUE.1}
    2 {VALUE.0} {VALUE.1}
    Filter Operator
    predicate:
    expr: ((1 >= '20090216') and (1 <= '20090217'))
    type: boolean
    Select Operator
    expressions:
    expr: 3
    type: string
    expr: 5
    type: bigint
    Limit
    File Output Operator
    compressed: false
    GlobalTableId: 0
    table:
    input format: org.apache.hadoop.mapred.TextInputFormat
    output format:
    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

    Failed Reduce:
    Reduce Operator Tree:
    Join Operator
    condition map:
    Inner Join 0 to 1
    Left Outer Join0 to 2
    condition expressions:
    0 {VALUE.0} {VALUE.1}
    1 {VALUE.0} {VALUE.1}
    2 {VALUE.0} {VALUE.1}
    Filter Operator
    predicate:
    expr: ((1 >= '20090216') and (1 <= '20090217'))
    type: boolean
    Group By Operator
    aggregations:
    expr: count(1)
    keys:
    expr: 3
    type: string
    expr: 5
    type: bigint
    mode: hash
    File Output Operator
    compressed: false
    GlobalTableId: 0
    table:
    input format:
    org.apache.hadoop.mapred.SequenceFileInputFormat
    output format:
    org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
    name: binary_table


    On Wed, Apr 22, 2009 at 6:43 PM, Matt Pestritto wrote:

    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
    buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah wrote:


    in the group by, try this instead:

    *group by m.description, buyers* limit 40 ;

    Matt Pestritto wrote:

    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers*, count(1) as total* --
    sum or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;


    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)


  • Prasad Chakka at Apr 22, 2009 at 10:49 pm
    That is strange... Does below also fail?

    select m.description, o_buyers.num , count(1) as total
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;


    ________________________________
    From: Matt Pestritto <matt@pestritto.com>
    Reply-To: <hive-user@hadoop.apache.org>
    Date: Wed, 22 Apr 2009 15:43:40 -0700
    To: <hive-user@hadoop.apache.org>
    Subject: Re: Aggregrate Query Fails.

    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah wrote:


    in the group by, try this instead:

    group by m.description, buyers limit 40 ;

    Matt Pestritto wrote:
    Hi - I'm having a problem with a query below. When I try to run any aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers, count(1) as total -- sum or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;



    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
  • Matt Pestritto at Apr 22, 2009 at 10:53 pm
    So the only change was to remove the column alias correct?
    Still no luck. Same result.
    On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka wrote:

    That is strange... Does below also fail?

    select m.description,* o_buyers.num ,* count(1) as total

    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'

    group by m.description, o_buyers.num limit 40 ;


    ------------------------------
    *From: *Matt Pestritto <matt@pestritto.com>
    *Reply-To: *<hive-user@hadoop.apache.org>
    *Date: *Wed, 22 Apr 2009 15:43:40 -0700
    *To: *<hive-user@hadoop.apache.org>
    *Subject: *Re: Aggregrate Query Fails.


    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
    buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah wrote:



    in the group by, try this instead:

    *group by m.description, buyers* limit 40 ;

    Matt Pestritto wrote:

    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers*, count(1) as total* -- sum
    or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;



    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text

    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text
    at
    org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at
    org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)




  • Ashish Thusoo at Apr 23, 2009 at 12:12 am
    Can you do an explain <query> and send us the plan.

    Another thing that you may try is to put the entire subquery in the from clause and then do an aggregation on it..

    i.e.

    select ..., count(1)
    from (select ...
    from clickstream_output o join .....) t
    group by t.description, t.num limit 40;

    Also are you using the 0.3.0 release candidate?

    Ashish

    ________________________________________
    From: Matt Pestritto [matt@pestritto.com]
    Sent: Wednesday, April 22, 2009 3:53 PM
    To: hive-user@hadoop.apache.org
    Subject: Re: Aggregrate Query Fails.

    So the only change was to remove the column alias correct?
    Still no luck. Same result.

    On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka wrote:
    That is strange... Does below also fail?

    select m.description, o_buyers.num , count(1) as total
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;


    ________________________________
    From: Matt Pestritto <matt@pestritto.com<http://matt@pestritto.com>>
    Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>>
    Date: Wed, 22 Apr 2009 15:43:40 -0700
    To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>>
    Subject: Re: Aggregrate Query Fails.


    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.comwrote:


    in the group by, try this instead:

    group by m.description, buyers limit 40 ;

    Matt Pestritto wrote:
    Hi - I'm having a problem with a query below. When I try to run any aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers, count(1) as total -- sum or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;



    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
  • Zheng Shao at Apr 23, 2009 at 1:35 am
    Hi Matt,

    There is a known bug with JOIN - all output columns from JOIN will become
    STRING.
    The stacktrace shows this is exactly because of that bug.

    The workaround right now is to replace all columns with CAST(xxx as STRING),
    if the column type is not a STRING.


    select m.description, o_buyers.num as buyers*, count(1) as total*
    from clickstream_output o
    join merchandise m on (CAST(o.merchandise_id AS STRING) =
    CAST(m.merchandise_id AS STRING))
    left outer join ( select CAST(o1.merchandise_id AS STRING) as
    merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
    clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
    o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) ) o_buyers
    on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS
    STRING))
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;


    It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.

    Zheng
    On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo wrote:

    Can you do an explain <query> and send us the plan.

    Another thing that you may try is to put the entire subquery in the from
    clause and then do an aggregation on it..

    i.e.

    select ..., count(1)
    from (select ...
    from clickstream_output o join .....) t
    group by t.description, t.num limit 40;

    Also are you using the 0.3.0 release candidate?

    Ashish

    ________________________________________
    From: Matt Pestritto [matt@pestritto.com]
    Sent: Wednesday, April 22, 2009 3:53 PM
    To: hive-user@hadoop.apache.org
    Subject: Re: Aggregrate Query Fails.

    So the only change was to remove the column alias correct?
    Still no luck. Same result.

    On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka wrote:
    That is strange... Does below also fail?

    select m.description, o_buyers.num , count(1) as total
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
    as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
    o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;


    ________________________________
    From: Matt Pestritto <matt@pestritto.com<http://matt@pestritto.com>>
    Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org
    Date: Wed, 22 Apr 2009 15:43:40 -0700
    To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>>
    Subject: Re: Aggregrate Query Fails.


    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
    buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
    http://aaa@cloudera.com>> wrote:


    in the group by, try this instead:

    group by m.description, buyers limit 40 ;

    Matt Pestritto wrote:
    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
    as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
    o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers, count(1) as total -- sum or
    max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
    as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
    o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;



    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text

    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text
    at
    org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at
    org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)





    --
    Yours,
    Zheng
  • Jeff Hammerbacher at Apr 23, 2009 at 3:22 am
    Hey Zheng,

    Thanks for the insight. Perhaps these sorts of quirks could be added to the
    documentation on the wiki?

    Thanks,
    Jeff
    On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao wrote:

    Hi Matt,

    There is a known bug with JOIN - all output columns from JOIN will become
    STRING.
    The stacktrace shows this is exactly because of that bug.

    The workaround right now is to replace all columns with CAST(xxx as
    STRING), if the column type is not a STRING.


    select m.description, o_buyers.num as buyers*, count(1) as total*
    from clickstream_output o
    join merchandise m on (CAST(o.merchandise_id AS STRING) =
    CAST(m.merchandise_id AS STRING))
    left outer join ( select CAST(o1.merchandise_id AS STRING) as
    merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
    clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) )
    o_buyers
    on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS
    STRING))
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;


    It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.

    Zheng

    On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo wrote:

    Can you do an explain <query> and send us the plan.

    Another thing that you may try is to put the entire subquery in the from
    clause and then do an aggregation on it..

    i.e.

    select ..., count(1)
    from (select ...
    from clickstream_output o join .....) t
    group by t.description, t.num limit 40;

    Also are you using the 0.3.0 release candidate?

    Ashish

    ________________________________________
    From: Matt Pestritto [matt@pestritto.com]
    Sent: Wednesday, April 22, 2009 3:53 PM
    To: hive-user@hadoop.apache.org
    Subject: Re: Aggregrate Query Fails.

    So the only change was to remove the column alias correct?
    Still no luck. Same result.

    On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com
    wrote:
    That is strange... Does below also fail?

    select m.description, o_buyers.num , count(1) as total
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;


    ________________________________
    From: Matt Pestritto <matt@pestritto.com<http://matt@pestritto.com>>
    Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org
    Date: Wed, 22 Apr 2009 15:43:40 -0700
    To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>>
    Subject: Re: Aggregrate Query Fails.


    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
    buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
    http://aaa@cloudera.com>> wrote:


    in the group by, try this instead:

    group by m.description, buyers limit 40 ;

    Matt Pestritto wrote:
    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers, count(1) as total -- sum
    or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;



    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text

    at
    org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text
    at
    org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at
    org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at
    org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)





    --
    Yours,
    Zheng
  • Zheng Shao at Apr 23, 2009 at 8:39 am
    Hi Jeff,

    Thanks for the suggestion. That's a good idea.
    I just added that to http://wiki.apache.org/hadoop/Hive/FAQ


    Zheng
    On Wed, Apr 22, 2009 at 8:22 PM, Jeff Hammerbacher wrote:

    Hey Zheng,

    Thanks for the insight. Perhaps these sorts of quirks could be added to the
    documentation on the wiki?

    Thanks,
    Jeff

    On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao wrote:

    Hi Matt,

    There is a known bug with JOIN - all output columns from JOIN will become
    STRING.
    The stacktrace shows this is exactly because of that bug.

    The workaround right now is to replace all columns with CAST(xxx as
    STRING), if the column type is not a STRING.


    select m.description, o_buyers.num as buyers*, count(1) as total*
    from clickstream_output o
    join merchandise m on (CAST(o.merchandise_id AS STRING) =
    CAST(m.merchandise_id AS STRING))
    left outer join ( select CAST(o1.merchandise_id AS STRING) as
    merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
    clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) )
    o_buyers
    on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id
    AS STRING))
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;


    It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.

    Zheng

    On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo wrote:

    Can you do an explain <query> and send us the plan.

    Another thing that you may try is to put the entire subquery in the from
    clause and then do an aggregation on it..

    i.e.

    select ..., count(1)
    from (select ...
    from clickstream_output o join .....) t
    group by t.description, t.num limit 40;

    Also are you using the 0.3.0 release candidate?

    Ashish

    ________________________________________
    From: Matt Pestritto [matt@pestritto.com]
    Sent: Wednesday, April 22, 2009 3:53 PM
    To: hive-user@hadoop.apache.org
    Subject: Re: Aggregrate Query Fails.

    So the only change was to remove the column alias correct?
    Still no luck. Same result.

    On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com
    wrote:
    That is strange... Does below also fail?

    select m.description, o_buyers.num , count(1) as total
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;


    ________________________________
    From: Matt Pestritto <matt@pestritto.com<http://matt@pestritto.com>>
    Reply-To: <hive-user@hadoop.apache.org<http://hive-user@
    hadoop.apache.org>>
    Date: Wed, 22 Apr 2009 15:43:40 -0700
    To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>>
    Subject: Re: Aggregrate Query Fails.


    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
    buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
    http://aaa@cloudera.com>> wrote:


    in the group by, try this instead:

    group by m.description, buyers limit 40 ;

    Matt Pestritto wrote:
    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers, count(1) as total -- sum
    or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;



    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text

    at
    org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at
    org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text
    at
    org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at
    org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at
    org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)





    --
    Yours,
    Zheng

    --
    Yours,
    Zheng
  • Matt Pestritto at Apr 23, 2009 at 3:38 am
    Zheng,

    Eureka! That did it. I had already changed all of my table definitions so
    all columns were strings so all I needed to do was cast the aggregate
    function in the join sub-query and that did it for me. --
    CAST(count(distinct o1.consumer_id) AS STRING) as num. This would also
    explain some of the exceptions I was seeing on other joins.

    Thanks so much
    -Matt
    On Wed, Apr 22, 2009 at 9:35 PM, Zheng Shao wrote:

    Hi Matt,

    There is a known bug with JOIN - all output columns from JOIN will become
    STRING.
    The stacktrace shows this is exactly because of that bug.

    The workaround right now is to replace all columns with CAST(xxx as
    STRING), if the column type is not a STRING.


    select m.description, o_buyers.num as buyers*, count(1) as total*
    from clickstream_output o
    join merchandise m on (CAST(o.merchandise_id AS STRING) =
    CAST(m.merchandise_id AS STRING))
    left outer join ( select CAST(o1.merchandise_id AS STRING) as
    merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
    clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) )
    o_buyers
    on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS
    STRING))
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    *group by m.description, o_buyers.num* limit 40 ;


    It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.

    Zheng

    On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo wrote:

    Can you do an explain <query> and send us the plan.

    Another thing that you may try is to put the entire subquery in the from
    clause and then do an aggregation on it..

    i.e.

    select ..., count(1)
    from (select ...
    from clickstream_output o join .....) t
    group by t.description, t.num limit 40;

    Also are you using the 0.3.0 release candidate?

    Ashish

    ________________________________________
    From: Matt Pestritto [matt@pestritto.com]
    Sent: Wednesday, April 22, 2009 3:53 PM
    To: hive-user@hadoop.apache.org
    Subject: Re: Aggregrate Query Fails.

    So the only change was to remove the column alias correct?
    Still no luck. Same result.

    On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com
    wrote:
    That is strange... Does below also fail?

    select m.description, o_buyers.num , count(1) as total
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;


    ________________________________
    From: Matt Pestritto <matt@pestritto.com<http://matt@pestritto.com>>
    Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org
    Date: Wed, 22 Apr 2009 15:43:40 -0700
    To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>>
    Subject: Re: Aggregrate Query Fails.


    Thanks.
    I tried that also earlier:
    FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
    buyers

    On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
    http://aaa@cloudera.com>> wrote:


    in the group by, try this instead:

    group by m.description, buyers limit 40 ;

    Matt Pestritto wrote:
    Hi - I'm having a problem with a query below. When I try to run any
    aggregate function on a column from the sub-query, the job fails.
    The queries and output messages are below.

    Suggestions?

    thanks in advance.

    -- works: 2 map-reduces jobs.
    select m.description, o_buyers.num as buyers
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

    Successful output: PROD7362, 3

    -- fails: 3 map-reduce jobs - 2nd reduce fails.
    select m.description, o_buyers.num as buyers, count(1) as total -- sum
    or max(o_buyers.num) and removing from group by also fails.
    from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct
    o1.consumer_id) as num from clickstream_output o1
    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
    and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
    where o.file_date >= '20090216' and o.file_date <= '20090217'
    group by m.description, o_buyers.num limit 40 ;



    java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text

    at
    org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
    cast to org.apache.hadoop.io.Text
    at
    org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
    at
    org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

    at
    org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
    at
    org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)





    --
    Yours,
    Zheng

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedApr 22, '09 at 10:22p
activeApr 23, '09 at 8:39a
posts11
users6
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase