i'm seeing the "Error: Java heap space" message quite often when running hive
queries. i know one way to fix it is to increase the jvm memory using this
property: mapred.child.java.opts (currently our cluster uses this value:
i'm hesitant to increase it, though, because we're running services other
than hadoop on the same boxes and i don't want to risk starving those
services of memory. i also wonder if it's because my query is inefficient.
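for reference, the property can also be overridden per hive session instead of cluster-wide, which would leave the other services' headroom alone (the -Xmx value below is just an example, not a recommendation):

```sql
-- raise the child JVM heap for this hive session only (example value)
SET mapred.child.java.opts=-Xmx1024m;
```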
so my query looks like this:
INSERT OVERWRITE TABLE tmp_metrics_filtered
SELECT metrics.* FROM metrics LEFT OUTER JOIN internal_users ON
WHERE internal_users.uid IS NULL AND date_str = '2011-07-13'
it's filtering out internal user requests from our metrics logs and then
storing the result into a temporary table. there are only 40 entries in the
internal_users table. we have about 10GB of metrics logs for that day.
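for completeness, the full anti-join pattern i'm using looks something like this (the ON clause got cut off when i pasted the query above; the uid join key here is a sketch based on the internal_users.uid reference, not necessarily the exact condition we use):

```sql
-- keep only metrics rows with no matching internal user
-- (join key metrics.uid = internal_users.uid is an assumed example)
INSERT OVERWRITE TABLE tmp_metrics_filtered
SELECT metrics.*
FROM metrics
LEFT OUTER JOIN internal_users
  ON (metrics.uid = internal_users.uid)
WHERE internal_users.uid IS NULL
  AND metrics.date_str = '2011-07-13';
```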
i've also attached a log file for more details.