This is sort of a CM issue in that I can't work out how to get the correct
values into CM related to this error in order to resolve it (I think) at
the TaskTracker level.

I can post this elsewhere if it would be a better fit for 'CDH Users'
instead. But the core issue is figuring out where in CM to make the
correct changes to end these Java heap space OOMEs.

We have a 10-DN cluster; each box has 48 GB of RAM and 24 cores.

Tim

*syslog logs*

2014-04-09 17:06:57,086 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2014-04-09 17:06:59,289 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /disk11/mapred/local/taskTracker/distcache/8513129893785507225_-1403615811_1198814330/nn1-cdh4.hadoop.ddc.com/tmp/hive-hiveadmin/hive_2014-04-09_16-47-27_901_1819771958941354837-1/-mr-10003/aafa54a0-a356-4a47-b2be-f294205cb97b <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/HIVE_PLANaafa54a0-a356-4a47-b2be-f294205cb97b
2014-04-09 17:06:59,311 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /disk12/mapred/local/taskTracker/hiveadmin/distcache/-6444096287957589437_1994028798_1198814886/nn1-cdh4.hadoop.ddc.com/user/hiveadmin/.staging/job_201404091642_0029/files/GeoIPCity.dat <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/GeoIPCity.dat
2014-04-09 17:06:59,344 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /disk13/mapred/local/taskTracker/hiveadmin/distcache/1312433130798488467_-2016506626_1198814964/nn1-cdh4.hadoop.ddc.com/user/hiveadmin/.staging/job_201404091642_0029/files/GeoIP.dat <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/GeoIP.dat
2014-04-09 17:06:59,372 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /disk6/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/jars/job.jar <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/job.jar
2014-04-09 17:06:59,392 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /disk6/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/jars/.job.jar.crc <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/.job.jar.crc
2014-04-09 17:06:59,571 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2014-04-09 17:06:59,573 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2014-04-09 17:07:00,253 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2014-04-09 17:07:00,292 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@163b4b1e
2014-04-09 17:07:00,740 INFO org.apache.hadoop.mapred.MapTask: Processing split: Paths:/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=00/source=adfeed/datacenter=VA/hostname=web5.va.ddc.com/000011_0:0+74659072,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=CC/hostname=web16ddc.cavecreek.net/000015_0:0+1420527,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=SJ/hostname=web2.sj.ddc.com/000002_0:0+134217728,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=SJ/hostname=web2.sj.ddc.com/000002_0:134217728+45672186,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=SJ/hostname=web4.sj.ddc.com/000000_0:0+134217728 InputFormatClass: org.apache.hadoop.mapred.SequenceFileInputFormat
*2014-04-09 17:07:00,794 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH*
2014-04-09 17:07:44,798 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,803 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,804 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,805 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,837 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,837 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,838 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,839 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2014-04-09 17:07:44,842 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://nn1-cdh4.hadoop.ddc.com:8020/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=00/source=adfeed/datacenter=VA/hostname=web5.va.ddc.com/000011_0
2014-04-09 17:07:44,843 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
2014-04-09 17:07:44,854 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2014-04-09 17:07:44,863 INFO org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
*2014-04-09 17:07:44,867 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 715*
2014-04-09 17:07:46,661 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
*2014-04-09 17:07:46,667 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space*
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:826)
  at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:406)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
  at org.apache.hadoop.mapred.Child.main(Child.java:262)
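
From what I can tell, MapOutputBuffer.init allocates the whole io.sort.mb
buffer up front, so the 715 MiB logged above has to fit inside the child
heap alongside everything else. A sketch of the two MRv1 properties in play,
as they'd appear in the generated mapred-site.xml (715 is straight from the
log above; the -Xmx value is only illustrative):

<!-- Sketch only: the full sort buffer is allocated at task start, so it
     must fit inside the child heap. The -Xmx value here is illustrative. -->
<property>
  <name>io.sort.mb</name>
  <value>715</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>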


  • Darren Lo at Apr 9, 2014 at 6:20 pm
    Hi Tim,

    What version of CM are you using? What version of CDH?

    On the machine from which you're launching the job, do you have client
    configuration deployed? This is normally in /etc/hadoop/conf. Make sure
    mapred-site.xml says it was generated by Cloudera Manager and the file
    timestamp looks reasonable. You can re-deploy client configuration (in the
    dropdown menu near the cluster name on the home page) to make sure these
    are up-to-date (and double-check the timestamp).

    What is your setting for "MapReduce Child Java Maximum Heap Size"? Do you
    have something set for "MapReduce Child Java Opts Base (Client Override)"?

    You may also / instead need to tweak "I/O Sort Memory Buffer (MiB)" since I
    see "createSortingCollector" in your stack trace.

    cdh-user would be helpful for interpreting these error messages if my
    suggestions don't get you anywhere. They have more expertise in
    interpreting CDH stack traces.

    Thanks,
    Darren

  • Tim R. Havens at Apr 10, 2014 at 4:29 pm
    Darren,

    Thanks for your response. In CDH3, all manually built and configured by
    myself, I had this same cluster doing WAY more than it's able to do now
    in CDH4 without dying midway.

    I'm reasonably confident that it's a configuration issue. I was Cloudera
    Admin Certified for CDH3, but since CDH4.4+ I've been kind of lost using
    Cloudera Manager. A lot of the available settings just seem kind of vague
    to me, or redundant, even though I'm sure they aren't; the GUI is making
    this much more difficult for me than when done by hand. I don't think I'm
    stupid :-) But I'm surely having a fair amount of difficulty getting up
    to speed with CM. I'm fairly uneasy about the results it generates, let's
    put it that way. I'm sure this is my own shortcoming.

    *Anyway, answers to your questions as best I can figure:*

    On Wednesday, April 9, 2014 1:20:12 PM UTC-5, Darren Lo wrote:

    Hi Tim,

    What version of CM are you using? What version of CDH?
    CM: 4.7.2
    CDH: 4.6

    On the machine from which you're launching the job, do you have client
    configuration deployed? This is normally in /etc/hadoop/conf. Make sure
    mapred-site.xml says it was generated by Cloudera Manager and the file
    timestamp looks reasonable. You can re-deploy client configuration (in the
    dropdown menu near the cluster name on the home page) to make sure these
    are up-to-date (and double-check the timestamp).
    Yes, this is set correctly, as I downloaded them and then placed them
    there by hand.

    What is your setting for "MapReduce Child Java Maximum Heap Size"? Do you
    have something set for "MapReduce Child Java Opts Base (Client Override)"?
    *Via CM: mapreduce1 > Configuration, with the search box set to 'MapReduce
    Child Java Maximum Heap Size'*

    Gateway (Default) / Resource Management > MapReduce Child Java Maximum
    Heap Size = 1 GiB (perhaps this is too small?)
    Gateway (Default) / Resource Management > I/O Sort Memory Buffer (MiB)
    io.sort.mb = 715 MiB (perhaps this is too large?)

    The rest of the results under this search are set to "Default value is
    empty"

    * "MapReduce Child Java Opts Base (Client Override)"? *

    Is set to "Default value empty"

  • Darren Lo at Apr 10, 2014 at 4:44 pm
    Hi Tim,

    If you know the hadoop name for a property you want to configure, like
    "io.sort.mb", you can use the search features I mentioned to quickly find
    where it lives in the CM UI. To see the configuration CM generates, you
    can inspect the generated files by clicking on the relevant role, then
    the Processes tab, then expanding "Show", or by downloading client
    configuration via the link on the relevant service's page. If something
    is missing from a role's file even though it's configured in CM, that's
    probably because that role doesn't actually use that configuration.
    Hopefully this helps you use CM a little more easily.

    I would try bumping up both your child Java max heap size and I/O sort
    memory buffer. Keep I/O Sort Memory Buffer (MiB) between 25% and 70% of
    the value of MapReduce Child Java Maximum Heap Size.
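
    For example, just to make the arithmetic concrete (illustrative numbers,
    not a recommendation): a 2 GiB child heap would put your current
    io.sort.mb of 715 at roughly 35%, comfortably inside that range. In the
    generated mapred-site.xml that combination would look something like:

    <!-- Illustration only: 715 MiB sort buffer in a 2048 MiB child heap (~35%) -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>
    <property>
      <name>io.sort.mb</name>
      <value>715</value>
    </property>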

    If that doesn't help, then ask around cdh-user to find what other hadoop
    parameters you might need to tweak.

    Thanks,
    Darren

  • Tim R. Havens at Apr 10, 2014 at 5:58 pm
    Thanks again Darren,

    I'll see what I can figure out with this info you've provided.

    I'm trying a heap size of 1.5 GiB now and io.sort.mb of 512 MiB.

    I suppose CM may have too many slots configured for Map/Reduce, and
    something like that could cause an OOME, I would think? I just used what
    CM 'suggested', though.
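
    Back-of-the-envelope on the slots idea, with made-up slot counts rather
    than whatever CM actually chose: 16 map + 8 reduce slots at 1.5 GiB each
    would commit 24 x 1.5 GiB = 36 GiB of the 48 GB to task heaps alone,
    e.g.:

    <!-- Hypothetical slot counts for the arithmetic above, not my actual config -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>16</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>8</value>
    </property>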

    But thanks again.
  • Tim R. Havens at Apr 15, 2014 at 2:23 pm
    Darren,

    Thanks so much for your help last week. I was able to resolve this by
    upping the heap and lowering io.sort.mb to 30% of the heap.
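
    In case it helps anyone later: assuming the 1.5 GiB (1536 MiB) heap from
    my earlier message, 30% works out to roughly 460 MiB, i.e. something
    like:

    <!-- Roughly what we settled on; assumes the 1536 MiB heap I mentioned earlier -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1536m</value>
    </property>
    <property>
      <name>io.sort.mb</name>
      <value>460</value>
    </property>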

    I sure do appreciate your help!

    Tim
    On Thursday, April 10, 2014 12:58:25 PM UTC-5, Tim R. Havens wrote:

    Thanks again Darin,

    I'll see what I can figure out with this info you've provided.

    I'm trying heap size of 1.5GiB now and io.sort.mb of 512M

    I suppose that CM may have too many slots configured for Map/Reduce and
    something like that could cause a OOME I would think? I just used what CM
    'suggested' though.

    But thanks again.
    On Thursday, April 10, 2014 11:44:13 AM UTC-5, Darren Lo wrote:

    Hi Tim,

    If you know the hadoop name for a property you want to configure, like
    "io.sort.mb", you can use the search features I mentioned to quickly find
    where those are located in the CM UI. To see the results it generates, you
    can inspect the generated files by clicking on the relevant role, then the
    processes tab, then expand "show", or by downloading client configuration
    via the link from the relevant service's page. If something's missing from
    a role's file even though it's configured in CM, that's probably because
    that role doesn't actually use that configuration. Hopefully this helps you
    use CM a little more easily.

    I would try bumping up both your child java max heap size and io sort
    memory buffers. Keep I/O Sort Memory Buffer (MiB) between 25% to 70% of the
    value of MapReduce Child Java Maximum Heap Size.

    If that doesn't help, then ask around cdh-user to find what other hadoop
    parameters you might need to tweak.

    Thanks,
    Darren

    On Thu, Apr 10, 2014 at 9:29 AM, Tim R. Havens wrote:

    Darren,

    Thanks for your response. In CDH3 all manually built and configured by
    myself I had this same cluster doing WAY WAY more than it's able to do
    without dying mid way now in CDH4.

    I'm reasonably confident that it's configuration issue. I was Cloudera
    Admin Certified for CDH3, but since CDH4.4+ I've been kinda lost using
    Cloudera Manager. A lot of the settings available just seem kind of vague
    to me, or redundant, even though I'm sure they aren't the GUI is making
    this much more difficult for me that when done by hand. I don't think I'm
    stupid :-) But I'm surely having a fair amount of difficulty getting up to
    speed with CM. I'm fairly uneasy about the results it generates, lets put
    it that way. I'm sure this is my own short coming.

    *Anyway answers to your questions the best I can figure are:*

    On Wednesday, April 9, 2014 1:20:12 PM UTC-5, Darren Lo wrote:

    Hi Tim,

    What version of CM are you using? What version of CDH?
    CM: 4.7.2
    CDH: 4.6

    On the machine from which you're launching the job, do you have client
    configuration deployed? This is normally in /etc/hadoop/conf. Make sure
    mapred-site.xml it says it was generated by cloudera manager and the file
    timestamp looks reasonable. You can re-deploy client configuration (in the
    dropdown menu near cluster name on home page) to make sure these are
    up-to-date (and double-check the timestamp).
    Yes this is set correctly as I downloaded them, and then placed them
    there by hand.

    What is your setting for "MapReduce Child Java Maximum Heap Size"? Do
    you have something set for "MapReduce Child Java Opts Base (Client
    Override)"?
    *Via CM: mapreduce1 > Configuration with the 'search box set to
    'MapReduce Child Java Maximum Heap Size'*

    Gateway (Default) / Resouce Management > MapReduce Child Java Maximum
    Heap Size = 1 GiB (perhaps this is too small?)
    Gateway (Default) / Resource Management > I/O Sort Memory Buffer (MiB)
    io.sort.mb = 715 MiB (perhaps this is to large?)

    The rest of the results under this search are set to "Default value is
    empty"

    * "MapReduce Child Java Opts Base (Client Override)"? *

    Is set to "Default value empty"

    You may also / instead need to tweak "I/O Sort Memory Buffer (MiB)"
    since I see "createSortingCollector" in your stack trace.

    cdh-user would be helpful for interpreting these error messages if my
    suggestions don't get you anywhere. They've more expertise in interpreting
    CDH stack traces.

    Thanks,
    Darren

    On Wed, Apr 9, 2014 at 10:51 AM, Tim R. Havens wrote:

    This is sort of a CM issue in that I can't resolve how to the correct
    values into CM related to this error in order to resolve them (I think) at
    the TaskTracker level.

    I can post this elsewhere if this should perhaps be in the 'CDH Users'
    instead. But the core issue is trying to resolve in CM where to make the
    correct changes to end these Java Heap Space OOME's

    We have a 10 DN cluster each box has 48Gb and 24 cores.

    Tim

    *syslog logs*

    2014-04-09 17:06:57,086 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
    2014-04-09 17:06:59,289 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /disk11/mapred/local/taskTracker/distcache/8513129893785507225_-1403615811_1198814330/nn1-cdh4.hadoop.ddc.com/tmp/hive-hiveadmin/hive_2014-04-09_16-47-27_901_1819771958941354837-1/-mr-10003/aafa54a0-a356-4a47-b2be-f294205cb97b <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/HIVE_PLANaafa54a0-a356-4a47-b2be-f294205cb97b
    2014-04-09 17:06:59,311 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /disk12/mapred/local/taskTracker/hiveadmin/distcache/-6444096287957589437_1994028798_1198814886/nn1-cdh4.hadoop.ddc.com/user/hiveadmin/.staging/job_201404091642_0029/files/GeoIPCity.dat <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/GeoIPCity.dat
    2014-04-09 17:06:59,344 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /disk13/mapred/local/taskTracker/hiveadmin/distcache/1312433130798488467_-2016506626_1198814964/nn1-cdh4.hadoop.ddc.com/user/hiveadmin/.staging/job_201404091642_0029/files/GeoIP.dat <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/GeoIP.dat
    2014-04-09 17:06:59,372 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /disk6/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/jars/job.jar <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/job.jar
    2014-04-09 17:06:59,392 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /disk6/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/jars/.job.jar.crc <- /disk8/mapred/local/taskTracker/hiveadmin/jobcache/job_201404091642_0029/attempt_201404091642_0029_m_000000_0/work/.job.jar.crc
    2014-04-09 17:06:59,571 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
    2014-04-09 17:06:59,573 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
    2014-04-09 17:07:00,253 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
    2014-04-09 17:07:00,292 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@163b4b1e
    2014-04-09 17:07:00,740 INFO org.apache.hadoop.mapred.MapTask: Processing split: Paths:/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=00/source=adfeed/datacenter=VA/hostname=web5.va.ddc.com/000011_0:0+74659072,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=CC/hostname=web16ddc.cavecreek.net/000015_0:0+1420527,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=SJ/hostname=web2.sj.ddc.com/000002_0:0+134217728,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=SJ/hostname=web2.sj.ddc.com/000002_0:134217728+45672186,/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=01/source=adfeed/datacenter=SJ/hostname=web4.sj.ddc.com/000000_0:0+134217728InputFormatClass: org.apache.hadoop.mapred.SequenceFileInputFormat
    *2014-04-09 17:07:00,794 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH*
    2014-04-09 17:07:44,798 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,803 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,804 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,805 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,837 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,837 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,838 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,839 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
    2014-04-09 17:07:44,842 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://nn1-cdh4.hadoop.ddc.com:8020/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=00/source=adfeed/datacenter=VA/hostname=web5.va.ddc.com/000011_0
    2014-04-09 <http://nn1-cdh4.hadoop.ddc.com:8020/user/hive/warehouse/prod.db/clf/logdate=2014-03-31/loghour=00/source=adfeed/datacenter=VA/hostname=web5.va.ddc.com/000011_02014-04-09> 17:07:44,843 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
    2014-04-09 17:07:44,854 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
    2014-04-09 17:07:44,863 INFO org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    *2014-04-09 17:07:44,867 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 715*
    2014-04-09 17:07:46,661 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
    *2014-04-09 17:07:46,667 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space*
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:826)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:406)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
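
    What this trace is saying: MapOutputBuffer.init allocates the entire io.sort.mb sort buffer up front, as one contiguous array inside the child JVM heap. With the io.sort.mb = 715 and 1 GiB child heap reported in this thread, the buffer alone leaves almost nothing for the rest of the task. A minimal sketch of the failure mode (my own illustrative harness, not Hadoop code; run it with -Xmx1024m):

        public class SortBufferOome {
            public static void main(String[] args) {
                int ioSortMb = 715; // the io.sort.mb value logged above
                // One contiguous ~715 MiB array, plus normal JVM and task
                // overhead, will generally not fit in a 1 GiB heap, and
                // throws java.lang.OutOfMemoryError: Java heap space.
                byte[] kvbuffer = new byte[ioSortMb << 20];
                System.out.println("allocated " + kvbuffer.length + " bytes");
            }
        }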

  • Darren Lo at Apr 15, 2014 at 4:11 pm
    Glad you figured it out, and thanks for checking back in.

    On Tue, Apr 15, 2014 at 7:23 AM, Tim R. Havens wrote:

    Darren,

    Thanks so much for your help last week. I was able to resolve this by
    upping the heap and lowering io.sort.mb to 30% of the heap.
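
    For anyone landing here with the same error, here is a hedged sketch of that fix expressed through the MRv1 job API (the class name and exact numbers are illustrative, not necessarily what Tim used; the same values can be set in CM as discussed below):

        import org.apache.hadoop.mapred.JobConf;

        public class TuneSortBuffer {
            public static void main(String[] args) {
                JobConf conf = new JobConf();
                // Illustrative values: a 2 GiB child heap with io.sort.mb
                // at roughly 30% of it, per the resolution above.
                conf.set("mapred.child.java.opts", "-Xmx2048m");
                conf.setInt("io.sort.mb", 614); // ~30% of 2048 MiB
                System.out.println(conf.get("mapred.child.java.opts")
                        + ", io.sort.mb = " + conf.getInt("io.sort.mb", -1));
            }
        }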

    I sure do appreciate your help!

    Tim

    On Thursday, April 10, 2014 12:58:25 PM UTC-5, Tim R. Havens wrote:

    Thanks again, Darren,

    I'll see what I can figure out with this info you've provided.

    I'm trying a heap size of 1.5 GiB now and an io.sort.mb of 512 MiB.

    I suppose CM may have too many Map/Reduce slots configured, and I would
    think something like that could cause an OOME? I just used what CM
    'suggested', though.

    But thanks again.
    On Thursday, April 10, 2014 11:44:13 AM UTC-5, Darren Lo wrote:

    Hi Tim,

    If you know the hadoop name of a property you want to configure, like
    "io.sort.mb", you can use the search features I mentioned to quickly find
    where it lives in the CM UI. To see the results CM generates, you can
    inspect the generated files by clicking on the relevant role, then the
    Processes tab, then expanding "Show", or by downloading the client
    configuration via the link on the relevant service's page. If something
    is missing from a role's file even though it's configured in CM, that's
    probably because that role doesn't actually use that configuration.
    Hopefully this helps you use CM a little more easily.
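
    One way to sanity-check what actually got deployed is to load the generated file directly (a sketch assuming CM deployed the client config to /etc/hadoop/conf, as mentioned in this thread; the class name is mine):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;

        public class CheckDeployedConf {
            public static void main(String[] args) {
                // Load only the deployed file, skipping Hadoop's built-in
                // defaults, to see exactly what CM generated.
                Configuration conf = new Configuration(false);
                conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
                System.out.println("io.sort.mb = " + conf.get("io.sort.mb"));
                System.out.println("mapred.child.java.opts = "
                        + conf.get("mapred.child.java.opts"));
            }
        }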

    I would try bumping up both your child Java maximum heap size and your
    I/O sort memory buffer. Keep I/O Sort Memory Buffer (MiB) between 25% and
    70% of the value of MapReduce Child Java Maximum Heap Size.
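
    To put numbers on that rule (my arithmetic, not Darren's): with the 1 GiB heap reported later in this thread, the allowed range works out to roughly 256-716 MiB, so the configured 715 MiB sits at the extreme top with no headroom. A trivial check:

        public class SortBufferRange {
            static void printRange(int heapMb) {
                int lo = (int) (heapMb * 0.25), hi = (int) (heapMb * 0.70);
                System.out.println(heapMb + " MiB heap -> io.sort.mb between "
                        + lo + " and " + hi + " MiB");
            }
            public static void main(String[] args) {
                printRange(1024); // current heap: 256-716 MiB; 715 is at the edge
                printRange(1536); // the next attempt: 384-1075 MiB
            }
        }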

    If that doesn't help, then ask around cdh-user to find what other hadoop
    parameters you might need to tweak.

    Thanks,
    Darren

    On Thu, Apr 10, 2014 at 9:29 AM, Tim R. Havens wrote:

    Darren,

    Thanks for your response. In CDH3, which I built and configured manually
    myself, I had this same cluster doing WAY WAY more than it's able to do
    now in CDH4 without dying midway.

    I'm reasonably confident that it's a configuration issue. I was Cloudera
    Admin Certified for CDH3, but since CDH4.4+ I've been kind of lost using
    Cloudera Manager. A lot of the available settings just seem kind of vague
    or redundant to me, even though I'm sure they aren't; the GUI is making
    this much more difficult for me than doing it by hand. I don't think I'm
    stupid :-) But I'm surely having a fair amount of difficulty getting up to
    speed with CM. I'm fairly uneasy about the results it generates, let's put
    it that way. I'm sure this is my own shortcoming.

    *Anyway, answers to your questions, as best I can figure, are:*

    On Wednesday, April 9, 2014 1:20:12 PM UTC-5, Darren Lo wrote:

    Hi Tim,

    What version of CM are you using? What version of CDH?
    CM: 4.7.2
    CDH: 4.6

    On the machine from which you're launching the job, do you have client
    configuration deployed? This is normally in /etc/hadoop/conf. Make sure
    mapred-site.xml says it was generated by Cloudera Manager and the file
    timestamp looks reasonable. You can re-deploy client configuration (in
    the dropdown menu near the cluster name on the home page) to make sure
    these are up to date (and double-check the timestamp).
    Yes, this is set correctly, as I downloaded the files and then placed
    them there by hand.

    What is your setting for "MapReduce Child Java Maximum Heap Size"? Do
    you have something set for "MapReduce Child Java Opts Base (Client
    Override)"?
    *Via CM: mapreduce1 > Configuration, with the search box set to
    'MapReduce Child Java Maximum Heap Size'*

    Gateway (Default) / Resource Management > MapReduce Child Java Maximum
    Heap Size = 1 GiB (perhaps this is too small?)
    Gateway (Default) / Resource Management > I/O Sort Memory Buffer (MiB)
    io.sort.mb = 715 MiB (perhaps this is too large?)

    The rest of the results under this search are set to "Default value is
    empty"

    * "MapReduce Child Java Opts Base (Client Override)"? *

    Is set to "Default value empty"

    You may also, or instead, need to tweak "I/O Sort Memory Buffer (MiB)",
    since I see "createSortingCollector" in your stack trace.

    cdh-user would be helpful for interpreting these error messages if my
    suggestions don't get you anywhere. They have more expertise in
    interpreting CDH stack traces.

    Thanks,
    Darren

