140 discussions - 487 posts

  • Hi All I am facing a hard problem. I am running a map reduce job using streaming but it fails and it gives the following error. Caught: java.lang.OutOfMemoryError: Java heap space at ...
    Shuja Rehman
    Jul 10, 2010 at 5:49 pm
    Jul 13, 2010 at 11:44 am
  • Hi, I have a cluster consisting of 11 slaves and a single master. The thing is that 3 of my slaves have i7 cpu which means that they can have up to 8 simultaneous processes. But other slaves only ...
    Edward choi
    Jul 8, 2010 at 9:08 am
    Jul 23, 2010 at 12:26 pm
  • Hi all, A quick note that I'll be the instructor for the next Hadoop Bootcamp training course from Scale Unlimited. It's a two day class on July 22nd and 23rd, which covers the usual high (and low) ...
    Ken Krugler
    Jul 9, 2010 at 3:08 pm
    Sep 16, 2010 at 1:57 pm
  • Hello: I got source code from http://github.com/kevinweil/hadoop-lzo, compiled it successfully, and then 1. copied hadoop-lzo-0.4.4.jar to directory $HADOOP_HOME/lib of each master and slave, 2. copied all ...
    Alex Luya
    Jul 21, 2010 at 12:59 am
    Sep 2, 2010 at 2:10 am
  • Hi, I am using hadoop framework for writing MapReduce jobs. I want to redirect the output of Map into files of my choice and later use those files as input for Reduce phase. Could you please suggest, ...
    Pramy Bhats
    Jul 1, 2010 at 10:08 pm
    Jul 8, 2010 at 11:40 pm
  • Hey all, I am working on LZO compression for improving MapReduce performance. I just want to know where I can find the following files: 3. ||standard-hadoop-native-libs Thanks and regards, Sonali "Legal ...
    Sonali
    Jul 15, 2010 at 10:46 am
    Aug 14, 2010 at 3:12 am
  • Hello all, As a new user of hadoop, I am having some problems with understanding some things. I am writing a program to load a file to the distributed cache and read this file in each mapper. In my ...
    Abc xyz
    Jul 8, 2010 at 7:04 pm
    Jul 11, 2010 at 8:13 am
  • Hi, I am trying to debug the newly built hadoop-core-dev.jar in Eclipse. To simplify the debug process, I first set up Hadoop in single-node mode on my localhost. a) configure debug in eclipse, ...
    Pramy Bhats
    Jul 13, 2010 at 10:09 pm
    Jul 18, 2010 at 2:47 am
  • When I run "hadoop fs -text <my-sequence-file>" I get this WARNING: hadoop fs -text /data/seq/metrics.seq 10/07/13 06:57:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for ...
    Some Body
    Jul 13, 2010 at 2:18 pm
    Jul 14, 2010 at 4:28 pm
  • Hi, I want to add priority to tasks. How can I do it?
    Saurabhsuman8989
    Jul 27, 2010 at 4:58 pm
    Aug 3, 2010 at 4:56 am
  • I think my question was ignored, so I'm posting it again: I am a bit confused about how this attribute is used. My understanding is that it's related to file read/write. And I can see, in LineReader.java, ...
    Elton sky
    Jul 29, 2010 at 2:21 pm
    Aug 3, 2010 at 4:34 am
  • Hi, in my cluster mapred.tasktracker.reduce.tasks.maximum = 4; however, while monitoring the job in the job tracker I see only 1 reducer working. First it is reduce copy - can someone please explain what ...
    Vitaliy Semochkin
    Jul 28, 2010 at 7:25 pm
    Jul 29, 2010 at 4:38 pm
  • Hi All, I am still a newbie to Hadoop, so please bear with my confusion. I went through many tutorials and through Hadoop's API and yet couldn't figure out how to make a REST web service that would ...
    Eluharani zineellabidine
    Jul 28, 2010 at 9:02 pm
    Jul 29, 2010 at 9:15 am
  • We are trying to load data into hdfs from one of the slaves and when the put command is run from a slave(datanode) all of the blocks are written to the datanode's hdfs, and not distributed to all of ...
    Nathan Grice
    Jul 12, 2010 at 11:22 pm
    Jul 13, 2010 at 5:23 pm
  • Hi all, I recently hosted an "Intro to Hadoop" session at the BigDataCamp unconference last week. I later wrote down questions from the audience that seemed useful to other Hadoop beginners, and the ...
    Ken Krugler
    Jul 8, 2010 at 9:36 pm
    Jul 11, 2010 at 5:32 pm
  • Hi, After a restart of our live cluster today, the name node fails to start with the log message seen below. There is a big file called edits.new in the "current" folder that seems to be the only one ...
    Peter Falk
    Jul 7, 2010 at 12:47 pm
    Jul 9, 2010 at 2:39 pm
  • So I am aware of the problem with small files and I have read this article http://www.cloudera.com/blog/2009/02/the-small-files-problem/ I am just wondering if there has been any real change in this? ...
    Ananth Sarathy
    Jul 6, 2010 at 7:34 pm
    Jul 7, 2010 at 1:25 am
  • Hi all, I looked through the "Cluster Setup" guide under link http://hadoop.apache.org/common/docs/r0.20.1/cluster_setup.html and found there's a "fs.inmemory.size.mb" parameter for specifying memory ...
    Yu Li
    Jul 1, 2010 at 7:44 am
    Jul 1, 2010 at 5:23 pm
  • Hi, I have a newbie question. Scenario: Hadoop version: 0.20.2 MR coding will be done in java. Just starting out with my first Hadoop setup. I would like to know are there any best practice ways to ...
    Urckle
    Jul 21, 2010 at 4:31 pm
    Jul 24, 2010 at 6:52 am
  • Hi All, I have a job where all processing is done by the mappers, but each mapper produces a small file, which I want to combine into 3-4 large ones. In addition, I only care about the values, not ...
    Leo Alekseyev
    Jul 21, 2010 at 9:01 am
    Jul 22, 2010 at 9:13 pm
  • Hello, We're using Hadoop in a C-oriented architecture ourselves, using libhdfs for storing files and Hadoop.Pipes for map/reduce jobs. Since the data we're storing benefits a lot from compression, ...
    Leon Mergen
    Jul 19, 2010 at 12:57 pm
    Jul 20, 2010 at 11:31 pm
  • I'm seeing this error in my tasktracker's log. FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201007160344_0001_m_000005_1 - Killed : GC overhead limit exceeded. More detail from my task's ...
    Some Body
    Jul 16, 2010 at 12:12 pm
    Jul 18, 2010 at 7:07 pm
  • Hi everyone, I hope this is the right place for my question. If not, please, feel free to ignore it ;) and I'm sorry for any inconvenience made :( I'm writing a simple program for enumerating ...
    Nikolay Korovaiko
    Jul 16, 2010 at 1:20 am
    Jul 17, 2010 at 6:28 am
  • I was looking at the web interface and found that some of my nodes have an enormous amount of "Non DFS Used". There is even a node with 800GB of "Non DFS Used", which is just ridiculous. I tried to ...
    Edward choi
    Jul 7, 2010 at 8:13 am
    Jul 8, 2010 at 12:22 am
  • Hi, I installed a small Hadoop cluster with Cloudera CDH3. 2 servers are running. One is running the NameNode and DataNode. The other server is just a DataNode. A dedicated client exists too. I want ...
    Christian Baun
    Jul 6, 2010 at 4:53 pm
    Jul 7, 2010 at 9:32 pm
  • Hi, Is it possible to move all the data blocks off a cluster node and then decommission the node? I'm asking because, now that my MR job is working, I'd like to see how things scale. I.e., less processing ...
    Some Body
    Jul 6, 2010 at 2:36 pm
    Jul 7, 2010 at 2:01 pm
  • Our team is still new to Hadoop, and a colleague and I are trying to make a decision on file formats. The arguments are: * We should use a SequenceFile (binary) format as it's faster for the machine ...
    David Rosenstrauch
    Jul 2, 2010 at 9:53 pm
    Jul 6, 2010 at 3:14 pm
  • Dear All, We recently upgraded from CDH3b1 to b2 and ever since, all our mapreduce jobs that use the DistributedCache have failed. Typically, we add files to the cache prior to job startup, using ...
    Jamie Cockrill
    Jul 16, 2010 at 8:59 am
    Oct 6, 2010 at 9:04 pm
  • http://server:50030/jobtracker.jsp generates the following error message: HTTP ERROR: 500 GC overhead limit exceeded RequestURI=/jobtracker.jsp Caused by: java.lang.OutOfMemoryError: GC overhead ...
    Jiang licht
    Jul 30, 2010 at 7:10 pm
    Aug 2, 2010 at 5:01 am
  • I have set up a 2-node cluster and run many jobs, including wordcount. In all the output folders I am getting two mutually exclusive output files, part-00000 and part-00001, instead of a single output. A ...
    Deepak Diwakar
    Jul 28, 2010 at 7:17 pm
    Jul 28, 2010 at 7:57 pm
  • Hello: I got source code from http://github.com/kevinweil/hadoop-lzo, compiled it successfully, and then 1. copied hadoop-lzo-0.4.4.jar to directory $HADOOP_HOME/lib of each master and slave, 2. copied all ...
    Alex Luya
    Jul 24, 2010 at 7:41 am
    Jul 25, 2010 at 2:18 pm
  • Hi everyone, I'm pretty new to Hadoop and generally avoiding Java everywhere I can, so I'm getting started with Hadoop streaming and python mapper and reducer. into Hadoop via the "-mapper" and ...
    Moritz Krog
    Jul 14, 2010 at 7:36 am
    Jul 14, 2010 at 9:20 am
  • Hello everyone, I have a cluster of 8 datanodes and a namenode. When I start the teragen program everything works OK and the data is generated. But when I start the terasort program, it seems that only 2 ...
    Tonci Buljan
    Jul 9, 2010 at 9:33 am
    Jul 11, 2010 at 5:41 pm
  • Is the next release of Hadoop going to be .21 or .22? I was just wondering, because I am hearing conflicting things about the next release having Kerberos security, but looking through some past emails, ...
    Ananth Sarathy
    Jul 7, 2010 at 3:10 pm
    Jul 11, 2010 at 5:22 am
  • Hi ALL, Does anyone know HOW Mappers pass their output to Reducers? Is HTTP used for that, or is there some other communication protocol used for transferring the output of Mappers to Reducers? ...
    Ahmad Shahzad
    Jul 1, 2010 at 2:55 pm
    Jul 2, 2010 at 1:40 am
  • Hi, Is there a list of configuration parameters that can be set per job. Specifically, can one set: - mapred.tasktracker.map.tasks.maximum - mapred.tasktracker.reduce.tasks.maximum - ...
    Devajyoti Sarkar
    Jul 29, 2010 at 8:56 am
    Jul 31, 2010 at 5:27 am
  • Hello, I've implemented a program using map reduce for simple distance calculations between two 2D points. I've set up my input such that all calculations should be the same, but they are not. This ...
    Erik Test
    Jul 28, 2010 at 6:45 pm
    Jul 30, 2010 at 3:04 pm
  • Hello: I got source code from http://github.com/kevinweil/hadoop-lzo, compiled it successfully, and then 1. copied hadoop-lzo-0.4.4.jar to directory $HADOOP_HOME/lib of each master and slave, 2. copied all ...
    Alex Luya
    Jul 28, 2010 at 2:45 pm
    Jul 29, 2010 at 4:11 pm
  • Hello, I'm getting the following messages when I try to run a job I've developed. hadoop jar distanceCalc.jar DistanceCalc distanceCalculations distanceCalculations/output9 10/07/28 09:25:37 WARN ...
    Erik Test
    Jul 28, 2010 at 3:43 pm
    Jul 28, 2010 at 4:33 pm
  • Hi, In my custom partitioner (which may assign a key to more than one partition), I sometimes want to send a key to more than one reducer. But the default getPartition method provided by hadoop ...
    Abc xyz
    Jul 26, 2010 at 2:09 pm
    Jul 26, 2010 at 7:48 pm
  • The team that manages our Hadoop clusters is currently being pressured to reduce block replication from 3 to 2 in our production cluster. This request is for various reasons -- particularly the ...
    Bobby Dennett
    Jul 22, 2010 at 1:30 am
    Jul 22, 2010 at 6:30 pm
  • Hi all, I just restarted my cluster and now the namenode is not starting up. I get the following error: 10/07/22 09:14:40 INFO namenode.NameNode: STARTUP_MSG: ...
    Denim Live
    Jul 22, 2010 at 8:33 am
    Jul 22, 2010 at 1:20 pm
  • Hi everyone, I was curious whether there is any option to use Hadoop in single-node mode in a way that enables the process to use more system resources. Right now, Hadoop uses one mapper and one ...
    Moritz Krog
    Jul 16, 2010 at 9:02 am
    Jul 16, 2010 at 6:36 pm
  • Hey all, We just added queues to our capacity scheduler and now (we did not set a default, which it appears we might have to change) if I try to run a simple streaming job I get the following ...
    Eric.brose
    Jul 14, 2010 at 4:42 pm
    Jul 15, 2010 at 5:03 am
  • Hi, I am trying to use Hadoop's datajoin for joining two relations. According to the Readme file of datajoin, it gives the following syntax: $HADOOP_HOME/bin/hadoop jar ...
    Denim Live
    Jul 10, 2010 at 8:43 am
    Jul 14, 2010 at 4:38 pm
  • Hello to all, I'm a novice at working with MapReduce and I'm developing a MapReduce function that takes XML documents as input. How can I create the input files and specify them to the map function? Thanks for ...
    Khaled BEN BAHRI
    Jul 13, 2010 at 8:17 am
    Jul 13, 2010 at 11:33 am
  • And curiously ( albeit alarmingly ), I don't see network activity ( thru cacti ) on these nodes indicating blocks are being moved to compensate for the excluded nodes. This is in the past 2 hours ...
    Arun Ramakrishnan
    Jul 9, 2010 at 12:14 am
    Jul 9, 2010 at 2:34 am
  • Hi ALL, How can I add a jar file of my own to the hadoop directory and then call the classes that are in that jar file from hadoop classes? Regards, Ahmad Shahzad
    Ahmad Shahzad
    Jul 8, 2010 at 2:18 am
    Jul 8, 2010 at 3:13 am
  • Hi ALL, Can anyone tell me what the purpose of the IPC classes in hadoop is? They are in the \src\core\org\apache\hadoop\ipc folder. Regards, Ahmad Shahzad
    Ahmad Shahzad
    Jul 7, 2010 at 6:33 pm
    Jul 7, 2010 at 7:29 pm
  • Hey Folks, I have to mess around with hashing. I want to take two input sources, partition them using a hash function, then make the in-memory hash table for each partition of one source, and compare ...
    Abc xyz
    Jul 3, 2010 at 6:11 am
    Jul 5, 2010 at 1:55 pm
Group Overview
group: common-user @
categories: hadoop
discussions: 140
posts: 487
users: 144
website: hadoop.apache.org...
irc: #hadoop

144 users for July 2010

  • Ted Yu: 23 posts
  • Alex Kozlov: 21 posts
  • Shuja Rehman: 16 posts
  • Abc xyz: 15 posts
  • Allen Wittenauer: 15 posts
  • Edward Capriolo: 11 posts
  • Edward choi: 11 posts
  • Vitaliy Semochkin: 11 posts
  • Ahmad Shahzad: 10 posts
  • Ken Goodhope: 10 posts
  • Michael Segel: 10 posts
  • Pramy Bhats: 10 posts
  • Some Body: 10 posts
  • Gang Luo: 9 posts
  • Yu Li: 9 posts
  • Alex Loddengaard: 8 posts
  • Alex Luya: 8 posts
  • Hemanth Yamijala: 8 posts
  • Pierre ANCELOT: 7 posts
  • Denim Live: 6 posts