
117 discussions - 444 posts

  • Greetings, I'm running into a brain-numbing problem on Elastic MapReduce. I'm running a decent-size task (22,000 mappers, a ton of GZipped input blocks, ~1TB of data) on 40 c1.xlarge nodes (7 GB RAM, ...
    Bradford Stephens
    Sep 26, 2010 at 7:56 am
    Sep 27, 2010 at 6:25 pm
  • Hi all, I wrote some new writable files corresponding to my data input. I added them to /src/org/......../io/ where all the writables reside. Similarly, I also wrote input/output format files and a ...
    Matthew John
    Sep 8, 2010 at 3:14 am
    Sep 8, 2010 at 11:21 am
  • Hi all, I understand that the Backup Node (BN) takes on all the checkpoint responsibilities and maintains an up-to-date namespace state, which is always in sync with the active NN. Q1. In which situation ...
    Sep 9, 2010 at 6:37 am
    Sep 28, 2010 at 2:58 am
  • Hi guys, I wanted to take in a file with input <key1> <value1> <key2> <value2> ...... a binary sequence file (key and value lengths are constant) as input for the Sort (examples). But as I understand the ...
    Matthew John
    Sep 13, 2010 at 9:16 am
    Sep 14, 2010 at 2:54 pm
  • I am trying to configure our distributed Hadoop setup, but for some reason my datanodes cannot connect to the namenode. 2010-09-06 04:06:05,040 INFO org.apache.hadoop.ipc.Client: Retrying connect to ...
    Sep 6, 2010 at 6:18 pm
    Oct 20, 2010 at 12:27 am
  • Hello everyone, I cannot load a local file to HDFS. It gives the following errors. WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block ...
    He Chen
    Sep 25, 2010 at 8:42 pm
    Sep 28, 2010 at 8:07 pm
  • Hey guys, I'm running into issues when doing a moderate-size EMR job on 12 m1.large nodes. Mappers and Reducers will randomly fail. The EMR defaults are 2 mappers / 2 reducers per node. I've tried ...
    Bradford Stephens
    Sep 18, 2010 at 8:48 am
    Sep 20, 2010 at 8:05 am
  • Hi, my Hadoop friends: I have 3 questions about Hadoop. 1. The speed between the datanodes: with terabytes of data in one datanode, the data transfers from one datanode to another datanode. ...
    褚 鵬兵
    Sep 6, 2010 at 8:32 am
    Sep 8, 2010 at 6:50 pm
  • Hello, We just recently switched to using lzo compressed file input for our hadoop cluster using Kevin Weil's lzo library. The files are pretty uniform in size at around 200MB compressed. Our block ...
    Sep 24, 2010 at 9:06 pm
    Sep 28, 2010 at 6:52 pm
  • I am getting the following errors from my datanodes when I start the namenode. 2010-09-08 14:17:40,690 INFO org.apache.hadoop.ipc.RPC: Server at hadoop1/10.XXX.XXX.XX:9000 not available yet, Zzzzz... ...
    Sep 9, 2010 at 5:29 am
    Sep 13, 2010 at 9:49 am
  • Hello, I am new to Hadoop. Can anybody suggest an example or procedure for outputting the top N items having the maximum total count, where the input file has an (Item, count) pair in each line? Items ...
    Neil Ghosh
    Sep 10, 2010 at 9:05 pm
    Sep 11, 2010 at 5:00 am
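The Top-N question above has a standard shape: sum the counts per item, then keep only the N largest totals. A minimal sketch in plain Python (outside Hadoop, with hypothetical sample data) — in a real MapReduce job the summing would happen in the reducer and the Top-N selection in a final single-reducer pass:

```python
import heapq
from collections import defaultdict

def top_n(lines, n):
    """Sum counts per item from "item count" lines, then return the
    n items with the largest totals, largest first."""
    totals = defaultdict(int)
    for line in lines:
        item, count = line.split()
        totals[item] += int(count)
    # heapq.nlargest keeps only n candidates in memory at a time,
    # which is the same trick a Top-N reducer would use.
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])

# Hypothetical sample input, one (item, count) pair per line
sample = ["apple 3", "banana 5", "apple 4", "cherry 1"]
print(top_n(sample, 2))  # [('apple', 7), ('banana', 5)]
```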
  • Dear all, I am having this exception when starting jobtracker, and I checked by netstat that the port is not in use before running. Could you please point out where might be the problem? Many thanks ...
    Jing Tie
    Sep 18, 2010 at 12:19 am
    Mar 17, 2011 at 6:44 pm
  • I have a cluster of 30 nodes, and I put data into the cluster on one node, which I'll call "NodeA" here. The consequence is that this node now always stores more data than the other nodes; for example, other nodes ...
    Sep 28, 2010 at 8:09 am
    Oct 8, 2010 at 9:04 am
  • Hi all, I am getting this exception on a cluster (10 nodes) when I am running a simple Hadoop map/reduce job. I don't get this exception while running it on my desktop in Hadoop's pseudo-distributed ...
    Tali K
    Sep 29, 2010 at 9:45 pm
    Oct 1, 2010 at 9:01 pm
  • Hi, We have a job that writes many small files (using MultipleOutputFormat) and it's exceeding the 4000 xcievers that we have configured. What is the effect on the cluster of increasing this count to ...
    Martin Arnandze
    Sep 23, 2010 at 12:05 pm
    Sep 27, 2010 at 12:47 pm
  • Hi, I am a new Hadoop user. I followed the tutorial by Michael Noll at http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29 (as well as for a single node) with ...
    Medha Atre
    Sep 9, 2010 at 3:49 pm
    Sep 10, 2010 at 6:31 pm
  • ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with ...
    Matt Tanquary
    Sep 8, 2010 at 3:41 pm
    Sep 9, 2010 at 4:28 pm
  • We are looking for ways to prevent Hadoop daemon logs from piling up (over time they can reach several tens of GB and become a nuisance). Unfortunately, the log4j DRFA class doesn't seem to provide ...
    Leo Alekseyev
    Sep 27, 2010 at 11:12 pm
    Sep 29, 2010 at 4:19 pm
  • Hi, any need for this, protected void setup(Mapper.Context context) throws IOException, InterruptedException { super.setup(context); // TODO - does this need to be done? this.context = context; } ...
    Mark Kerzner
    Sep 17, 2010 at 3:39 am
    Sep 24, 2010 at 11:08 pm
  • Hi, I want to control the number of mappers tasks running simultaneously. Is there a way to do that if I run Pig jobs on hadoop ? Any input is helpful. Thanks, Rahul
    Rahul Malviya
    Sep 14, 2010 at 9:33 pm
    Sep 17, 2010 at 9:13 pm
  • Hi, in 0.18, it used to be main().... JobConf conf = new JobConf(MyMR.class); FileInputFormat.setInputPaths(conf, new Path(args[0])); but now these are all deprecated, what do I use instead? Thank ...
    Mark Kerzner
    Sep 15, 2010 at 4:49 am
    Sep 16, 2010 at 2:12 pm
  • Hi buddies, I'm a CS student and I would like to ask if you guys have some ideas of research project that can be done with Hadoop or other projects like HBase, Hive, KosmosFS, Pig. In my country, ...
    Luan Cestari
    Sep 7, 2010 at 3:50 am
    Sep 9, 2010 at 2:18 pm
  • My cluster consists of 8 nodes with the namenode on an independent machine. The following info is what I get from the namenode web UI: 291 files and directories, 1312 blocks = 1603 total. Heap Size is ...
    Sep 6, 2010 at 7:28 am
    Sep 8, 2010 at 1:34 am
  • Hi, I am trying to sort a list of numbers (one per line) using hadoop mapreduce. Kindly suggest any reference and code. How do I implement custom input format and recordreader so that both key and ...
    Neil Ghosh
    Sep 5, 2010 at 8:19 pm
    Sep 6, 2010 at 2:36 pm
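The number-sorting question above hinges on a property of MapReduce: the framework sorts map output by key during the shuffle, so emitting each number as the key yields sorted output for free — provided the key type sorts numerically (a text key would sort lexicographically, which is why a custom key or an IntWritable-style key matters). A plain-Python simulation of the three phases, with hypothetical input:

```python
# Simulated map -> shuffle/sort -> reduce pipeline for sorting numbers.

def map_phase(lines):
    # Emit each parsed number as the key; the value is unused.
    return [(int(line), None) for line in lines]

def shuffle_sort(pairs):
    # The framework sorts by key between map and reduce; with numeric
    # keys this is a numeric sort (lexicographic text keys would put
    # "19" before "7").
    return sorted(pairs, key=lambda kv: kv[0])

def reduce_phase(pairs):
    # Identity reduce: just write the keys out in sorted order.
    return [k for k, _ in pairs]

lines = ["42", "7", "19"]
print(reduce_phase(shuffle_sort(map_phase(lines))))  # [7, 19, 42]
```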
  • Hi, The masters file in hadoop/conf is called masters. Wondering if I can configure multiple masters for a single cluster. If yes, how can I use them? Thanks, Bhushan ...
    Bhushan Mahale
    Sep 29, 2010 at 8:26 pm
    Sep 30, 2010 at 12:50 am
  • Hello, I am trying to run a bigram count on a 12-node cluster setup. For an input file of 135 splits (around 7.5 GB), the job fails for some of the runs. The error that I get on the jobtracker that ...
    Pramy Bhats
    Sep 27, 2010 at 6:50 pm
    Sep 28, 2010 at 10:15 am
  • Is there a particularly good reason for why the "hadoop fs" command supports -cat and -tail, but not -head? Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com "I do not feel ...
    Keith Wiley
    Sep 27, 2010 at 7:24 am
    Sep 28, 2010 at 12:35 am
  • Hi, I just set up a 3-node Hadoop cluster using the latest version from the website, 0.21.0. I am able to start all the daemons; when I run jps I see datanode, namenode, secondary, tasktracker, but I was ...
    Mike Franon
    Sep 15, 2010 at 7:54 pm
    Sep 20, 2010 at 1:06 pm
  • Can anyone share their experience in doing real-time log processing using Chukwa/Scribe + Hadoop ? I am wondering how "real-time" can this be given Hadoop is designed for batch rather than stream ...
    Ricky Ho
    Sep 6, 2010 at 5:02 am
    Sep 16, 2010 at 8:57 pm
  • If I submit a jar that has a lib directory that contains a bunch of jars, shouldn't those jars be in the classpath and available to all nodes? The reason I ask this is because I am trying to submit a ...
    Sep 10, 2010 at 6:54 pm
    Sep 11, 2010 at 2:14 am
  • Hi all, My job (written in old 0.18 api, but that's not the issue here) is producing large amounts of map output. Each map() call generates about ~20 output.collects, and each output is pretty big ...
    Oded Rosen
    Sep 1, 2010 at 2:30 pm
    Sep 1, 2010 at 4:10 pm
  • I'm doing web crawling using nutch, which runs on hadoop in distributed mode. When the crawldb has tens of millions of urls, I have started to see strange failure in generating new segment and ...
    AJ Chen
    Sep 28, 2010 at 11:40 pm
    Oct 4, 2010 at 6:27 pm
  • This worked at one time, but now I'm having an issue: I have a basic python script for testing python/hive. The script just does a few simple things: -show tables -describe [a table] -select * from ...
    Matt Tanquary
    Sep 28, 2010 at 8:57 pm
    Sep 28, 2010 at 10:37 pm
  • Hi, I continuously run a series of batch job using Hadoop Map Reduce. I also have a managing daemon that moves data around on the hdfs making way for more jobs to be run. I use capacity scheduler to ...
    Aniket ray
    Sep 23, 2010 at 4:53 am
    Sep 24, 2010 at 3:28 pm
  • Dear Hadoopers, I am stuck on a probably very simple problem but can't figure it out. In the Hadoop Map/Reduce framework, I want to search a huge file (which is generated by another Reduce task) for ...
    Shi Yu
    Sep 22, 2010 at 9:20 pm
    Sep 23, 2010 at 11:33 am
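For lookups like the one above, one common pattern (not necessarily what this poster ends up needing) is to load the smaller set of search keys once per mapper in setup() and filter records in map(); in Hadoop the key file would typically be shipped to each node via the DistributedCache. A plain-Python sketch with hypothetical tab-separated data:

```python
# Sketch of a "load lookup set in setup(), filter in map()" mapper.
class LookupMapper:
    def setup(self, wanted_keys):
        # Called once before any map() calls, like Mapper.setup();
        # a set gives O(1) membership tests per record.
        self.wanted = set(wanted_keys)

    def map(self, record):
        # Records are hypothetical "key<TAB>value" lines.
        key, value = record.split("\t", 1)
        if key in self.wanted:
            yield key, value

mapper = LookupMapper()
mapper.setup(["k2"])
records = ["k1\ta", "k2\tb", "k3\tc"]
hits = [kv for r in records for kv in mapper.map(r)]
print(hits)  # [('k2', 'b')]
```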
  • Hi all, I have a Backup Node (BN) and an active NameNode (NN), but if the NN fails, how do I recover it? Can the BN replace the NN directly? Any resources? Shen
    Sep 17, 2010 at 6:59 am
    Sep 22, 2010 at 12:29 am
  • I am using hadoop 0.21. I have a reducer task which takes longer to finish than mapreduce.task.timeout, so it's being killed: Task attempt_201009211103_0001_r_000000_0 failed to report status for ...
    Marc Sturlese
    Sep 21, 2010 at 10:59 am
    Sep 21, 2010 at 3:37 pm
  • Hello, I am new to Hadoop and to this forum. Existing setup: Basically we have an existing setup where data is collected from a JMS queue and written to hard disk without Hadoop. Typical I/O using ...
    Chittaranjan Hota
    Sep 17, 2010 at 6:44 pm
    Sep 18, 2010 at 3:22 am
  • Hi, the documentation (http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/JobClient.html#runJob(org.apache.hadoop.mapred.JobConf)) says I should do this: ...
    Mark Kerzner
    Sep 17, 2010 at 4:54 am
    Sep 17, 2010 at 2:38 pm
  • Hi, in the core-site.xml, I have this: <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> and then I can both see hdfs from command line, like hadoop fs -ls ...
    Mark Kerzner
    Sep 16, 2010 at 4:10 am
    Sep 17, 2010 at 6:30 am
  • Hi, I am running Pig jobs on a Hadoop cluster. I just wanted to know whether I can run multiple jobs on the cluster simultaneously. Currently when I start two jobs on Hadoop they run in a serial ...
    Rahul Malviya
    Sep 13, 2010 at 9:45 pm
    Sep 13, 2010 at 10:40 pm
  • Hi, I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2 (64bit Ubuntu 10.04) I have downloaded the tar.gz and can build the project - I tried to apply the patch from ...
    Lewis Crawford
    Sep 10, 2010 at 3:55 pm
    Sep 10, 2010 at 9:04 pm
  • Hi all, Can anybody please tell me the minimum hardware configuration required for the Master/Slave nodes in a 4-node cluster. Thanks in advance
    Adarsh Sharma
    Sep 8, 2010 at 7:22 am
    Sep 8, 2010 at 8:35 am
  • How do I go about uploading content from a remote machine to the hadoop cluster? Do I have to first move the data to one of the nodes and then do a fs -put or is there some client I can use to just ...
    Sep 6, 2010 at 8:13 pm
    Sep 7, 2010 at 7:06 am
  • I am working with DataDrivenOutputFormat from trunk. None of the unit tests seem to test the bounded queries Configuration conf = new Configuration(); Job job = new Job(conf); ...
    Edward Capriolo
    Sep 1, 2010 at 2:33 am
    Sep 2, 2010 at 6:38 am
  • Dear all, I have set up a Hadoop cluster of 10 nodes. I want to know how we can read/write a file from HDFS (simply). Yes, I know there are commands; I read all the HDFS commands. bin/hadoop ...
    Adarsh Sharma
    Sep 30, 2010 at 11:57 am
    Sep 30, 2010 at 10:15 pm
  • We have TB worth of XML data in .gz format where each file is about 20 MB. This dataset is not expected to change. My goal is to write a map-only job to read in one .gz file at a time and output the ...
    Steve Kuo
    Sep 28, 2010 at 8:08 pm
    Sep 28, 2010 at 8:13 pm
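The map-only .gz question above works out conveniently because gzip files are not splittable: each ~20 MB file becomes exactly one map input, so a map-only job naturally processes one file per mapper. A plain-Python sketch of the per-file read loop (the round-trip through a temp file is hypothetical illustration, not the poster's data):

```python
import gzip
import os
import tempfile

def process_gz(path):
    # Read one gzip-compressed text file line by line, as a single
    # mapper would see its whole (non-splittable) .gz input.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

# Hypothetical round trip: write a tiny .gz file, then read it back.
tmp = tempfile.NamedTemporaryFile(suffix=".gz", delete=False)
tmp.close()
with gzip.open(tmp.name, "wt", encoding="utf-8") as f:
    f.write("<doc>1</doc>\n<doc>2</doc>\n")
print(list(process_gz(tmp.name)))  # ['<doc>1</doc>', '<doc>2</doc>']
os.unlink(tmp.name)
```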
  • Many times a Hadoop job produces a file per reducer and the job has many reducers. Or a map-only job produces one output file per input file and you have many input files. Or you just have many small files ...
    Edward Capriolo
    Sep 25, 2010 at 6:41 am
    Sep 27, 2010 at 2:04 pm
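For the many-small-files situation above, one common remedy is a follow-up merge step — from the shell, `hadoop fs -getmerge` concatenates the part files of a directory into one local file. A plain-Python sketch of the same idea, using hypothetical local part files in place of HDFS output:

```python
import os
import tempfile

def merge_small_files(paths, out_path):
    # Concatenate many small text outputs into one file, in sorted
    # name order so part-00000, part-00001, ... stay in sequence.
    with open(out_path, "w") as out:
        for p in sorted(paths):
            with open(p) as f:
                out.write(f.read())

# Hypothetical part files standing in for reducer output
d = tempfile.mkdtemp()
parts = []
for i, text in enumerate(["a\n", "b\n"]):
    p = os.path.join(d, f"part-{i:05d}")
    with open(p, "w") as f:
        f.write(text)
    parts.append(p)
merged = os.path.join(d, "merged")
merge_small_files(parts, merged)
print(open(merged).read())  # "a\nb\n"
```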
  • Hi, I need urgent help using SQL Server with Hadoop. I am using the following code to connect to the database ...
    Biju .B
    Sep 24, 2010 at 8:38 am
    Sep 26, 2010 at 5:51 am
  • Hi, I am upgrading to 0.20 from 0.18, and right now the setup() gets called, but the map() does not. The log indicates that an input record was found - but it is not processed. 10/09/19 23:56:21 INFO ...
    Mark Kerzner
    Sep 20, 2010 at 4:59 am
    Sep 25, 2010 at 8:24 am
Period: Sep 2010
Group Overview
Group: common-user @

131 users for September 2010

Top posters: Mark: 22 posts, Tmatthewj: 19 posts, Bradford Stephens: 18 posts, Edward Capriolo: 13 posts, Mark Kerzner: 12 posts, Steve Loughran: 11 posts, Allen Wittenauer: 10 posts, Ted Yu: 10 posts, Jeff Zhang: 9 posts, Lance Norskog: 9 posts, Neil Ghosh: 8 posts, Owen O'Malley: 8 posts, Ranjib Dey: 8 posts, ChingShen: 7 posts, Harsh J: 7 posts, He Chen: 7 posts, Adarsh Sharma: 6 posts, Matt Tanquary: 6 posts, Medha Atre: 6 posts, Pig: 6 posts