151 discussions - 579 posts

  • Hi, a few days ago we set up a German UG: http://mapredit.blogspot.com/2012/03/hadoop-ug-germany.html (translated from German) We have founded a UHG, initially as groups on XING / LinkedIn and a ...
    Alo alt
    Mar 8, 2012 at 7:45 am
    Mar 16, 2012 at 10:05 am
  • I am doing a general poll on what are the most prevalent pain points that people run into with Hadoop? These could be performance related (memory usage, IO latencies), usage related or anything ...
    Mar 2, 2012 at 4:16 pm
    Mar 4, 2012 at 7:12 pm
  • All, we have a master in one region and we are trying to start a slave datanode in another region. When executing the scripts it appears to log in to the remote host, but never starts the datanode. When ...
    Ben Cuthbert
    Mar 30, 2012 at 5:03 pm
    Mar 31, 2012 at 11:02 am
  • Hi, I'm new to Hadoop. I'm trying to connect to HBase and access these records, but I get this issue: 2012-03-06 16:56:03,923 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, ...
    Mar 7, 2012 at 1:11 pm
    Mar 8, 2012 at 5:54 am
  • Is there an API that returns absolute HDFS paths? So something that would turn "/user/myname/top/sub/.." into "/user/myname/top" and "top/sub" into "/user/myname/top/sub"? I've been looking for this ...
    W.P. McNeill
    Mar 12, 2012 at 9:20 pm
    Mar 12, 2012 at 11:02 pm
  • Is this the right procedure to add nodes? I took some from hadoop wiki FAQ: http://wiki.apache.org/hadoop/FAQ 1. Update conf/slave 2. on the slave nodes start datanode and tasktracker 3. hadoop ...
    Mohit Anchlia
    Mar 2, 2012 at 12:29 am
    Mar 2, 2012 at 2:00 am
  • Hello, I have a MapFile as a product of a MapReduce job, and what I need to do is: 1. If MapReduce produced multiple splits as output, merge them into a single file. 2. Copy this merged MapFile to another HDFS ...
    Ondřej Klimpera
    Mar 29, 2012 at 3:06 pm
    Apr 2, 2012 at 11:00 am
  • Hello everyone, I have a Hadoop job that I run on several GBs of data that I am trying to optimize in order to reduce the memory consumption as well as improve the speed. I am following the steps ...
    Leonardo Urbina
    Mar 7, 2012 at 7:37 pm
    Apr 18, 2012 at 8:04 pm
  • Hi, does Hadoop 1.0.1 use Hadoop YARN or the tasktracker/jobtracker model? Regards, Arindam
    Arindam choudhury
    Mar 14, 2012 at 12:01 pm
    Mar 20, 2012 at 11:05 am
  • I have CDH3 installed in standalone mode. I have installed all Hadoop components. Now when I start services (namenode, secondary namenode, job tracker, task tracker) I can start gracefully from ...
    Manish Bhoge
    Mar 15, 2012 at 3:51 pm
    Mar 15, 2012 at 7:17 pm
  • i have a RawComparator that i would like to unit test (using mockito and mrunit testing packages). i want to test the method, public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) ...
    Jane Wayne
    Mar 31, 2012 at 4:24 am
    Apr 2, 2012 at 5:23 am
  • Hi All, I am using Hadoop 0.20.2. I am observing a strange behavior of Java Collections. I have the following code in the reducer: public void reduce(Text text, Iterator<Text> values, OutputCollector<Text, ...
    Madhu phatak
    Mar 20, 2012 at 5:53 am
    Mar 23, 2012 at 6:43 am
  • Hadoopers!! I am going to restart a Hadoop cluster in order to enable rack-awareness for the first time. Currently we're running 0.20.203 with 500TB of data on 250+ nodes (without rack-awareness). I am thinking ...
    Patai Sangbutsarakum
    Mar 20, 2012 at 8:20 pm
    Mar 22, 2012 at 5:56 pm
  • Hello Folks, Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs? Also there was a claim in our company that Pig performs better than Map Reduce jobs? ...
    Subir S
    Mar 2, 2012 at 4:48 am
    Mar 6, 2012 at 8:17 pm
  • Hi all, I'm a newbie to Hadoop. I'm trying to compare two large files and get the difference between them, like the diff command in Linux. However, the mapred API can only get one record at a time, so how ...
    Botma lin
    Mar 20, 2012 at 8:32 am
    Mar 21, 2012 at 2:53 am
  • One more observation: usually this job takes 3 to 4 minutes, however when it fails, at that particular time it takes more than 42 to 50 minutes. -Vipul
    Vipul Bharakhada
    Mar 16, 2012 at 6:57 pm
    Mar 19, 2012 at 6:31 pm
  • Hi All, I've been trying to set up Cloudera's CDH4 Beta 1 release of MapReduce 2.0 on a small cluster for testing but I'm not having much luck getting things running. I've been following the guides on ...
    Keith Stevens
    Mar 12, 2012 at 2:21 am
    Mar 19, 2012 at 7:40 am
  • Hi all, are there any capacity scheduler APIs that I can use, e.g. for adding or removing queues, tuning properties on the fly and so on? Any help is appreciated. Thanks Harshad
    Hdev ml
    Mar 14, 2012 at 8:52 pm
    Mar 15, 2012 at 8:41 pm
  • Hi everyone ! Hadoop is written in Java, so mapreduce programs are written in Java, too. But Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages ...
    Lac Trung
    Mar 4, 2012 at 10:13 am
    Mar 4, 2012 at 11:08 am
  • Hi, I am new to Hadoop. I installed Hadoop as per ...
    Sujit Dhamale
    Mar 6, 2012 at 5:57 pm
    Apr 5, 2012 at 5:20 pm
  • Hello, I'm developing a log file anomaly detection system on a Hadoop cluster. I'm looking for a way to process queries like: "select all values when value threshold for a duration 30 seconds". Do ...
    Mar 29, 2012 at 9:03 am
    Mar 31, 2012 at 4:10 pm
  • Hi guys: I notice the IRC activity is a little low. Just wondering if there's a better chat channel for Hadoop other than the official one (#hadoop on freenode)? In any case... I'm on there :) come ...
    Jay Vyas
    Mar 28, 2012 at 3:06 pm
    Mar 30, 2012 at 12:22 am
  • Hi Guys, I'm starting up a region server and it stalls on initialization. I took a thread dump and found it hanging on this spot: "regionserver60020" prio=10 tid=0x00007fa90c5c4000 nid=0x4b50 in ...
    Nabib El-Rahman
    Mar 28, 2012 at 5:31 pm
    Mar 28, 2012 at 6:09 pm
  • Hi, I'm very new to Hadoop and am working through how we may be able to apply it to our data set. One of the things that I am struggling with is understanding if it is possible to tell Hadoop ...
    Franc Carter
    Mar 27, 2012 at 6:03 am
    Mar 27, 2012 at 6:57 am
  • What is the corresponding system property for setNumTasks? Can it be used explicitly as system property like "mapred.tasks."?
    Mohit Anchlia
    Mar 14, 2012 at 3:06 pm
    Mar 22, 2012 at 3:50 pm
  • I have an algorithm that runs multiple iterations of a Hadoop job. Each iteration produces two kinds of output: stuff that is "done" and gets written out to the side and stuff that is "not-done" and ...
    W.P. McNeill
    Mar 12, 2012 at 8:29 pm
    Mar 12, 2012 at 10:04 pm
  • What's the difference between mapred.tasktracker.reduce.tasks.maximum and mapred.map.tasks? I want my data to be split against only 10 mappers in the entire cluster. Can I do that using one of the ...
    Mohit Anchlia
    Mar 10, 2012 at 12:42 am
    Mar 10, 2012 at 6:35 am
  • Just want to check how many are using AWS mapreduce and understand the pros and cons of Amazon's MapReduce machines? Is it true that these map reduce machines are really reading and writing from S3 ...
    Mohit Anchlia
    Mar 3, 2012 at 11:54 pm
    Mar 5, 2012 at 5:29 pm
  • Hello, I'm running a 3-data-node cluster (8-core Xeon, 16G) + 1 node for jobtracker and namenode with Hadoop and HBase and have strange performance results. The same map job runs with speed about 300 ...
    Alexander Goryunov
    Mar 29, 2012 at 3:38 pm
    Mar 31, 2012 at 8:46 am
  • So I have a lot of small files on S3 that I need to consolidate, so headed to Google to see the best way to do it in a MapReduce job. Looks like someone's got a different idea, according to Google's ...
    Tony Burton
    Mar 28, 2012 at 4:40 pm
    Mar 29, 2012 at 9:30 am
  • Hi All, I was just going through the implementation scenario of avoiding or deleting zero-byte files in HDFS. I'm using a Hive partitioned table where the data in the partitions comes from INSERT OVERWRITE ...
    Abhishek Pratap Singh
    Mar 26, 2012 at 9:21 pm
    Mar 26, 2012 at 9:54 pm
  • I have Hadoop running on a standalone box. When I start the daemons for namenode, secondarynamenode, job tracker, task tracker and data node, they start gracefully. But soon after it starts job ...
    Manish Bhoge
    Mar 23, 2012 at 7:51 am
    Mar 26, 2012 at 8:08 pm
  • public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) { int partition = key.get(); if (partition < 0 || partition >= numOfPartitions) { partition = numOfPartitions-1; } ...
    Harun Raşit ER
    Mar 25, 2012 at 3:25 pm
    Mar 26, 2012 at 9:22 am
  • i have a matrix that i am performing operations on. it is 10,000 rows by 5,000 columns. the total size of the file is just under 30 MB. my HDFS block size is set to 64 MB. from what i understand, the ...
    Jane Wayne
    Mar 21, 2012 at 6:08 am
    Mar 21, 2012 at 5:13 pm
  • Hi, I have installed Hadoop 1.0 using the .deb package. I tried to configure superuser groups but it somehow fails. I do not know what's wrong: I expect root to be able to run hadoop dfsadmin -report ...
    Olivier Sallou
    Mar 19, 2012 at 5:32 pm
    Mar 20, 2012 at 8:34 am
  • When I start a job to read data from HDFS I start getting these errors. Does anyone know what this means and how to resolve it? 2012-03-15 10:41:31,402 [Thread-5] INFO ...
    Mohit Anchlia
    Mar 15, 2012 at 7:06 pm
    Mar 19, 2012 at 2:27 pm
  • Greetings All !!! I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes, in which 5 are used for a fully distributed cluster, 1 for pseudo-distributed & 1 as management-node. Fully ...
    Manu S
    Mar 15, 2012 at 12:04 pm
    Mar 15, 2012 at 3:59 pm
  • i am wondering if hadoop always respects Job.setNumReduceTasks(int)? as i am emitting items from the mapper, i expect/desire only 1 reducer to get these items because i want to assign each key of the ...
    Jane Wayne
    Mar 9, 2012 at 2:30 am
    Mar 14, 2012 at 11:45 pm
  • I know that by design all unmarked jobs go to that pool; however, I am doing some testing and I am interested in whether it is possible to disable it. Thanks
    Merto Mertek
    Mar 13, 2012 at 5:50 pm
    Mar 13, 2012 at 9:51 pm
  • I currently have java.opts.mapred set to 512MB and I am getting heap space errors. How should I go about debugging heap space issues?
    Mohit Anchlia
    Mar 6, 2012 at 1:04 am
    Mar 9, 2012 at 6:51 am
  • i have a Mapper and Reducer as a part of a job. all my data transformation occurs in the mapper, and there is absolutely nothing that needs to be done in the reducer. when i set the reducer on the ...
    Jane Wayne
    Mar 8, 2012 at 4:28 am
    Mar 8, 2012 at 5:29 am
  • currently, i have my main jar and then 2 dependent jars. what i do is 1. copy dependent-1.jar to $HADOOP/lib 2. copy dependent-2.jar to $HADOOP/lib then, when i need to run my job, MyJob inside ...
    Jane Wayne
    Mar 6, 2012 at 3:38 pm
    Mar 6, 2012 at 4:06 pm
  • Hello, My question is posted in the link below: http://stackoverflow.com/q/9708427/1269809?sem=2 Any help or feedback would be very helpful. Regards, Shailesh
    Mar 14, 2012 at 7:11 pm
    Jul 6, 2012 at 5:22 pm
  • Hi guys! This is very strange - I have formatted my namenode (pseudo-distributed mode) and now I'm getting some kind of namespace error. Without further ado: here is the interesting output of my ...
    Jay Vyas
    Mar 30, 2012 at 9:41 pm
    Mar 31, 2012 at 12:31 pm
  • if i have a hadoop cluster of 10 nodes, do i have to modify the /hadoop/conf/log4j.properties files on ALL 10 nodes to be the same? currently, i ssh into the master node to execute a job. this node ...
    Jane Wayne
    Mar 28, 2012 at 1:42 am
    Mar 28, 2012 at 4:49 pm
  • I am a newbie to Hadoop and map reduce. I am running a single node hadoop setup. I have created 2 partitions on my HDD. I want the mapper intermediate files (i.e. the spill files and the mapper ...
    Mar 26, 2012 at 11:57 pm
    Mar 27, 2012 at 7:22 pm
  • Hi, I have this use case - I need to spawn as many mappers as the number of lines in a file in HDFS. This file isn't big (only 10-50 lines). Actually each line represents the path of another data ...
    Deepak Nettem
    Mar 16, 2012 at 1:14 am
    Mar 16, 2012 at 9:32 pm
  • I have a client program that creates a SequenceFile, which essentially merges small files into a big file. I was wondering how the SequenceFile splits the data across nodes. When I start the ...
    Mohit Anchlia
    Mar 15, 2012 at 12:47 am
    Mar 15, 2012 at 2:59 pm
  • Hi there, I hope that you can help me. I started a single node for testing following this post ( http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ ) now I have ...
    Targino Silveira
    Mar 13, 2012 at 1:54 am
    Mar 13, 2012 at 8:36 pm
  • I installed HBase in my Hadoop cluster. When I started HBase I got a ClockOutOfSyncException, something like this: org.apache.hadoop.hbase.ClockOutOfSyncException: ...
    Mar 8, 2012 at 6:50 am
    Mar 13, 2012 at 6:48 am
Group Overview
group: common-user @

176 users for March 2012

Harsh J: 52 posts
Mohit Anchlia: 31 posts
Bejoy KS: 22 posts
Jane Wayne: 21 posts
Madhu phatak: 19 posts
Joey Echeverria: 13 posts
Masoud: 12 posts
W.P. McNeill: 9 posts
Austin Chungath: 8 posts
Deepak Nettem: 8 posts
Keith Wiley: 8 posts
Michel Segel: 8 posts
Tousif: 8 posts
Manu S: 7 posts
Anil gupta: 6 posts
Arun C Murthy: 6 posts
Ben Cuthbert: 6 posts
Jay Vyas: 6 posts
Jie Li: 6 posts
Merto Mertek: 6 posts