
40 discussions - 130 posts

  • Hello guys, I would like to know how to do file uploads in HDFS using Java. Is it to be done using MapReduce? What if I have a large number of small files, should I use a sequence file along with map ...
    Visioner sadak
    Oct 4, 2011 at 6:54 pm
    Oct 13, 2011 at 6:12 pm
  • Hi all, I am currently encountering a tough problem: my job uses MultipleOutputFormat to output results into different folders, and I have to use a combiner to enhance performance. In this situation, ...
    Xin Jing
    Oct 28, 2011 at 4:36 am
    Oct 28, 2011 at 1:44 pm
  • Hi, Does the streaming jar create 1 reducer by default? We have reduce tasks per task tracker configured to be more than 1, but my job has about 150 mappers and only 1 reducer: reducer.py basically just ...
    Mapred Learn
    Oct 21, 2011 at 10:00 pm
    Oct 22, 2011 at 3:13 pm
  • Hi everybody, we have the following scenario: our clustered web application needs to write records to HBase, and we need to support a very high throughput; we expect up to 10-30 thousand requests per ...
    Andreas Reiter
    Oct 28, 2011 at 2:08 pm
    Nov 16, 2011 at 7:40 pm
  • Hi, Hadoop tasks are always stacked to form a linear user-managed workflow (a reduce step cannot start before all previous mappers have stopped etc). This may be problematic in recursive tasks: for ...
    Yaron Gonen
    Oct 4, 2011 at 3:46 pm
    Oct 5, 2011 at 1:51 pm
  • Hi, found that several people have run into this issue, but I was not able to find a solution yet. We have reduce tasks that leave a hanging "child" process. The implementation uses a lot of third ...
    Henning Blohm
    Oct 27, 2011 at 9:59 am
    Nov 17, 2011 at 4:18 pm
  • Hello, I have a situation where I am reading a big file from HDFS and then comparing all the data in that file with each input to the mapper. Now since my mapper is trying to read the entire HDFS ...
    Arko Provo Mukherjee
    Oct 31, 2011 at 11:45 pm
    Nov 1, 2011 at 3:52 am
  • Hi, I have a situation where I have to read a large file into every mapper. Since it's a large HDFS file that is needed to work on each input to the mapper, it is taking a lot of time to read the data ...
    Arko Provo Mukherjee
    Oct 27, 2011 at 8:22 am
    Oct 31, 2011 at 11:41 pm
  • Hi, I am having a problem implementing unsort for a crawler in map/reduce. I have a list of URLs waiting to fetch; they need to be reordered for maximum distance between URLs from one domain. The idea is to ...
    Radim Kolar
    Oct 25, 2011 at 10:46 am
    Oct 27, 2011 at 9:38 am
  • Hi. I'm currently running a Hadoop cluster on Amazon's EMR service, which appears to be the 0.20.2 codebase plus several patches from the (deprecated?) 0.20.3 branch. I'm interested in switching from ...
    Kai Ju Liu
    Oct 26, 2011 at 9:55 pm
    Oct 26, 2011 at 10:43 pm
  • We have a small cluster with HDFS running on only 8 nodes - I believe that the partition assigned to HDFS might be getting full and wonder if the web tools or Java API have a way to look at free ...
    Steve Lewis
    Oct 15, 2011 at 12:16 am
    Oct 17, 2011 at 3:55 pm
  • Can anyone confirm whether the skip options work for MR jobs using the new API? I have a job using the new API and I cannot get the job to skip corrupted records. I tried configuring job properties ...
    Justin Woody
    Oct 12, 2011 at 4:37 pm
    Oct 14, 2011 at 12:00 pm
  • Hi, The map method in the Mapper gets as a parameter a single line from the split. Is there a way for Mappers to get the whole split as input? I'd like to scan the whole split before I decide which ...
    Yaron Gonen
    Oct 12, 2011 at 9:13 am
    Oct 12, 2011 at 7:14 pm
  • My map task needs to handle a large gzipped file and sits at 0% forever until it hits 100%. There is no way to split the file, but it would be nice if there were some indication of progress - any way ...
    Steve Lewis
    Oct 26, 2011 at 12:56 am
    Nov 3, 2011 at 6:22 pm
  • Hi Experts, I'm really interested in understanding the end-to-end flow, functionality, components and protocols in MRv2. Currently I don't know anything on MRv2, so I require some document that would ...
    Bejoy KS
    Oct 13, 2011 at 2:24 pm
    Oct 14, 2011 at 3:18 pm
  • We are planning to enable secure Hadoop using Kerberos. Our users reside in the active directory. We read that there are two options to use Kerberos for securing Hadoop. 1) You run Kerberos on ...
    Bigbibguy father
    Oct 1, 2011 at 2:20 am
    Oct 3, 2011 at 8:45 pm
  • Hello everybody, I am working on Terrier (www.terrier.org), an IR toolkit that leverages Hadoop for indexing large amounts of data (i.e. documents). I am working both locally with a small subset of the ...
    Marco Didonna
    Oct 27, 2011 at 4:43 pm
    Oct 29, 2011 at 7:39 am
  • Hi, Could somebody point me to a chained map-red example? I'm trying to run another map-only job after a map-red job. Thanks, JJ Sent from my iPhone
    Mapred Learn
    Oct 28, 2011 at 2:34 pm
    Oct 29, 2011 at 4:49 am
  • Hey, I set up a hadoop cluster on EC2 using this documentation: http://wiki.apache.org/hadoop/AmazonEC2 OS: Linux Fedora 8 Hadoop version is java version "1.7.0_01" heap size: 1Gb (stats ...
    Artem Yankov
    Oct 25, 2011 at 5:56 pm
    Oct 27, 2011 at 8:42 am
  • I am relatively new here and starting the CDH3u1 (on vmware). The nameserver is not coming up due to the following error: 2011-10-25 22:47:00,547 INFO org.apache.hadoop.hdfs.server.common.Storage: ...
    Stephen Boesch
    Oct 26, 2011 at 6:19 am
    Oct 26, 2011 at 10:48 am
  • I have an MR task which runs well with a single input file or an input directory with dozens of 50MB input files. When the data is in a single input file of 1 GB or more the mapper never gets to 0%. ...
    Steve Lewis
    Oct 14, 2011 at 3:24 pm
    Oct 14, 2011 at 4:03 pm
  • Hi, all, I am trying to write an application which needs the mapper to split its output to file and reducer. For example, if a mapper produces two key-value pairs (a, 1) and (b, 2), how can I write ...
    Ke Zhai
    Oct 12, 2011 at 7:01 pm
    Oct 12, 2011 at 7:10 pm
  • Hi, Is there a way to stop an entire job when a certain condition is met in the map/reduce function? Like looking for a particular key or value. Thanks, Praveen
    Praveen Sripati
    Oct 1, 2011 at 3:39 am
    Oct 1, 2011 at 5:06 am
  • While I can see file sizes with the web interface, it is very difficult to tell which directories are taking up space especially when nested by several levels -- Steven M. Lewis PhD 4221 105th Ave NE ...
    Steve Lewis
    Oct 26, 2011 at 12:52 am
    Oct 26, 2011 at 1:00 am
  • Hello All, I have a few questions concerning the TaskTracker's JVM re-use that I couldn't unearth some details about: Is the configured amount of tasks for reuse a suggestion or will it actually use ...
    Adam Shook
    Oct 25, 2011 at 5:19 pm
    Oct 25, 2011 at 7:05 pm
  • Hello Mark, Moving to mapreduce-dev@ (bcc'd common-user@). You need to control whatever calls Reporter#setProgress(…). Mostly it's just the RecordReader implementation doing it via ...
    Harsh J
    Oct 20, 2011 at 4:31 am
    Oct 20, 2011 at 4:34 am
  • Hi guys, I'm debugging a pipes program on MapReduce, and trying a debug script to print some debug info. I used the default pipes script under src/c++/pips/debug, and put it on HDFS, create a symlink in ...
    Seven garfee
    Oct 14, 2011 at 7:27 am
    Oct 15, 2011 at 3:02 pm
  • Hello Everyone, I have a particular situation, where I am trying to run Iterative Map-Reduce, where the output files for one iteration are the input files for the next. It stops when there are no new ...
    Arko Provo Mukherjee
    Oct 12, 2011 at 6:55 am
    Oct 12, 2011 at 8:00 am
  • Hi, We have a class hierarchy for the output value for both the mapper as well as the reducer class, as parent (abstract class), child1, child2, … We have a mapper class which is specified with its output value class ...
    Anuja Kulkarni
    Oct 4, 2011 at 4:47 pm
    Oct 5, 2011 at 9:39 am
  • Hi all, I am combing through the tasktracker logs and I found many exception traces in the log. I am wondering if anyone with experience can help me eliminate them. Thanks, Felix 1) Exception in ...
    Felix gao
    Oct 29, 2011 at 7:28 pm
    Oct 29, 2011 at 7:28 pm
  • I think my HDFS may be sick, but when I run jobs on our 8 node cluster I have started seeing 11/10/26 15:42:30 WARN mapred.JobClient: Error reading task output http:// ...
    Steve Lewis
    Oct 26, 2011 at 10:53 pm
    Oct 26, 2011 at 10:53 pm
  • Hi, I am trying to create output files of fixed size by using : -Dmapred.max.split.size=6442450812 (6 Gb) But the problem is that the input Data size and metadata varies and I have to adjust above ...
    Mapred Learn
    Oct 26, 2011 at 12:27 am
    Oct 26, 2011 at 12:27 am
  • I read online (I think this refers to the mapred.tasktracker.map.tasks.maximum and reduce properties): "If you have 1 CPU with 4 cores then setting map to 3 and reduce to 3 would be good ...
    Antonio Paolacci
    Oct 23, 2011 at 10:42 am
    Oct 23, 2011 at 10:42 am
  • Stuti, No need to copy the job jar file to every machine in the cluster. You can copy the jar file to the job tracker and execute it using the 'jar' command. If you want to submit the job from the ...
    Devaraj K
    Oct 18, 2011 at 11:24 am
    Oct 18, 2011 at 11:24 am
  • Moving to MR-user. You will need to make a composite value class that contains the real value and a nonce that indicates which behavior is intended for the given emit. Cheers, Anthony
    Anthony Urso
    Oct 16, 2011 at 1:12 am
    Oct 16, 2011 at 1:12 am
  • In iterated map-reduce, a series of code-identical jobs where the reduce output of one is the map input of the next, there are two synchronization barriers per iteration: one in the middle of each ...
    Mike Spreitzer
    Oct 13, 2011 at 5:12 pm
    Oct 13, 2011 at 5:12 pm
  • Hi, What is the difference between specifying the jar file using JobConf API and the 'hadoop jar' command? JobConf conf = new JobConf(getConf(), getClass()); bin/hadoop jar ...
    Praveen Sripati
    Oct 12, 2011 at 4:58 am
    Oct 12, 2011 at 4:58 am
  • Hi all, I'm trying to use Hadoop MapReduce (new API) in a particular way. What I would like to do is to make it work with an external executable not made for mapreduce (but able to read from hdfs), ...
    Stefano Alberto Russo
    Oct 11, 2011 at 1:08 pm
    Oct 11, 2011 at 1:08 pm
  • I have code to talk to a remote cluster where host = "myhost:" and port = 9000 String connectString = "hdfs://" + host + ":" + port + "/"; try { Configuration config = new Configuration(); ...
    Steve Lewis
    Oct 6, 2011 at 9:20 pm
    Oct 6, 2011 at 9:20 pm
  • I try adding following properties to conf/log4j.properties in hadoop 0.20.2 for testing to get log info in my job log4j.appender.INFO=org.apache.log4j.ConsoleAppender ...
    Thomas Anderson
    Oct 1, 2011 at 9:38 am
    Oct 1, 2011 at 9:38 am
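
One thread above quotes connect-string code where host = "myhost:" already ends in a colon, so "hdfs://" + host + ":" + port produces a double colon ("hdfs://myhost::9000/"). A minimal standalone sketch of building the connect string safely with java.net.URI, which validates the authority for us (the host and port values are examples, and the trailing-colon tolerance is a hypothetical convenience, not part of any Hadoop API):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ConnectString {
    // Build an hdfs:// connect string, stripping any trailing colon
    // the caller may have left in the host value.
    static String connectString(String host, int port) throws URISyntaxException {
        String cleanHost = host.endsWith(":")
                ? host.substring(0, host.length() - 1)
                : host;
        // The multi-argument URI constructor assembles and validates
        // scheme://host:port/path in one step.
        return new URI("hdfs", null, cleanHost, port, "/", null, null).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(connectString("myhost:", 9000)); // hdfs://myhost:9000/
    }
}
```

The resulting string would then be handed to something like FileSystem.get(URI.create(connectString), config) when talking to the remote cluster.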
Group Overview
group: mapreduce-user @

52 users for October 2011

Harsh J: 13 posts
Mapred Learn: 7 posts
Steve Lewis: 7 posts
Visioner sadak: 7 posts
Arko Provo Mukherjee: 6 posts
Joey Echeverria: 6 posts
Brock Noland: 5 posts
Justin Woody: 5 posts
Yaron Gonen: 5 posts
Bejoy KS: 4 posts
Xin Jing: 4 posts
Arun Murthy: 3 posts
Kai Ju Liu: 3 posts
Praveen Sripati: 3 posts
Radim Kolar: 3 posts
Artem Yankov: 2 posts
Bigbibguy father: 2 posts
Devaraj Das: 2 posts
Friso van Vollenhoven: 2 posts
GOEKE, MATTHEW (AG/1000): 2 posts