120 discussions - 471 posts

  • According to: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/TextInputFormat.html#isSplitable%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path%29 ...
    Ted Yu
    Jan 7, 2010 at 10:17 pm
    Jan 23, 2010 at 7:01 pm
  • I've got some text files in my input directory and I want to pass each text file (the whole file, not just a line) to a map (one file per map). How can I do this? TextInputFormat splits text ...
    Stolikp
    Jan 23, 2010 at 2:49 pm
    Apr 23, 2010 at 10:00 pm
  • Dear all, we set up 23 nodes and one master server in a cluster, but we ran into a DNS problem. At the beginning of the fetch, we found that Hadoop generates so many DNS requests that it looks like a storm. If anyway ...
    Bill zhu
    Jan 11, 2010 at 9:22 pm
    Apr 22, 2010 at 9:16 pm
  • Hi! I'm currently trying to wrap my head around the different schedulers available. Running Cloudera 0.18.3, there's both the Fair Scheduler and the Capacity Scheduler for me to play with. I have a ...
    Erik Forsberg
    Jan 22, 2010 at 3:23 pm
    Feb 1, 2010 at 2:02 pm
  • Hi, as a user of Hadoop, is there anything to worry about in Google obtaining the patent on MapReduce? Thanks.
    Udaya Lakshmi
    Jan 20, 2010 at 4:18 pm
    Jan 22, 2010 at 7:38 am
  • Hi, I am trying to run Hadoop 0.19.2 under cygwin as per directions on the hadoop "quickstart" web page. I know sshd is running and I can "ssh localhost" without a password. This is from my ...
    Brian Wolf
    Jan 30, 2010 at 8:28 am
    Mar 18, 2010 at 12:38 am
  • Hi, I am running a MR job that requires usage of some java.awt.* classes, that can't be run in headless mode. Right now, I am running Hadoop in a single node cluster (my laptop) which has X11 server ...
    Tarandeep Singh
    Jan 17, 2010 at 9:42 pm
    Jan 19, 2010 at 9:54 pm
  • I will do that like this: at each map task, I get the input file to this mapper in the configure(), and manually read the first line of that file to get the user ID. Then start running the map ...
    Gang Luo
    Jan 8, 2010 at 9:46 pm
    Mar 26, 2010 at 1:30 am
  • I just have a conceptual question. My understanding is that all the mappers have to complete their job for the reducers to start working, because mappers don't know about each other, so we need values ...
    Adeelmahmood
    Jan 26, 2010 at 10:27 pm
    Feb 1, 2010 at 11:03 pm
  • "hadoop fs -rmr /op" That command always fails. I am trying to run sequential Hadoop jobs. After the first run, all subsequent runs fail while cleaning up (aka removing the Hadoop dir created by ...
    Prasenjit mukherjee
    Jan 19, 2010 at 4:46 am
    Jan 19, 2010 at 8:13 am
  • Hi, I've been running some tests on some new hardware we have acquired. As a baseline, I ran the Hadoop sort[1] with 10GB and 100GB of data. As an experiment, I ran it on 4 systems (1 configured as ...
    Stephen mulcahy
    Jan 22, 2010 at 11:58 am
    Jan 29, 2010 at 12:30 pm
  • Hi, I am new to Hadoop. Presently, I am reading the Hadoop Streaming documentation. Does anyone have a sample Hadoop Streaming program that uses shell scripts for Map/Reduce? Please help me with this. ...
    Sunil Kulkarni
    Jan 21, 2010 at 6:54 am
    Jan 26, 2010 at 5:41 pm
  • Hi all.. I have searched the documentation but could not find an input file format that gives the line number as the key and the line as the value. Did I miss something? Can someone give me a clue of how ...
    Udaya Lakshmi
    Jan 28, 2010 at 10:00 am
    Jan 28, 2010 at 12:21 pm
  • Hi, I am writing a second step to run after my first Hadoop job step finished. It is to pick up the results of the previous step and to do further processing on it. Therefore, I have two questions ...
    Mark Kerzner
    Jan 18, 2010 at 1:11 am
    Jan 18, 2010 at 11:25 am
  • Hi all, I'm trying to deploy a pseudo-distributed cluster on my devbox, which runs WinXP. I did the following steps: 1. Installed cygwin with ssh, configured ssh 2. Downloaded hadoop and extracted it, ...
    Yura Taras
    Jan 27, 2010 at 4:41 pm
    Jan 28, 2010 at 11:46 pm
  • Hello, Does anyone have up-to-date instructions for installing hadoop-core in a local Maven repository? The instructions at http://wiki.apache.org/hadoop/HowToContribute do not work (the mvn-install ...
    Stuart Sierra
    Jan 27, 2010 at 4:39 pm
    Jan 28, 2010 at 3:23 pm
  • Hi - the combiner operates on a chunk of mapper output data, but where exactly is the chunk cut off, or when exactly will the chunk be fed to the combiner? 1. Will it be after the mapper finishes ...
    Le Zhao
    Jan 27, 2010 at 4:57 pm
    Jan 28, 2010 at 2:13 pm
  • One of our datanodes went bye bye. We added a bunch more data nodes, but when I do an fsck I get a report that a bunch of files are only replicated on 2 servers, which makes sense, because we had 3, ...
    Ananth T. Sarathy
    Jan 28, 2010 at 2:29 am
    Jan 28, 2010 at 3:06 am
  • Hello, I'm using Hadoop 0.20.1. I just added a new node to a 5-node cluster (for a total of 6); there is already about 500GB across the 5 nodes. In order to distribute the data across the entire cluster ...
    Saptarshi Guha
    Jan 9, 2010 at 5:45 pm
    Jan 12, 2010 at 3:08 am
  • Hey all, I saw a special on Discovery about bible code. http://en.wikipedia.org/wiki/Bible_code I am designing something in Hadoop to do bible code on any text (not just the bible). I have a rough ...
    Edward Capriolo
    Jan 11, 2010 at 7:52 pm
    Jan 31, 2010 at 6:55 pm
  • I'm looking for some help. I'm a Nutch user; everything was working fine, but now I get the following error when indexing. I have a single-node pseudo-distributed setup. Some people on the Nutch list ...
    MilleBii
    Jan 29, 2010 at 7:46 pm
    Jan 30, 2010 at 9:46 am
  • Hi, when the framework splits a file, can it happen that part of a line falls in one split and the rest in another split? Or does the framework make sure that it always splits at ...
    Udaya Lakshmi
    Jan 29, 2010 at 3:34 am
    Jan 29, 2010 at 7:19 am
  • Hi all.. I have searched the documentation but could not find an input file format that gives the line number as the key and the line as the value. Did I miss something? Can someone give me a clue of how ...
    Udaya Lakshmi
    Jan 28, 2010 at 2:52 pm
    Jan 29, 2010 at 3:30 am
  • Hi all, I'm using org.apache.hadoop.fs.FileSystem in my code for some HDFS operations. I would like to timeout and abort the operation if it takes too long (for example, copyFromLocal of a huge ...
    Yiping Han
    Jan 22, 2010 at 9:52 am
    Jan 22, 2010 at 10:53 am
  • Where do I find information about which config parameters can be set as per-node property, and which ones apply to all nodes? For example, I have a cluster consisting of two classes of nodes. One ...
    Zhang, Zhang
    Jan 20, 2010 at 12:32 am
    Jan 20, 2010 at 9:13 pm
  • Hi.. I downloaded and installed Hadoop. While setting up the nodes, following the instructions in Apache Hadoop's Quickstart, I ran into a problem and am not able to proceed further. Please help me ...
    Jayalakshmi sandhya
    Jan 15, 2010 at 9:03 am
    Jan 16, 2010 at 4:09 am
  • This is probably a question better for common-user rather than hbase. But to answer your problem, your JobTracker is able to talk to your Namenode but there's something wrong with the Datanode, your ...
    Jean-Daniel Cryans
    Jan 13, 2010 at 5:38 pm
    Jan 15, 2010 at 5:52 pm
  • Hi, I'm having a slight issue with my Hadoop cluster. There are 32 nodes. I have: /usr/lib/hadoop/bin/stop-mapred.sh /usr/lib/hadoop/bin/stop-dfs.sh /usr/lib/hadoop/bin/start-dfs.sh ...
    Rob Stewart
    Jan 14, 2010 at 1:42 pm
    Jan 14, 2010 at 2:37 pm
  • Hi, Does anybody know whether sorted Mapper output will decrease the Sort in the reduce phase? I'm teaching a class, and am curious to know how much of a difference will sorted vs. unsorted mapper ...
    Le Zhao
    Jan 8, 2010 at 3:21 am
    Jan 12, 2010 at 7:04 pm
  • Hi, I want a particular "section of code" to run only in any "ONE" of the mappers . So I employed the following procedure. Main-Class { public boolean flag = true; Map-Class { if(flag) { flag=false; ...
    Bharath v
    Jan 3, 2010 at 4:05 am
    Jan 4, 2010 at 1:07 pm
  • I am a newbie to Hadoop, so please bear with me if this is naive. I have defined a Mapper/Reducer and I want to run it on a Hadoop cluster. My question is * Do I need to specify the Mapper/Reducer in ...
    Vishalsant
    Jan 22, 2010 at 9:12 pm
    Apr 28, 2010 at 3:31 pm
  • Hi all, I have a use case for collecting several rows from MySQL of compressed/unstructured data (n rows), expanding the data set, and storing the expanded results back into a MySQL DB (100,000n ...
    Nick Jones
    Jan 28, 2010 at 7:39 pm
    Feb 1, 2010 at 6:04 pm
  • hello, all, As a newbie, I have been used to the (k1,v1,k2,v2) format parameter list for map and reduce methods in mapper and reducer(as is written in many books), but after several failures, I found ...
    Steven zhuang
    Jan 28, 2010 at 1:15 pm
    Jan 30, 2010 at 11:59 am
  • Hello, I am using Hadoop 0.19.2 and DataJoin (contrib/datajoin), and I'd like to know if this is still maintained by anyone, or if there is a wiki page or something where I could get more info. I was ...
    Alex Parvulescu
    Jan 28, 2010 at 9:00 am
    Jan 29, 2010 at 7:02 pm
  • Hi all, one of my machines fell into the blacklist, so I want to restart it. But an exception happens, as follows: 2010-01-29 10:32:23,247 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: ...
    Jeff Zhang
    Jan 29, 2010 at 2:56 am
    Jan 29, 2010 at 2:41 pm
  • Hi, Our application stores GBs of data in Lucene Solr index. It reads from Solr index and does some processing on the data and stores it back in Solr as index. It is stored in Solr index so that ...
    Ranganathan, Sharmila
    Jan 19, 2010 at 10:16 pm
    Jan 28, 2010 at 8:37 pm
  • Hi, I wrote a program to read a file in HDFS. The code is as follows: import java.io.IOException; import java.net.MalformedURLException; import java.net.URI; import ...
    Ztesoft
    Jan 19, 2010 at 5:48 am
    Jan 26, 2010 at 7:09 am
  • Hi there, I'm using Hadoop 0.20.1 and I'm trying to use the Join application within the hadoop-*examples.jar . I can't seem to figure it out, where am I going wrong? It isn't grouping the keys ...
    Rob Stewart
    Jan 26, 2010 at 1:44 am
    Jan 26, 2010 at 6:27 am
  • Hey folks, We're running a 100 node cluster on Hadoop 0.18.3 using Amazon Elastic MapReduce. We've been uploading data to this cluster via SCP and using hadoop fs -copyFromLocal to get it into HDFS. ...
    Ben Hardy
    Jan 25, 2010 at 7:02 pm
    Jan 26, 2010 at 3:16 am
  • Hi, I was trying to run a MapReduce job with some jars but it failed. It seems that the jars specified on the command line with -libjars were not shipped to the MapReduce workers. After digging into the code, I ...
    Victor Hsieh
    Jan 20, 2010 at 4:20 am
    Jan 22, 2010 at 2:25 am
  • Hi all, I was just looking around and I stumbled across the Eclipse plugin for Hadoop. Have any of you guys used this plug in ? Any thoughts on this ? Best Regards from Buffalo Abhishek Agrawal SUNY- ...
    Aa225
    Jan 17, 2010 at 7:49 am
    Jan 18, 2010 at 4:57 pm
  • We use dfs.exclude to point to a file containing a list of nodes with problems, and HDFS does not use those nodes. We have mapred.exclude point to the same file, but the jobtracker still allows ...
    David B. Ritch
    Jan 14, 2010 at 12:16 pm
    Jan 15, 2010 at 12:31 pm
  • I have a LongWritable, IncidentWritable key-value pair as output from one job that I want to read as input in my second job, where IncidentWritable is a custom Writable (see code below). How do I read ...
    Valentina kroshilina
    Jan 8, 2010 at 8:05 pm
    Jan 12, 2010 at 5:33 pm
  • Trying a wider audience... The Reduce output records counter doesn't match the actual number of records in the output files, although after I reran the job, it did match. Any idea what could be wrong? ...
    Yonggang Qiao
    Jan 5, 2010 at 9:12 pm
    Jan 6, 2010 at 5:56 am
  • Hi all, Two of my nodes are in the blacklist, and I want to reuse them again. How can I do that ? Thank you. Jeff Zhang
    Jeff Zhang
    Jan 5, 2010 at 8:58 am
    Jan 5, 2010 at 9:13 am
  • I have a scheduling problem, and I'm not sure how to address it with the currently available tools. We're using Hadoop-0.20.1. For most of our needs, I like the Fair Scheduler. It allocates resources ...
    David B. Ritch
    Jan 29, 2010 at 1:24 pm
    Jan 30, 2010 at 1:42 am
  • When Hadoop runs multiple jobs concurrently (that is, when Hadoop is busy), there are always killed tasks in some jobs, although the jobs eventually succeed. Can anybody tell me why? -- Regards Junyong
    John li
    Jan 29, 2010 at 6:53 am
    Jan 29, 2010 at 7:28 am
  • I'm trying to mount HDFS on my system using fuse-dfs hadoop:hadoop-0.20.1+152 fuse:fuse-2.8.1 os:SUSE Linux Enterprise Server 10 SP2 (x86_64) I compile fuse-dfs like this: ant compile ...
    Eason.Lee
    Jan 22, 2010 at 5:38 am
    Jan 22, 2010 at 6:05 am
  • Hi, mapred.output.compress is set to true in hadoop-site.xml My question is how can I specify different compression codecs programmatically ? For example, normally the output is gzip compressed. But ...
    Ted Yu
    Jan 18, 2010 at 8:54 pm
    Jan 18, 2010 at 9:12 pm
  • I need to open and save images using Hadoop and Python. I've tried two ways to do this: 1. Using WholeFileInputFormat.class for infile in sys.stdin: data = str(infile) data = ...
    Mayra Mendoza
    Jan 14, 2010 at 1:11 pm
    Jan 14, 2010 at 8:02 pm
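
Note on the whole-file-per-map thread (Stolikp; Ted Yu's isSplitable link is related). The commonly suggested approach is to subclass FileInputFormat, override isSplitable(...) to return false, and supply a RecordReader that returns the entire file as a single record. The Java below does not use Hadoop. It is a hypothetical, stdlib-only model of the resulting behaviour, and its class and method names are made up. It shows the record shape such an input format produces: key = file name, value = full contents, one map call per file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hadoop-free model of a "whole file" input format: each regular file in a
// directory becomes exactly one (fileName, fileContents) record.
class WholeFileSketch {

    // One record per file, keyed by file name (TreeMap keeps the order deterministic).
    static Map<String, String> readWholeFiles(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> records = new TreeMap<>();
        for (Path p : files) {
            records.put(p.getFileName().toString(),
                        new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        }
        return records;
    }

    // Calls the supplied "map" function once per file rather than once per line.
    static int runMaps(Path dir, BiConsumer<String, String> map) throws IOException {
        Map<String, String> records = readWholeFiles(dir);
        records.forEach(map);
        return records.size();
    }
}
```

In real Hadoop, the non-splittable override matters even for small files. Without it, a large file could still be cut into several splits and handed to several mappers.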
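
Note on Adeelmahmood's question about why reducers wait for all mappers. A reduce() call for a key has to see every value for that key, and any still-running mapper could emit one more. So reduce() cannot start until every map task has finished, although the shuffle can begin copying completed map outputs earlier. The toy model below uses plain Java collections and made-up data, not Hadoop code. It shows that grouping only the finished mappers' output gives an incomplete value list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: each mapper's output is a list of (key, value) pairs.
class ShuffleBarrierSketch {

    // Groups the outputs of the given mappers by key, as the shuffle/sort would.
    static Map<String, List<Integer>> group(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> out : mapperOutputs) {
            for (Map.Entry<String, Integer> kv : out) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        return grouped;
    }

    // Word-count style reduce: the sum of all values seen for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, vs) -> result.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }
}
```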
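
Note on Udaya Lakshmi's line-number question. TextInputFormat's key is the byte offset of the line within the file, not its line number. A map task processing a later split also does not know how many lines precede it. One common workaround uses two passes: first count the lines in each split, then turn those counts into starting line numbers with a prefix sum. Another is to make the file non-splittable and number the lines in a single mapper. The plain-Java sketch below covers only the prefix-sum step, with hypothetical per-split counts. It is not a Hadoop API.

```java
// Hypothetical second-pass helper: given how many lines each split holds
// (in file order), compute the 1-based global line number of each split's first line.
class LineNumberSketch {

    static long[] firstLineNumbers(long[] linesPerSplit) {
        long[] first = new long[linesPerSplit.length];
        long next = 1;                        // global line numbers start at 1
        for (int i = 0; i < linesPerSplit.length; i++) {
            first[i] = next;
            next += linesPerSplit[i];         // running (prefix) sum of earlier splits
        }
        return first;
    }

    // Global number of the line at 0-based position indexInSplit within a split.
    static long globalLineNumber(long[] firstLineNumbers, int split, long indexInSplit) {
        return firstLineNumbers[split] + indexInSplit;
    }
}
```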
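
Note on Le Zhao's combiner question. This is based on my understanding of the 0.20-era map task and is worth checking against MapTask.java. Map output goes into an in-memory buffer sized by io.sort.mb. When the buffer fills past io.sort.spill.percent, its contents are sorted and spilled to disk, and the combiner runs on each spill. When the spills are merged at the end of the task, the combiner may run again if there were at least min.num.spills.for.combine spills (3 by default). The combiner can therefore run zero, one, or several times, so it must give correct results however often it is applied. The model below counts records instead of bytes and is a simplification, not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of map-side spilling with a sum combiner. The buffer limit is a
// record count standing in for io.sort.mb / io.sort.spill.percent (Hadoop
// measures bytes), and minSpillsForCombine mirrors min.num.spills.for.combine.
class CombinerSpillSketch {

    private final int bufferCapacity;
    private final int minSpillsForCombine;
    private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
    private final List<Map<String, Integer>> spills = new ArrayList<>();
    int combinerCalls = 0;

    CombinerSpillSketch(int bufferCapacity, int minSpillsForCombine) {
        this.bufferCapacity = bufferCapacity;
        this.minSpillsForCombine = minSpillsForCombine;
    }

    // Sum combiner over one batch; the TreeMap keeps keys sorted like a spill file.
    private Map<String, Integer> combine(List<Map.Entry<String, Integer>> batch) {
        combinerCalls++;
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : batch) {
            out.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return out;
    }

    // Called for every map output record; spills when the buffer is full.
    void collect(String key, int value) {
        buffer.add(Map.entry(key, value));
        if (buffer.size() >= bufferCapacity) spill();
    }

    private void spill() {
        if (buffer.isEmpty()) return;
        spills.add(combine(buffer));   // the combiner runs once per spill
        buffer.clear();
    }

    // End of the map task: flush the buffer, merge the spills by key, and
    // combine again only if there were enough spills.
    List<Map.Entry<String, Integer>> close() {
        spill();
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        for (Map<String, Integer> s : spills) merged.addAll(s.entrySet());
        merged.sort(Map.Entry.comparingByKey());
        if (spills.size() >= minSpillsForCombine) {
            return new ArrayList<>(combine(merged).entrySet());
        }
        return merged;   // repeated keys are possible here; the reducer must cope
    }

    int spillCount() { return spills.size(); }
}
```

The second case in the model shows why a reducer cannot assume the combiner already collapsed each key to one record.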
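
Note on Edward Capriolo's bible-code idea. A "bible code" hit is an equidistant letter sequence (ELS): a word spelled by the letters at positions start, start + skip, start + 2*skip, and so on, after spaces and punctuation are removed. Below is a brute-force, single-machine search over forward skips. The names are hypothetical, and real ELS searches usually also try negative skips. One hypothetical way to distribute it with Hadoop would be to give each map task its own range of skips over the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Brute-force equidistant letter sequence (ELS) search on one machine.
class ElsSketch {

    // Keep only letters, lower-cased; spaces and punctuation are ignored.
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (Character.isLetter(ch)) sb.append(Character.toLowerCase(ch));
        }
        return sb.toString();
    }

    // Returns "start:skip" for every forward occurrence of word with skip in
    // [minSkip, maxSkip]; minSkip is assumed to be at least 1.
    static List<String> find(String text, String word, int minSkip, int maxSkip) {
        String letters = normalize(text);
        String target = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (int skip = minSkip; skip <= maxSkip; skip++) {
            // The last letter of the word must still fall inside the text.
            for (int start = 0; start + (target.length() - 1) * skip < letters.length(); start++) {
                int i = 0;
                while (i < target.length() && letters.charAt(start + i * skip) == target.charAt(i)) i++;
                if (i == target.length()) hits.add(start + ":" + skip);
            }
        }
        return hits;
    }
}
```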
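
Note on Udaya Lakshmi's split question. Input splits are byte ranges and can end in the middle of a line, but the line-oriented record reader compensates. As I understand the 0.20-era LineRecordReader, it handles this in two parts. First, a reader whose split does not start at offset 0 backs up one byte and discards everything through the next newline. Second, every reader keeps reading lines while its position is before the split end, finishing the last line even past the boundary. The net effect is that each line is read exactly once, by the split containing its first byte. The simplified model below uses hypothetical names and an in-memory ASCII string instead of an HDFS stream, so chars stand in for bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how a line-oriented reader handles splits that cut lines in half.
class SplitLineSketch {

    // Returns the lines a reader assigned the range [start, end) would emit.
    static List<String> readSplit(String data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Back up one char and skip through the next newline: if the split began
            // exactly at a line start, only the previous '\n' is skipped; otherwise the
            // partial line is skipped, because the previous split's reader owns it.
            int nl = data.indexOf('\n', start - 1);
            pos = (nl == -1) ? data.length() : nl + 1;
        }
        // Read whole lines while the line starts inside this split,
        // even if the line runs past 'end'.
        while (pos < end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl == -1) ? data.length() : nl;
            lines.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return lines;
    }

    // Reads the whole input as consecutive fixed-size splits.
    static List<String> readAll(String data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int s = 0; s < data.length(); s += splitSize) {
            all.addAll(readSplit(data, s, Math.min(s + splitSize, data.length())));
        }
        return all;
    }
}
```

Whatever split size is used, readAll returns every line exactly once, which is the property the question is asking about.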
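
Note on Yiping Han's timeout question. FileSystem calls such as copyFromLocalFile(...) block the calling thread. One general Java technique is to run the call on a worker thread and wait on its Future with a deadline. The sketch below uses only java.util.concurrent. The operation is any Callable, and wrapping an HDFS call in it is the assumed use. Note that cancel(true) only interrupts the worker, so an operation that ignores interrupts may keep running in the background after the caller gives up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Runs a possibly slow operation with a deadline.
class TimeoutSketch {

    static <T> T callWithTimeout(Callable<T> op, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(op);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                // Interrupts the worker thread; code that ignores interrupts
                // may keep running in the background.
                future.cancel(true);
                throw e;
            } catch (ExecutionException e) {
                // Rethrow the operation's own failure rather than the wrapper.
                Throwable cause = e.getCause();
                throw (cause instanceof Exception) ? (Exception) cause : e;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```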
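
Note on Le Zhao's sorted-mapper-output question. In the 0.20 design, each map task sorts its own output by key, within each reduce partition, before the shuffle, whatever order its input had. The reduce-side "sort" is therefore really a merge of already sorted segments. The in-memory k-way merge below, using java.util.PriorityQueue, is a simplified model of that step. Hadoop's own merger works on on-disk and in-memory segments, io.sort.factor at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// k-way merge of already sorted runs, the way reduce-side input is combined from
// per-mapper sorted segments (greatly simplified: in memory, no merge factor).
class KWayMergeSketch {

    static List<String> merge(List<List<String>> sortedRuns) {
        // Heap entries are {runIndex, positionInRun}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> sortedRuns.get(e[0]).get(e[1])));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (!sortedRuns.get(r).isEmpty()) heap.add(new int[]{r, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();                  // smallest current head among all runs
            List<String> run = sortedRuns.get(top[0]);
            out.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) heap.add(new int[]{top[0], top[1] + 1});
        }
        return out;
    }
}
```

Each output record costs O(log k) for k segments, which is why merging sorted map outputs is much cheaper than sorting the reducer's whole input from scratch.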
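
Note on Steven zhuang's parameter-list question. The excerpt is cut off, so this is not necessarily what he found. The old org.apache.hadoop.mapred API's reduce takes an Iterator plus an OutputCollector and Reporter. The new org.apache.hadoop.mapreduce API's reduce takes (key, Iterable<VALUEIN> values, Context). One well-known trap when moving between them is declaring a reduce method that does not match the new signature. The mismatched method compiles as an overload, the framework keeps calling the inherited reduce (which passes values through unchanged), and only @Override turns the mistake into a compile error. The Hadoop-free classes below are hypothetical stand-ins that reproduce the effect.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for a framework base class whose default reduce() passes values
// through unchanged, like the identity behaviour of the new-API Reducer.
class BaseReducer<K, V> {
    final List<String> output = new ArrayList<>();

    protected void reduce(K key, Iterable<V> values) {
        for (V v : values) output.add(key + "\t" + v);
    }

    // What the "framework" calls for each key.
    final void run(K key, Iterable<V> values) { reduce(key, values); }
}

// Bug: Iterator instead of Iterable. This is a new overload, not an override,
// so run() still calls the base-class version. Adding @Override here would
// make the compiler reject it.
class WrongSumReducer extends BaseReducer<String, Integer> {
    protected void reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) sum += values.next();
        output.add(key + "\t" + sum);
    }
}

// Fixed: the signature matches, and @Override lets the compiler check it.
class SumReducer extends BaseReducer<String, Integer> {
    @Override
    protected void reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        output.add(key + "\t" + sum);
    }
}
```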
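
Note on Valentina kroshilina's custom-Writable question. The usual route is to have the first job write its LongWritable/IncidentWritable pairs with SequenceFileOutputFormat and have the second job read them with SequenceFileInputFormat, declaring the same key and value classes. The custom Writable needs a no-argument constructor, and readFields() must read exactly what write() wrote, in the same order. The actual IncidentWritable code is not in the excerpt, so the fields below (an id and a description) are hypothetical. The interface only mirrors Writable's two methods so that this compiles without Hadoop. In real Hadoop code, Text.writeString/Text.readString are a common alternative to writeUTF, which is limited to 64 KB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Mirrors the two methods of Hadoop's Writable interface.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for IncidentWritable; the real fields are not known.
class IncidentSketch implements WritableLike {
    long incidentId;
    String description = "";

    // No-arg constructor: the framework creates an empty instance, then calls readFields().
    IncidentSketch() {}

    IncidentSketch(long incidentId, String description) {
        this.incidentId = incidentId;
        this.description = description;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(incidentId);
        out.writeUTF(description);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Same fields, same order as write().
        incidentId = in.readLong();
        description = in.readUTF();
    }

    // Serialize and deserialize, roughly what happens between the two jobs.
    static IncidentSketch roundTrip(IncidentSketch original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            original.write(out);
        }
        IncidentSketch copy = new IncidentSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```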
Group Overview
period: Jan 2010
group: common-user @
categories: hadoop
discussions: 120
posts: 471
users: 148
website: hadoop.apache.org...
irc: #hadoop

148 users for January 2010

  • Edward Capriolo: 21 posts
  • Jeff Zhang: 19 posts
  • Amogh Vasekar: 18 posts
  • Ted Yu: 16 posts
  • Todd Lipcon: 14 posts
  • Gang Luo: 13 posts
  • Allen Wittenauer: 12 posts
  • Raymond Jennings III: 11 posts
  • Ravi: 9 posts
  • Steve Loughran: 9 posts
  • Rob Stewart: 8 posts
  • Alex Kozlov: 7 posts
  • Eli Collins: 7 posts
  • Farhan Husain: 7 posts
  • Prasenjit mukherjee: 7 posts
  • Rekha Joshi: 7 posts
  • David B. Ritch: 6 posts
  • Erik Forsberg: 6 posts
  • Le Zhao: 6 posts
  • Mark Kerzner: 6 posts