
196 discussions - 706 posts

  • Hi, Is there a Hadoop utility that takes a directory and dumps the block locations for each file in that directory to a text output? Thanks, Jun IBM Almaden Research Center K55/B1, 650 Harry Road, ...
    Jun Rao
    Jul 30, 2008 at 3:24 pm
    Nov 11, 2008 at 4:45 pm
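For reference, the usual answer to this question is the built-in `fsck` tool rather than a custom utility; a sketch (the path is a placeholder):

```shell
# Dump every file under a directory together with its blocks and the
# datanodes holding each replica, redirected to a text file.
bin/hadoop fsck /path/to/dir -files -blocks -locations > block-report.txt
```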
  • Hello all. Has anybody ever tried/considered using the Bean Scripting Framework within Hadoop? BSF seems nice since it allows "two-way" communication between ruby and java. I'd love to hear your ...
    Lincoln Ritter
    Jul 24, 2008 at 7:40 pm
    Jul 28, 2008 at 12:57 pm
  • Hello, I am trying to use S3 with Hadoop 0.17.0 on EC2. Using this style of configuration: <property> <name>fs.default.name</name> <value>s3://$HDFS_BUCKET</value> </property> <property> <name> ...
    Lincoln Ritter
    Jul 1, 2008 at 11:35 pm
    Jul 17, 2008 at 6:53 pm
  • Hi, I was trying to parse text input with line-based information in mapper and this problem becomes an issue. I wonder if lines are preserved or broken when a file is cut into blocks by dfs. Also, it ...
    Jul 16, 2008 at 12:07 am
    Aug 7, 2008 at 9:09 pm
  • Hey all, I'm trying to chain multiple mapreduce jobs together to accomplish a complex task. I believe that the way to do it is as follows: JobConf conf = new JobConf(getConf(), MyClass.class); ...
    Mori Bellamy
    Jul 9, 2008 at 8:29 pm
    Jul 16, 2008 at 9:46 pm
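The approach the poster describes is the common one: run the jobs sequentially, pointing each job's input at the previous job's output. A sketch with the old `org.apache.hadoop.mapred` API (`MyClass` and the path names are placeholders, not taken from the thread):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        JobConf first = new JobConf(MyClass.class);
        first.setJobName("pass-1");
        FileInputFormat.setInputPaths(first, new Path("input"));
        FileOutputFormat.setOutputPath(first, new Path("intermediate"));
        JobClient.runJob(first);  // blocks until pass-1 completes

        // pass-2 consumes pass-1's output directory
        JobConf second = new JobConf(MyClass.class);
        second.setJobName("pass-2");
        FileInputFormat.setInputPaths(second, new Path("intermediate"));
        FileOutputFormat.setOutputPath(second, new Path("final"));
        JobClient.runJob(second);
    }
}
```

Because `JobClient.runJob` blocks, the second job cannot start until the first has finished, which is what makes this simple sequencing safe.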
  • Dear All, When I use Hadoop, I noticed that the reduce step starts while the mappers are still running. According to my project requirement, the reduce step should not start until ...
    Jul 28, 2008 at 10:12 am
    Jul 30, 2008 at 10:51 pm
  • I have been attempting to get Hadoop metrics into Ganglia and have been unsuccessful thus far. I have seen this thread ...
    Joe Williams
    Jul 23, 2008 at 8:51 pm
    Aug 1, 2008 at 10:16 am
  • Greetings, I have what I think is a pretty straightforward newbie question. I would like to write one file per key in the reduce (or map) phase of a mapreduce job. I have looked at the ...
    Lincoln Ritter
    Jul 23, 2008 at 12:05 am
    Jul 29, 2008 at 5:00 pm
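One way to get a file per key in the old API is to subclass `MultipleTextOutputFormat` (in `org.apache.hadoop.mapred.lib`; whether your release ships it is version-dependent, so check first). A sketch, with `PerKeyOutputFormat` as a hypothetical name:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Route each reduce output record to a file named after its key,
// so each distinct key ends up in its own output file.
public class PerKeyOutputFormat extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        return key.toString();
    }
}
```

It is wired in with `conf.setOutputFormat(PerKeyOutputFormat.class)`. Note that keys containing path separators would need sanitizing before being used as file names.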
  • We have a 10 million row table exported from an AS400 mainframe every day. The table is exported as a CSV text file, about 30GB in size; the CSV file is then imported into an RDBMS table which ...
    Jul 23, 2008 at 2:34 pm
    Jul 29, 2008 at 2:36 am
  • Hi all, I want to use hadoop for some streaming text processing on text documents like: <doc id=... ... ... text text text ... </doc Just xml-like notation but not real xml files. I have to work on ...
    Francesco Tamberi
    Jul 9, 2008 at 10:27 am
    Jul 10, 2008 at 6:12 pm
  • Hello, Is it possible to have more than one output collector for one map? My input are records of html pages. I am mapping each url to its html-content and want to have two output collectors. One ...
    Khanh Nguyen
    Jul 14, 2008 at 6:20 pm
    Jul 31, 2008 at 4:21 pm
  • Hi, My source folder has a single folder and a single file inside that. /user/<user /distcpsrc/1/2 <r 3 4 2008-07-22 04:22 In the destination, it is creating the folder '1' but not the file '2'. The ...
    Murali Krishna
    Jul 22, 2008 at 11:40 am
    Jul 29, 2008 at 1:58 pm
  • I'm trying to install hadoop on our linux machine but after start-all.sh none of the slaves can connect: 2008-07-22 16:35:27,534 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG: ...
    Jose Vidal
    Jul 22, 2008 at 9:04 pm
    Jul 26, 2008 at 12:41 am
  • While I search for text in a PDF file using Hadoop, the results are not coming out properly. I tried to debug my program, and I could see the lines read from the PDF file are not formatted. Please help me to resolve ...
    Jul 23, 2008 at 8:52 am
    Jul 30, 2008 at 8:48 pm
  • Hi, I am running a Hadoop DFS on a cluster of 5 data nodes with a name node and one secondary name node. I have 1788874 files and directories, 1465394 blocks = 3254268 total. Heap Size max is 3.47 ...
    Gert Pfeifer
    Jul 16, 2008 at 12:33 pm
    Jul 29, 2008 at 6:48 am
  • Please let me know if you would be interested in joining NY Hadoop user group if one existed. I know about 5-6 people in New York City running Hadoop. I am sure there are many more. Let me know. If ...
    Alex Dorman
    Jul 18, 2008 at 1:10 pm
    Jul 23, 2008 at 3:01 am
  • I only use it to do something in parallel, but the reduce step will cost me an additional several days. Is it possible to make Hadoop skip the reduce step? Thanks
    Zhou, Yunqing
    Jul 21, 2008 at 9:03 am
    Jul 21, 2008 at 11:09 am
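The standard answer to this thread's question is a map-only job: setting the number of reduces to zero skips the sort/shuffle entirely and each map task writes straight to the output directory. A minimal sketch (`MyJob` is a placeholder class name):

```java
import org.apache.hadoop.mapred.JobConf;

// Map-only job: with zero reduce tasks there is no sort/shuffle phase,
// and each mapper's output goes directly to HDFS as part-NNNNN files.
JobConf conf = new JobConf(MyJob.class);
conf.setNumReduceTasks(0);
```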
  • Hi, I have to run a small MR job while there is a bigger job already running. The first job takes around 20 hours to finish and the second 1 hour. The second job will be given a higher priority. The ...
    Murali Krishna
    Jul 16, 2008 at 1:46 pm
    Jul 17, 2008 at 3:51 am
  • I'm getting the following WARNINGs that seem to slow down my nutch processes on a 3 node and 1 frontend cluster: 2008-07-15 18:53:19,048 WARN dfs.DataNode - to transfer ...
    Jul 15, 2008 at 5:03 pm
    Jul 16, 2008 at 9:55 pm
  • Hi, I am pretty new to Hadoop. I ran a modification of wordcount on almost a TB of data on a single server, but found that it takes too much time. I found that at a time only one core is ...
    Deepak Diwakar
    Jul 7, 2008 at 8:29 am
    Jul 12, 2008 at 4:09 pm
  • Hello, I have been posting on the forums for a couple of weeks now, and I really appreciate all the help that I've been receiving. I am fairly new to Java, and even newer to the Hadoop framework. ...
    Jul 10, 2008 at 9:47 pm
    Jul 11, 2008 at 9:03 pm
  • I'm coming up to speed on the Hadoop APIs. I need to be able to invoke a job from within a Java application (as opposed to running from the command-line "hadoop" executable). The JobConf and ...
    Larry Compton
    Jul 11, 2008 at 3:33 pm
    Jul 11, 2008 at 7:57 pm
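`JobConf` plus `JobClient.runJob` is indeed the programmatic entry point the poster is looking for; a sketch of submitting a job from a Java application (`WordCount` and the argument paths are placeholders):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

public class Submitter {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        RunningJob job = JobClient.runJob(conf);  // blocks until the job finishes
        System.out.println("success: " + job.isSuccessful());
    }
}
```

For non-blocking submission, `JobClient.submitJob(conf)` returns a `RunningJob` handle that can be polled instead.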
  • It only seems like full outer or full inner joins are supported. I was hoping to just do a left outer join. Is this supported or planned? On the flip side doing the Outer Join is about 8x faster than ...
    Jason Venner
    Jul 1, 2008 at 4:25 am
    Jul 3, 2008 at 8:29 pm
  • Hi, I have a couple of *basic* questions about Hadoop internals. 1) If I understood correctly the ideal number of Reducers is equal to number of distinct keys (or custom Partitioners) emitted from ...
    Lukas Vlcek
    Jul 14, 2008 at 9:11 pm
    Oct 29, 2008 at 8:45 am
  • Hi, I am running Nutch on Hadoop 0.17.1. I launch 5 nodes to perform crawling. When I look at the job statistics I see that only 1 reduce task is started for all steps, and hence I conclude that ...
    Alexander Aristov
    Jul 31, 2008 at 7:07 pm
    Jul 31, 2008 at 11:29 pm
  • Hi All, Trying to run the wordcount example on a single-node Hadoop setup. Could anyone please point me to the location from where I could download hadoop-0.17.1-examples.jar? Thank you, Srilatha
    Us latha
    Jul 30, 2008 at 1:34 pm
    Jul 31, 2008 at 8:32 am
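The examples jar is not a separate download; it ships inside the Hadoop release tarball itself. A sketch of running it (`input` and `output` are placeholder HDFS paths):

```shell
# from the root of the unpacked hadoop-0.17.1 release:
bin/hadoop jar hadoop-0.17.1-examples.jar wordcount input output
```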
  • We have a need to access data found in the JobTracker History link. Specifically in the "Analyse This Job" analysis. Must be run in Java, between jobs, in the same code which calls ToolRunner and ...
    Jul 28, 2008 at 2:07 am
    Jul 29, 2008 at 5:42 am
  • I'm running hadoop version 0.17.0 on a Red Hat Enterprise Linux 4.4 box. I'm using an IBM provided JDK 1.5. I've configured Hadoop for a localhost. I've written a simple test to open and write to ...
    Keith Fisher
    Jul 23, 2008 at 2:14 pm
    Jul 23, 2008 at 6:13 pm
  • Hi, I'm trying to access the HDFS of my Hadoop cluster from a non-Hadoop application. Hadoop 0.17.1 is running on standard ports. This is the code I use: FileSystem fileSystem = null; String hdfsurl = ...
    Jul 10, 2008 at 9:34 pm
    Jul 16, 2008 at 8:07 am
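A standalone client can talk to HDFS as long as it has the matching Hadoop jar on its classpath and points `fs.default.name` at the namenode. A sketch, where `namenode-host:9000` is a placeholder for the value in the cluster's hadoop-site.xml:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // must match the namenode's fs.default.name exactly
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/")));
    }
}
```

A version mismatch between the client's Hadoop jar and the cluster is a common cause of connection failures here.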
  • Hi, I searched a bit but could not find the answer. What is the right way to add (and remove) new slave nodes on run time? Thank you. -Kevin
    Jul 11, 2008 at 10:44 pm
    Jul 15, 2008 at 2:13 pm
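The usual answer: slaves can be added at runtime by starting the daemons on the new machine, and datanodes are retired through the exclude file. A hedged sketch of the standard commands (run from the Hadoop install directory on the node in question):

```shell
# Add a node: install the same Hadoop build and config on it, then:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker

# Retire a datanode: list it in the file named by dfs.hosts.exclude, then
# on the namenode host trigger decommissioning; blocks are re-replicated first.
bin/hadoop dfsadmin -refreshNodes
```

The conf/slaves file only matters to start-all.sh/stop-all.sh; the running cluster does not consult it.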
  • Hi all, I have a query regarding the functionality of the combiner. Is it possible to skip the combiner for some of the outputs of a mapper and send them directly to the reducer even though a combiner is ...
    Novice user
    Jul 1, 2008 at 11:05 am
    Jul 7, 2008 at 5:57 pm
  • Hello All, We have this exception in our logs: does anybody know how I can find the problem here? -- with kind regards, Christian Saar ................................................ Adacor Hosting ...
    Christian Saar
    Jul 1, 2008 at 3:23 pm
    Jul 3, 2008 at 9:40 am
  • Hi All, I have been using hadoop archives programmatically to generate har archives from some logfiles which are being dumped into the hdfs. When the input directory to Hadoop Archiving program has ...
    Pratyush Banerjee
    Jul 21, 2008 at 12:57 pm
    Aug 19, 2009 at 4:43 pm
  • Hi all, Could anyone suggest any efficient way to move files from one location to another on Hadoop. Please note that both the locations are on HDFS. I tried looking for inbuilt file system APIs but ...
    Rutuja Joshi
    Jul 30, 2008 at 9:06 pm
    Sep 12, 2008 at 5:29 pm
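For HDFS-to-HDFS moves the efficient answer is `FileSystem.rename`, which is a namenode metadata operation: no blocks are copied. A sketch (paths come from the command line):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MoveFile {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // rename == move within one HDFS; cheap regardless of file size
        boolean ok = fs.rename(new Path(args[0]), new Path(args[1]));
        System.out.println(ok ? "moved" : "rename failed");
    }
}
```

The shell equivalent is `bin/hadoop dfs -mv <src> <dst>`.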
  • Hi All, I'm setting up a cluster with 4 disks per server. Is there any way to make Hadoop aware of this setup and take benefits from it? *** I'm not planning to set up RAID in each node (only on the ...
    Rafael Turk
    Jul 30, 2008 at 1:37 am
    Jul 31, 2008 at 1:08 am
  • Just a bit of a feedback here. One of our hadoop 0.16.4 namenodes had gotten a disk full incident today. No second backup namenode was in place. Both files fsimage and edits seem to have gotten ...
    Torsten Curdt
    Jul 30, 2008 at 6:09 pm
    Jul 30, 2008 at 10:48 pm
  • Dear Hadoop Community -- I am wondering if it is already possible or in the plans to add capability for multiple master nodes. I'm in a situation where I have a master node that may potentially be in ...
    Ryan Shih
    Jul 29, 2008 at 5:55 pm
    Jul 30, 2008 at 9:56 pm
  • Dear All, I need to use Hadoop to read all files in a given directory. I wonder how to know whether a path is a directory rather than a file, and if it is, how can I get all the files in the directory? Thanks very much.
    Jul 28, 2008 at 11:33 am
    Jul 28, 2008 at 1:33 pm
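The `FileStatus` API answers both halves of this question; a sketch that tests the path and, if it is a directory, prints its files:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDir {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path(args[0]);
        if (fs.getFileStatus(p).isDir()) {          // directory or plain file?
            for (FileStatus child : fs.listStatus(p)) {
                if (!child.isDir()) {
                    System.out.println(child.getPath());
                }
            }
        }
    }
}
```

For nested directories the loop would need to recurse on each child that is itself a directory.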
  • Did anyone try to get hadoop running on the Gnu java environment? Does that work? Cheers, Gert
    Gert Pfeifer
    Jul 17, 2008 at 11:46 am
    Jul 26, 2008 at 1:09 pm
  • Hi! I'm experiencing hung reducers, with the following symptoms: Notice how it needs 6 map outputs, all map tasks have finished, and it still just hangs there. The second speculative copy of that ...
    Andreas Kostyrka
    Jul 24, 2008 at 8:13 am
    Jul 24, 2008 at 8:03 pm
  • Hello Again! I'm running into a NullPointerException from the following code (taken from a recordreader). None of the other variables are returning null; I've checked them all. I've checked the ...
    Kylie McCormick
    Jul 16, 2008 at 5:48 am
    Jul 16, 2008 at 8:29 pm
  • See attached screenshot, wonder how that could happen? Andreas
    Andreas Kostyrka
    Jul 9, 2008 at 12:32 am
    Jul 16, 2008 at 1:51 pm
  • Hi all, I have created a search engine using Lucene to search the file system and it is working fine right now. I heard somewhere that using Hadoop we can increase the performance of the search ...
    Jul 14, 2008 at 5:38 am
    Jul 15, 2008 at 5:11 am
  • Hi, My requirement is to compare the contents of one very large file (GB to TB in size) with a bunch of smaller files (100s of MB to GB in size). Is there a way I can give the mapper the 1st file ...
    Muhammad Ali Amer
    Jul 11, 2008 at 7:35 pm
    Jul 14, 2008 at 1:48 pm
  • Hi, Let's say I want to run a map reduce job on a series of text files (let's say x.txt y.txt and z.txt) Given the following mapper function in python (from WordCount.py): class WordCountMap(Mapper, ...
    Jul 8, 2008 at 6:15 pm
    Jul 9, 2008 at 2:21 am
  • Hi All: I've got 0.17.0 set up on a 7-node grid (6 slaves w/datanodes, 1 master running the namenode). I'm trying to process a small (180G) dataset. I've done this successfully and painlessly running ...
    C G
    Jul 6, 2008 at 3:32 pm
    Jul 8, 2008 at 4:44 pm
  • Hi, Using Hadoop 0.16.2, I am seeing the following in the NN log: 2008-07-03 19:46:26,715 ERROR dfs.NameNode - java.io.EOFException at ...
    Otis Gospodnetic
    Jul 4, 2008 at 12:01 am
    Jul 6, 2008 at 10:49 pm
  • Hey all, I've got a mapreduce task that works on small (~1G) input. When I try to run the same task on large (~100G) input, I get the following error around when the map tasks are almost done (~98%) ...
    Mori Bellamy
    Jul 1, 2008 at 10:21 pm
    Jul 3, 2008 at 7:56 am
  • Hi, I need to know inside my mapper, the name of the file that contains the current record. I saw that I can access the name of the input directories inside mapper.config(), but my input contains ...
    Deyaa Adranale
    Jul 31, 2008 at 9:51 am
    Aug 5, 2008 at 12:35 am
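In the old `org.apache.hadoop.mapred` API the framework exposes the current split's file as the per-task configuration property `map.input.file`, readable in `configure()`. A sketch (the mapper class and its key/value types are illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class FileAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private String inputFile;

    public void configure(JobConf job) {
        // full path of the file backing this map task's split
        inputFile = job.get("map.input.file");
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        out.collect(new Text(inputFile), value);
    }
}
```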
  • The motivation is to control the max # of mappers of a job. For example, the input data is 246MB, divided by 64M is 4. So by default 4 mappers will be launched on the 4 blocks. What I want is ...
    Gopal Gandhi
    Jul 30, 2008 at 11:08 pm
    Jul 31, 2008 at 11:55 pm
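The lever that actually governs the map count is the split size: raising `mapred.min.split.size` above the block size yields fewer, larger splits (`setNumMapTasks` is only a hint). The arithmetic is a ceiling division; a sketch, with `minSplitSizeFor` a hypothetical helper:

```java
public class SplitSizing {
    // Smallest split size (in bytes) that caps a job at maxMaps map tasks.
    static long minSplitSizeFor(long totalBytes, int maxMaps) {
        return (totalBytes + maxMaps - 1) / maxMaps;  // ceiling division
    }

    public static void main(String[] args) {
        long total = 246L << 20;  // the 246MB input from the post
        // e.g. conf.setLong("mapred.min.split.size", minSplitSizeFor(total, 2));
        System.out.println(minSplitSizeFor(total, 2));  // 123MB splits -> 2 maps
    }
}
```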
Group Navigation
Period: Jul 2008
Group Overview
Group: common-user @

192 users for July 2008

Andreas Kostyrka: 21 posts; Jason Venner: 21 posts; Raghu Angadi: 19 posts; Chris Douglas: 16 posts; Heyongqiang: 16 posts; Shengkai Zhu: 15 posts; Lincoln Ritter: 13 posts; Steve Loughran: 13 posts; Amar Kamat: 12 posts; Kylie McCormick: 12 posts; Arun C Murthy: 11 posts; Arv Mistry: 11 posts; Mori Bellamy: 11 posts; Alejandro Abdelnur: 10 posts; Goel, Ankur: 10 posts; Khanh Nguyen: 10 posts; Lohit: 10 posts; Sandy: 10 posts; Joman Chu: 9 posts; Kevin: 9 posts