128 discussions - 430 posts

  • Hi, I've been tweaking our cluster roll-out process to refine it. While doing so, I decided to check if XFS gives any performance benefit over EXT4. As per a comment I read somewhere on the hbase ...
    Stephen mulcahy
    Apr 22, 2010 at 8:03 am
    May 11, 2010 at 5:53 pm
  • Hello, I recently setup a 5 node cluster (1 master, 4 slaves) and am looking to use it to process high volumes of patient physiologic data. As an initial exercise to gain a better understanding, I ...
    Andrew Nguyen
    Apr 12, 2010 at 4:44 pm
    Apr 13, 2010 at 6:49 pm
  • Hello guys, Can you please tell me how I can use external libraries which my jobs link to in a MapReduce job? I added the following lines in mapred-site.xml in all my nodes and put the external ...
    Farhan Husain
    Apr 22, 2010 at 10:22 pm
    Apr 23, 2010 at 5:29 pm
  • Dear All, I want to test HDFS inside Amazon EC2. Two Ubuntu instances are running inside EC2. One server is namenode and jobtracker. The other server is the datanode. Cloudera (hadoop-0.20) is ...
    Christian Baun
    Apr 22, 2010 at 10:00 am
    Apr 26, 2010 at 9:00 pm
  • Hello all, I got the time out error as mentioned below -- after 600 seconds, that attempt was killed and the attempt would be deemed a failure. I searched around about this error, and one of the ...
    Raghava Mutharaju
    Apr 8, 2010 at 5:31 pm
    Apr 18, 2010 at 8:25 am
  • Hi everyone, I am doing a benchmark using Hadoop 0.20.0's wordcount example. I have a 30GB file. I plan to test the performance of different numbers of mappers. For example, for a wordcount job, I plan ...
    He Chen
    Apr 22, 2010 at 4:50 pm
    Apr 22, 2010 at 8:20 pm
  • Hi, I'm commissioning a new Hadoop cluster with the following spec. 45 x data nodes: - 2 x Quad-Core AMD Opteron(tm) Processor 2378 - 16GB ram - 4 x WDC WD1002FBYS 1TB SATA drives (configured as ...
    Stephen mulcahy
    Apr 8, 2010 at 4:38 pm
    Apr 16, 2010 at 6:57 am
  • Hi folks, We write a lot of lzo-compressed files to HDFS -- some via scribe, some using internal tools. Occasionally, we discover that the created lzo files cannot be read from HDFS -- they get ...
    Dmitriy Ryaboy
    Apr 1, 2010 at 7:16 am
    Apr 8, 2010 at 6:28 pm
  • Hi all, I configured Hadoop in a cluster and the NameNode and JobTracker are running OK, but the DataNode and TaskTracker don't start; they stop and keep waiting when they are going to start ...
    Edson Ramiro
    Apr 6, 2010 at 7:45 pm
    Apr 7, 2010 at 2:17 pm
  • I'm trying to use the distributed cache in a MapReduce job written to the new API (org.apache.hadoop.mapreduce.*). In my "Tool" class, a file path is added to the distributed cache as follows: public ...
    Larry Compton
    Apr 15, 2010 at 7:57 pm
    Jul 9, 2010 at 10:20 pm
  • Hi all. I'm not sure if I'm posting to correct mail list, please, suggest the correct one if so. I need to log statements from the running job, e.g. use Apache commons logging to print debug messages ...
    Alexander Semenov
    Apr 27, 2010 at 8:53 am
    Apr 27, 2010 at 5:03 pm
  • Hi, I'm trying to find a way to control the output file names. I need this because I have a situation where I need to run a Job and then use its output in the DistributedCache. So far the only way ...
    Tiago Veloso
    Apr 26, 2010 at 6:23 pm
    Apr 26, 2010 at 8:24 pm
  • Dear all, I have a problem here. HOD is good, and can manage a large virtual cluster on a huge physical cluster. But the problem is, it doesn't use more than one core for each machine, and I have ...
    Song Liu
    Apr 15, 2010 at 3:02 pm
    Apr 21, 2010 at 7:04 pm
  • I'm encountering a completely bizarre failure mode in my Hadoop cluster. A week ago, I switched from vanilla Apache Hadoop 0.20.1 to CDH 2. Ever since then, my tasktracker/datanode machines have ...
    David Howell
    Apr 3, 2010 at 1:17 am
    Apr 6, 2010 at 4:42 pm
  • Hi, I have recently rewritten the hadoop contrib index, updating it to hadoop-0.20.1 (removing deprecated hadoop methods, upgrading to new hadoop API) and lucene-core 3.1-dev. I would like to know if ...
    Renaud Delbru
    Apr 20, 2010 at 3:56 pm
    Apr 22, 2010 at 12:20 pm
  • I have two clusters upgraded to CDH2. One is performing fine, and the other is EXTREMELY slow. Some jobs that formerly took 90 seconds, take 20 to 50 minutes. It is an HDFS issue from what I can ...
    Scott Carey
    Apr 17, 2010 at 12:31 am
    Apr 17, 2010 at 5:58 pm
  • Hi all, As I understand it, Hadoop is mainly used for tasks that take a long time to execute. I'm considering using Hadoop for a task whose lower bound in distributed execution is around 5 to 10 seconds. Am ...
    Aleksandar Stupar
    Apr 8, 2010 at 12:55 pm
    Apr 9, 2010 at 6:15 am
  • I am new to this group, and relatively new to hadoop. I am looking at building a large cluster. I was wondering if anyone has any best practices for a cluster in the hundreds of nodes? As well, has ...
    James Seigel
    Apr 8, 2010 at 5:50 am
    Apr 8, 2010 at 8:41 pm
  • Hey all, I've a 2 Node cluster which is now running in Safe Mode. It's been 15-16 hrs now & yet to come out of Safe Mode. Does it normally take that long? The DataNode logs on Node running NameNode ...
    Manish N
    Apr 7, 2010 at 5:17 am
    Apr 7, 2010 at 5:17 pm
  • I'm starting to evaluate Hadoop. We are currently running Sensage and store a lot of log files in our current environment. I've been looking at the Hadoop forums and googling (of course) but haven't ...
    Apr 3, 2010 at 5:45 pm
    Apr 5, 2010 at 4:56 am
  • Hi Everyone, I want to ask about Hbase and Hive. Q1 Is there any dialect available which can be used with Hibernate to create persistence with Hbase. Has somebody written one. I came across HBql at ...
    Amit Kumar
    Apr 30, 2010 at 3:44 pm
    May 1, 2010 at 4:47 am
  • Hello, I want to output a class which I have written as the value of the map phase. The obvious way is to implement the Writable interface but the problem is the class has other classes as its member ...
    Farhan Husain
    Apr 27, 2010 at 5:54 pm
    Apr 28, 2010 at 1:13 am
  • Hi I need to build Hadoop installation from the latest source code of hadoop/common; I checked out the latest source and ran ant target that makes a distribution tar (ant tar) when I try to run the ...
    Asif Jan
    Apr 19, 2010 at 8:31 am
    Apr 20, 2010 at 8:32 am
  • Hi all, I have been using the Hadoop Fair Scheduler for some experiments on a 100 node cluster with 2 map slots per node (hence, a total of 200 map slots). In one of my experiments, all the map tasks ...
    Abhishek sharma
    Apr 11, 2010 at 7:45 pm
    Apr 14, 2010 at 6:21 am
  • My C++ pipes program needs to use a shared library. What are my options? Can I install this on the cluster in a way that permits HDFS to access it from each node as needed? Can I put it in the ...
    Keith Wiley
    Apr 9, 2010 at 8:23 pm
    Apr 9, 2010 at 10:25 pm
  • Hi, I am trying to set up a Hadoop cluster so that any of our users can access HDFS and submit jobs and I am having trouble with this. I added a HDFS path for mapred.system.dir in mapred-site.xml as ...
    Ryan Rosario
    Apr 3, 2010 at 9:37 pm
    Apr 5, 2010 at 2:42 pm
  • What is the algorithm used in "Shuffle and Sort" step?
    Dan Fundatureanu
    Apr 28, 2010 at 6:17 pm
    Apr 29, 2010 at 6:13 pm
  • Hi, I've decided to refactor some of my Hadoop jobs and implement them using MultithreadedMapper.class but I got puzzled because of some unexpected error messages at run time. Here are some relevant ...
    Jim Twensky
    Apr 27, 2010 at 10:46 pm
    Apr 28, 2010 at 4:08 pm
  • Hello, Is it possible to know the unique id of a reducer inside the reduce or setup method of a reducer class? I tried to find any method of the context class which might help in this regard but ...
    Farhan Husain
    Apr 26, 2010 at 11:13 pm
    Apr 27, 2010 at 4:57 pm
  • Hello, Is there any way to determine the number of reducers present in the cluster dynamically? I need to determine it when the job parameters are set up. Thanks, Farhan
    Farhan Husain
    Apr 23, 2010 at 5:31 pm
    Apr 23, 2010 at 9:25 pm
  • I have set-up Hadoop on OpenSuse 11.2 VM using Virtualbox. I ran Hadoop examples in the standalone mode successfully. Now, I want to run in distributed mode using 2 nodes. Hadoop starts fine and jps ...
    Apr 17, 2010 at 10:00 am
    Apr 22, 2010 at 9:59 am
  • Hi, I have an issue where my client's code (which links to libhdfs.a) is throwing out the following errors : ipc.Clnt Retrying connect to server name-node-A/ Already tried 0 time ...
    Sridhar Chellappa
    Apr 19, 2010 at 1:53 pm
    Apr 21, 2010 at 9:00 pm
  • Hi all, I am experimenting with a number of different tools to find the best fit for my current problem. To simplify, I have a table with 12 columns (small numbers and booleans) that gets 1M rows ...
    Colin Yates
    Apr 16, 2010 at 9:27 pm
    Apr 19, 2010 at 7:49 pm
  • How can I have my mapper output a HashMap as the value in the OutputCollector (so my reducer can work directly on the HashMap key/value pairs)? I tried just setting things up as HashMap in ...
    M B
    Apr 15, 2010 at 10:04 pm
    Apr 16, 2010 at 3:11 pm
  • Hi, I'd like to implement a feed loader with Hadoop and most likely HBase. I've got around 1 million feeds, that should be loaded and checked for new entries. However the feeds have different ...
    Thomas Koch
    Apr 12, 2010 at 8:21 am
    Apr 12, 2010 at 6:23 pm
  • So I ^C a job from the command line and get my prompt back, but sometimes the job remains on the cluster, I can see it on the admin web UI, and sometimes it lingers there for hours before finally ...
    Keith Wiley
    Apr 12, 2010 at 5:17 pm
    Apr 12, 2010 at 6:12 pm
  • [sorry for the double posting (to general), but I think this list is the appropriate place for this message] Hello, I'm trying to setup hadoop on demand (HOD) on my cluster. I'm currently unable to ...
    Kevin Van Workum
    Apr 6, 2010 at 3:32 pm
    Apr 8, 2010 at 9:39 pm
  • Hi, I need a good java example to get me started with some joining we need to do, any examples would be appreciated. File A: Field1 Field2 A 12 B 13 C 22 A 24 File B: Field1 Field2 Field3 A Car ... B ...
    M B
    Apr 5, 2010 at 9:11 pm
    Apr 7, 2010 at 2:34 am
  • I'm seeing this error message sequence over and over again on all of my map-reducing nodes hadoop-msp-tasktracker-hadoop-<nodename>.log file: 2010-04-05 14:58:52,047 INFO ...
    David Swift
    Apr 5, 2010 at 7:14 pm
    Jun 4, 2010 at 12:33 am
  • I am using CombineFileInputFormat and CombineFileSplit to group small input files as fed to the mappers. The job runs properly and the output is correct, but I get only one mapper task, so I lose all ...
    Keith Wiley
    Apr 29, 2010 at 9:24 pm
    Apr 30, 2010 at 7:59 am
  • Hi The call for presentations for Hadoop summit has been extended until *May 10th* Leverage this extended opportunity to share your experience on Hadoop and submit abstracts here: ...
    Dekel Tankel
    Apr 30, 2010 at 1:14 am
    Apr 30, 2010 at 1:50 am
  • What is the sorting algorithm used at the Shuffle step?
    Dan Fundatureanu
    Apr 28, 2010 at 5:43 pm
    Apr 29, 2010 at 3:18 pm
  • Hello, Is it possible to output in Mapper.cleanup method since the Mapper.context object is still available there? Thanks, Farhan
    Farhan Husain
    Apr 27, 2010 at 10:58 pm
    Apr 27, 2010 at 11:34 pm
  • Hi guys, I see the exception below when I launch a job 10/04/27 10:54:16 INFO mapred.JobClient: map 0% reduce 0% 10/04/27 10:54:22 INFO mapred.JobClient: Task Id : ...
    Apr 27, 2010 at 4:04 pm
    Apr 27, 2010 at 4:23 pm
  • Hey everyone, I have been working with Hadoop for a few weeks to build up a cluster with HDFS. I was looking at several monitoring tools to observe my cluster and found a good solution with ganglia+nagios. To ...
    Patrick Datko
    Apr 23, 2010 at 12:39 pm
    Apr 26, 2010 at 6:55 pm
  • A trivial question. So, here it is: according to the hadoop documentation, "dfs.replication" defines the number of replications of a block. So, if "dfs.replication" is set to N, then that means each ...
    Jiang licht
    Apr 21, 2010 at 7:19 pm
    Apr 21, 2010 at 9:28 pm
  • Hi all, I intend to test Hadoop and compare its execution in two different environments. The first is a virtualized cluster, built on KVM, and the second is a real cluster. I want to see if there is any ...
    Edson Ramiro
    Apr 16, 2010 at 1:48 pm
    Apr 19, 2010 at 1:08 pm
  • How do I use a nondefault Java InputFormat/RecordReader with a Pipes program? I realize I can set: <property><name>hadoop.pipes.java.recordreader</name><value>true</value></property> or alternatively ...
    Keith Wiley
    Apr 14, 2010 at 7:28 pm
    Apr 15, 2010 at 5:00 pm
  • And, I'm getting the following errors: 10/04/15 06:00:50 INFO mapred.JobClient: Task Id : attempt_201004150557_0001_m_000000_1, Status : FAILED java.io.IOException: Cannot open filename ...
    Andrew Nguyen
    Apr 15, 2010 at 6:02 am
    Apr 15, 2010 at 7:19 am
Group Overview
group common-user @

143 users for April 2010

  • Todd Lipcon: 26 posts
  • Farhan Husain: 18 posts
  • Keith Wiley: 18 posts
  • Allen Wittenauer: 12 posts
  • Ted Yu: 12 posts
  • Eric Sammer: 10 posts
  • Raghava Mutharaju: 10 posts
  • Steve Loughran: 10 posts
  • Edson Ramiro: 9 posts
  • Abhishek sharma: 8 posts
  • Andrew Nguyen: 8 posts
  • Scott Carey: 8 posts
  • Stephen mulcahy: 8 posts
  • Amareshwari Sri Ramadasu: 7 posts
  • Brian Bockelman: 7 posts
  • Michael Segel: 7 posts
  • Song Liu: 7 posts
  • Alex Kozlov: 6 posts
  • Christian Baun: 6 posts
  • He Chen: 6 posts