176 discussions - 728 posts

  • Hi, For a few days I've been trying to make hadoop work with the Ganglia monitoring software. I'm using hadoop 0.18.3 with ganglia 3.0.6, I've changed the hadoop-metrics file as described in the wiki and ...
    Tamir Kamara
    Mar 17, 2009 at 1:49 pm
    Mar 19, 2009 at 1:52 pm
  • This is somewhat of a noob question I know, but after learning about Hadoop, testing it in a small cluster and running Map Reduce jobs on it, I'm still not sure if Hadoop is the right distributed ...
    Phil cryer
    Mar 26, 2009 at 4:29 pm
    Apr 6, 2009 at 2:20 pm
  • Hi all, I'm trying to write a JUnit test case that extends ClusterMapReduceTestCase to test some code I've written to ease job submission and monitoring between some existing code. Unfortunately, I ...
    Brian Forney
    Mar 10, 2009 at 6:09 pm
    Apr 15, 2009 at 7:22 pm
  • I'd like to implement some coordination between Mapper tasks running on the same node. I was thinking of using ZooKeeper to provide this coordination. I think I remember hearing that MapReduce and/or ...
    Stuart White
    Mar 18, 2009 at 5:27 pm
    Mar 28, 2009 at 3:48 pm
  • Hello all, For the sake of benchmarking, I ran the standard hadoop wordcount example on an input file using 2, 4, and 8 mappers and reducers for my job. In other words, I do: time -p bin/hadoop jar ...
    Mar 4, 2009 at 10:46 pm
    Mar 11, 2009 at 3:36 pm
  • Hi, How do I allow multiple nodes to write to the same index file in HDFS? Thank you, Mark
    Mark Kerzner
    Mar 13, 2009 at 4:38 am
    Oct 7, 2009 at 4:10 pm
  • Hi! Whatever code I run on hadoop, reduce starts a few seconds after map finishes. And worse, when I run 10 jobs in parallel (using threads and sending one after another) all maps finish sequentially, ...
    Rasit OZDAS
    Mar 1, 2009 at 5:24 pm
    Mar 24, 2009 at 4:16 pm
  • Has anyone explored using HDFS/HBase as the underlying storage for an RDF store? Most solutions (all are single node) that I have found till now scale up only to a couple of billion rows in the ...
    Amandeep Khurana
    Mar 23, 2009 at 11:07 pm
    Mar 24, 2009 at 1:51 pm
  • I am running a large streaming job that processes about 3TB of data. I am seeing large jumps in hard drive space usage in the reduce part of the jobs. I tracked the problem down. The job is set to ...
    Billy Pearson
    Mar 17, 2009 at 5:13 am
    Mar 20, 2009 at 6:55 am
  • Hi all, I'm conducting some initial tests with Hadoop to better understand how well it will handle and scale with some of our specific problems. As a result, I've written some M/R jobs that are ...
    Sean Laurent
    Mar 3, 2009 at 12:47 am
    Mar 4, 2009 at 9:36 pm
  • Is it possible to output multiple key value pairs from a single map function run? For example, the mapper outputting <name, phone> and <name, address> simultaneously... Can I write multiple ...
    Amandeep Khurana
    Mar 27, 2009 at 11:35 am
    May 22, 2009 at 5:43 am
  • Hi, I am SreeDeepya doing MTech in IIIT. I am working on a project named cost effective and scalable storage server. I configured a small hadoop cluster with only two nodes, one namenode and one ...
    Mar 29, 2009 at 5:29 am
    Apr 21, 2009 at 5:06 am
  • Hi, We need to implement a Join with a between operator instead of an equal. What we are trying to do is search a file for a key where the key falls between two fields in the search file like this: ...
    Tamir Kamara
    Mar 24, 2009 at 11:33 am
    Apr 2, 2009 at 2:48 pm
  • Dear developers, Is there any detailed example of how Hadoop processes input? Article http://hadoop.apache.org/core/docs/r0.19.1/mapred_tutorial.html gives a good idea, but I want to see input data ...
    Mar 31, 2009 at 11:02 pm
    Apr 1, 2009 at 8:22 am
  • I am currently working on a RecordReader to read a custom time series data binary file format and was wondering about ways to be most efficient in designing the InputFormat/RecordReader process. ...
    Patterson, Josh
    Mar 17, 2009 at 8:39 pm
    Mar 18, 2009 at 5:48 pm
  • Hi! I have a question about fine-tuning hadoop performance on 8-core machines. I have 2 machines I am testing. One is 8-core Xeon and another is 8-core Opteron. 16Gb RAM each. They both run ...
    Vadim Zaliva
    Mar 11, 2009 at 5:16 pm
    Mar 16, 2009 at 9:43 pm
  • Hello, I'd like to invite you to take a look at the recently released first beta of Hadoop UI, a graphical Flex/Java based client for Hadoop Core. Hadoop UI currently includes a HDFS file explorer and ...
    Stefan Podkowinski
    Mar 31, 2009 at 11:12 am
    Apr 22, 2009 at 5:37 pm
  • Hi all, I am using hbase-0.19.1 and hadoop-0.19. My cluster has 5+1 nodes, and there are about 512 regions in HBase (256MB per region). But I found the blocks in HDFS are very unbalanced. Following ...
    Schubert zhang
    Mar 25, 2009 at 9:12 am
    Mar 27, 2009 at 7:51 am
  • Hi, We are running a website with quite a lot of traffic. At the moment we are using about 20 sql servers and about 60 application servers/file servers. We are thinking of porting everything to ...
    Mar 17, 2009 at 4:39 am
    Mar 23, 2009 at 12:41 am
  • There is an established procedure for upgrading from one release of Hadoop to a newer release. Is there something similar to move back to a lower-numbered release? Specifically, we have data in a ...
    David Ritch
    Mar 18, 2009 at 5:08 pm
    Mar 20, 2009 at 5:18 pm
  • Does "hadoop-default.xml" + "hadoop-site.xml" of master host matter for whole Job or they matter for each node independently? For example, if one of them (or both) contains: <property> <name> ...
    Mar 8, 2009 at 3:57 am
    Mar 9, 2009 at 7:41 pm
  • Hi, We have just released 1.2.1 version of CloudBase on sourceforge- http://cloudbase.sourceforge.net [ CloudBase is a data warehouse system built on top of Hadoop's Map-Reduce architecture. It uses ...
    Tarandeep Singh
    Mar 2, 2009 at 7:34 pm
    Mar 3, 2009 at 8:03 pm
  • Hi, I have been exploring the feasibility of using Hadoop/HDFS to analyze terabyte-scale scientific simulation output datasets. After a set of initial experiments, I have a number of questions ...
    Tu, Tiankai
    Mar 28, 2009 at 11:10 pm
    Apr 6, 2009 at 2:10 pm
  • What is the typical hardware config for a node that people are using for Hadoop and HBase? I am setting up a new 10 node cluster which will have HBase running as well that will be feeding my front ...
    Amandeep Khurana
    Mar 28, 2009 at 5:07 am
    Mar 31, 2009 at 10:52 am
  • Normally I dislike writing about problems without being able to provide some more information, but unfortunately in this case I just can't find anything. Here is the situation - DFS cluster running ...
    Igor Bolotin
    Mar 5, 2009 at 6:11 pm
    Mar 18, 2009 at 5:11 am
  • I have a large number of key,value pairs. I don't actually care if data goes in value or key. Let me be more exact. (k,v) pair after combiner is about 1 mil. I have approx 1kb data for each pair. I can ...
    Mar 11, 2009 at 2:44 am
    Mar 12, 2009 at 1:32 pm
  • If this is not the correct place to ask Hadoop + EC2 questions please let me know. I am trying to get a handle on how to use Hadoop on EC2 before committing any money to it. My question is, how do I ...
    Malcolm Matalka
    Mar 11, 2009 at 1:31 pm
    Mar 12, 2009 at 12:04 pm
  • Hi - I'm not sure yet, but I think I might be hitting a race condition in Hadoop 18.3. What seems to happen is that in the reduce phase, some of my tasks perform speculative execution but when the ...
    Ryan Shih
    Mar 2, 2009 at 7:01 pm
    Mar 3, 2009 at 3:29 am
  • Hi, I am doing a project scalable storage server to store images. Can Hadoop efficiently support this purpose? Our image size will be around 250 to 300 KB each. But we have many such images. Like the ...
    Mar 29, 2009 at 10:50 am
    Apr 2, 2009 at 2:10 pm
  • Recently we are seeing a lot of 'Socket closed' exceptions in our cluster. Many task's open/create/getFileInfo calls get back 'SocketException' with message 'Socket closed'. We seem to see many tasks fail ...
    Mar 29, 2009 at 6:56 pm
    Apr 1, 2009 at 11:00 pm
  • Hi, Can anyone tell me if Hadoop is appropriate for the following application. I need to perform optimization using a single, small input data set. To get a good result I must make many independent ...
    John Bergstrom
    Mar 19, 2009 at 6:49 pm
    Mar 20, 2009 at 11:52 pm
  • I need to append file A to file B in HDFS without downloading/uploading them to local disk. Is there a way?
    Steve Gao
    Mar 17, 2009 at 11:22 pm
    Mar 18, 2009 at 12:28 am
  • I was trying to put a 1 gig file onto HDFS and I got the following error: 09/03/10 18:23:16 WARN hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while ...
    Amandeep Khurana
    Mar 11, 2009 at 1:27 am
    Mar 11, 2009 at 8:16 pm
  • I've been running hadoop-0.19.0 for several weeks successfully. Today, for the first time, I tried to run the balancer, and I'm receiving: java.lang.RuntimeException: Not a host:port pair: ...
    Stuart White
    Mar 11, 2009 at 4:43 pm
    Mar 11, 2009 at 7:14 pm
  • Hello, it seems the HDFS in my cluster is corrupt. This is the output from hadoop fsck: Total size: 9196815693 B Total dirs: 17 Total files: 157 Total blocks: 157 (avg. block size 58578443 B) ...
    Mayuran Yogarajah
    Mar 10, 2009 at 12:21 am
    Mar 11, 2009 at 7:10 pm
  • Hi folks, I've recently upgraded to Hadoop 0.19.1 from a much, much older version of Hadoop. Most things in my application (a highly modified version of Nutch) are working just fine, but one of them ...
    Doug Cook
    Mar 9, 2009 at 3:45 pm
    Mar 10, 2009 at 2:01 am
  • On a ~100 node cluster running HDFS (we just use HDFS + fuse, no job/task trackers) I've noticed many datanodes get 'stuck'. The nodes themselves seem fine with no network/memory problems, but in ...
    Garhan Attebury
    Mar 9, 2009 at 3:17 pm
    Mar 9, 2009 at 8:22 pm
  • Hadoop streaming question: If I am forming a matrix M by summing a number of elements generated on different mappers, is it better to emit tons of lines from the mappers with small key,value pairs ...
    Peter Skomoroch
    Mar 28, 2009 at 8:51 am
    Apr 8, 2009 at 1:00 am
  • I am using Hadoop - HBase 0.18 and my eclipse supports hadoop-0.18.0-eclipse-plugin. When I switch to Hadoop 0.19.1 and use hadoop-0.19.0-eclipse-plugin then my eclipse doesn't show mapreduce ...
    Puri, Aseem
    Mar 18, 2009 at 3:32 pm
    Apr 1, 2009 at 3:19 pm
  • Hi, I ran a test of a datanode crash. I stopped the networking on one of the datanodes. The Web app and fsck report that datanode as dead after 10 mins. But dfsadmin -report does not report that over 25 ...
    Mar 25, 2009 at 10:01 am
    Mar 27, 2009 at 5:46 pm
  • On one of my long-running jobs (about 50-60 hours), I am seeing that after 24 hours all active reduce tasks fail with the error message java.io.IOException: Task process exit with nonzero status of 255. ...
    Billy Pearson
    Mar 26, 2009 at 2:24 am
    Mar 27, 2009 at 5:44 am
  • Hey all, In looking at the stats for a number of our jobs, the amount of data that the UI claims we've read from or written to HDFS is vastly larger than the amount of data that should be involved in ...
    Bryan Duxbury
    Mar 18, 2009 at 12:27 am
    Mar 18, 2009 at 5:12 pm
  • This is slightly off-topic, and I realize this question is not specific to Hadoop, but what is the best way to search the mailing list archives? Here's where I'm looking: ...
    Stuart White
    Mar 8, 2009 at 8:10 pm
    Mar 9, 2009 at 12:10 am
  • When we try to mount the dfs from fuse we are getting the following errors. Has anyone seen this issue in the past? This is on version 0.19.0 [root@socdvmhdfs1]# fuse_dfs dfs://socdvmhdfs1:9000 ...
    Hyatt, Matthew G
    Mar 2, 2009 at 10:19 pm
    Mar 3, 2009 at 8:34 pm
  • java.lang.ArrayIndexOutOfBoundsException: 4096 at org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:129) at ...
    Nick Cen
    Mar 2, 2009 at 1:39 am
    Mar 2, 2009 at 11:40 am
  • Hello, I'm using some JNI interfaces, via a R. My classpath contains all the jar files in $HADOOP_HOME and $HADOOP_HOME/lib My class is public SeqKeyList() throws Exception { config = new ...
    Saptarshi Guha
    Mar 24, 2009 at 2:20 am
    Mar 30, 2009 at 11:24 am
  • Hi, By default Hadoop does an ASCII sort of the mapper's output, not a numeric sort. However, I often want the framework to sort records in numeric order. Can I make the framework do a numeric sort? (I use ...
    Akira Kitada
    Mar 21, 2009 at 10:41 pm
    Mar 23, 2009 at 6:25 pm
  • Hi, I am trying to find a way to change key-value field separator of streaming. Streaming documentation says it can be configured with "stream.map.output.field.separator" and I tried but it had no ...
    Akira Kitada
    Mar 21, 2009 at 11:46 am
    Mar 21, 2009 at 8:10 pm
  • I want to apply this patch https://issues.apache.org/jira/browse/HADOOP-1700 to my hadoop 0.17.0 . Would anybody tell me how to do it? Thanks!
    Steve Gao
    Mar 17, 2009 at 11:49 pm
    Mar 18, 2009 at 3:12 am
  • Hey all Have some users reporting intermittent spawning of Reducers when the job.xml shows mapred.reduce.tasks=0 in 0.19.0 and .1. This is also confirmed when jobConf is queried in the (supposedly ...
    Chris K Wensel
    Mar 12, 2009 at 5:12 pm
    Mar 16, 2009 at 3:49 am
Group Overview
group: common-user @

195 users for March 2009

  • Jason hadoop: 24 posts
  • Aaron Kimball: 20 posts
  • Brian Bockelman: 20 posts
  • Nick Cen: 20 posts
  • Amandeep Khurana: 18 posts
  • Raghu Angadi: 18 posts
  • Rasit OZDAS: 18 posts
  • Steve Loughran: 16 posts
  • Tamir Kamara: 15 posts
  • Owen O'Malley: 14 posts
  • Richa Khandelwal: 14 posts
  • Billy Pearson: 13 posts
  • Schubert zhang: 12 posts
  • Saptarshi Guha: 11 posts
  • Stuart White: 11 posts
  • Amareshwari Sriramadasu: 10 posts
  • Mark Kerzner: 10 posts
  • Tim robertson: 10 posts
  • Patterson, Josh: 9 posts
  • Scott Carey: 9 posts