96 discussions - 443 posts

  • Does anyone have any thoughts/experiences on running Hadoop in AWS? What are some pros/cons? Are there any good AMIs out there for this? Thanks for any advice.
    Dec 9, 2010 at 4:18 pm
    Dec 29, 2010 at 4:32 pm
  • Hi, I am trying to write MapReduce code to find friends of friends in a social network. My data snippet: 1 41 1 7 1 100 2 64 2 65 2 42 2 86 3 54 3 24 3 16 3 43 4 39 4 52 Here map() ...
    Praveen Bathala
    Dec 20, 2010 at 2:29 am
    Dec 22, 2010 at 4:58 pm
  • Hello, is the implementation of Hadoop documented somewhere, especially the part where the output of mappers is partitioned, sorted, and spilled to disk? I tried to understand it, but it's rather ...
    Da Zheng
    Dec 29, 2010 at 6:21 pm
    Jan 3, 2011 at 5:05 pm
  • Hi, I'm planning to crawl a certain web site every 30 minutes. How would I get it done in Hadoop? In pure Java, I used Thread.sleep() method, but I guess this won't work in Hadoop. Or if it could ...
    Edward choi
    Dec 7, 2010 at 8:55 am
    Dec 28, 2010 at 11:50 pm
  • Hi, I'm trying to crawl numerous news sites. My plan is to make a file containing a list of all the news rss feed urls, and the path to save the crawled news article. So it would be like this: ...
    Edward choi
    Dec 10, 2010 at 7:28 am
    Dec 12, 2010 at 3:10 am
  • Hi, I am a Hadoop newbie. Today I was struggling with a Hadoop problem for several hours. I initialize a parameter by setting job configuration in main. E.g. Configuration con = new ...
    Peng, Wei
    Dec 17, 2010 at 6:58 am
    Dec 17, 2010 at 10:34 pm
  • Hi All, I've been browsing the RPC code for quite a while now, trying to find any entry point / interceptor slot that allows me to handle an RPC call response writable after it was sent over the wire. Does ...
    Stefan Groschupf
    Dec 28, 2010 at 4:08 am
    Dec 29, 2010 at 6:48 am
  • You can do whatever you want (including spawning threads) in the Mapper process (which is fork/exec'd by the TaskTracker). But this doesn't help. I think you need to understand the fundamental ...
    Ricky Ho
    Dec 22, 2010 at 6:46 pm
    Dec 27, 2010 at 2:34 am
  • Hi all, is there any valid Hadoop certification available? Something that adds credibility to your Hadoop expertise. Matthew
    Matthew John
    Dec 9, 2010 at 3:41 am
    Dec 15, 2010 at 8:19 pm
  • Folks, I'm a Hadoop newbie, and I hope this is an appropriate place to post this question. I'm trying to work through the initial examples. When I try to copy files into HDFS, hadoop throws ...
    Sanford Rockowitz
    Dec 12, 2010 at 6:42 am
    Dec 14, 2010 at 5:55 am
  • Hi, I have a problem with a MapReduce job I am trying to run on a 32 node cluster. The final few reducers take a *lot* longer than the rest. e.g. If I specify 100 reducers, the first 90 will complete ...
    Rob Stewart
    Dec 11, 2010 at 11:05 am
    Dec 11, 2010 at 6:10 pm
  • Hi, guys, I see that there is MountableHDFS <http://wiki.apache.org/hadoop/MountableHDFS>, and I know that it works, but my questions are as follows: - How reliable is it for large storage? - Is it ...
    Mark Kerzner
    Dec 2, 2010 at 3:02 am
    Dec 7, 2010 at 3:19 am
  • Hi all, does Hadoop support the map function running different code? If so, how can this be done? Thanks in advance! -- Regards, Jander
    Jander g
    Dec 28, 2010 at 10:54 pm
    Dec 29, 2010 at 6:47 am
  • Hi, I process this command: ./hadoop jar /home/userme/hd.jar org.postdirekt.hadoop.WordCount gutenberg gutenberberg-output and get this why? Because I have org.postdirekt.hadoop.Map in the jar File. ...
    Cavus,M.,Fa. Post Direkt
    Dec 28, 2010 at 2:58 pm
    Dec 29, 2010 at 8:36 am
  • Hi everyone, I would like to simulate network delay on 1 node in my cluster, perhaps by putting the thread to sleep every time it transfers data non-locally. I'm looking at the source but am not sure ...
    Dec 26, 2010 at 12:26 pm
    Dec 27, 2010 at 5:10 am
  • Excuse me for asking a general Java question here. I tried to find Java mailing list from Google but none of them were active. There is a problem that's been driving me crazy for a while. I am trying ...
    Edward choi
    Dec 9, 2010 at 11:05 am
    Dec 16, 2010 at 6:15 am
  • I think I am seeing a behavior in which if a mapper task fails (crashes) on one input key/value, the entire task is rescheduled and rerun, starting over again from the first input key/value even if ...
    Keith Wiley
    Dec 14, 2010 at 12:51 am
    Dec 14, 2010 at 6:05 pm
  • Hi guys, I am just installing Hadoop 0.21.0 in a single-node cluster. I encounter the following error when I run bin/hadoop namenode -format 10/12/08 16:27:22 ERROR namenode.NameNode: ...
    Richard Zhang
    Dec 8, 2010 at 9:38 pm
    Dec 9, 2010 at 3:22 am
  • There is a proper decommissioning process to remove dead nodes. See the FAQ link here: http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_ ...
    Sudhir Vallamkondu
    Dec 8, 2010 at 3:55 am
    Dec 8, 2010 at 7:06 am
  • Hi all, why would the following lines work in the main class (WordCount) and not in the Mapper, even though "myconf" is set in WordCount to point to the object returned by getConf()? try{ FileSystem hdfs ...
    Dec 16, 2010 at 9:06 pm
    Dec 20, 2010 at 8:18 am
  • Dear all, has anyone encountered the error below while running a job in Hadoop? It occurs in the reduce phase of the job. attempt_201012061426_0001_m_000292_0: ...
    Adarsh Sharma
    Dec 8, 2010 at 12:14 pm
    Dec 9, 2010 at 8:52 am
  • Hi, I have a very large file of size 1.4 GB. Each line of the file is a number. I want to find the sum of all those numbers. I wanted to use NLineInputFormat as the InputFormat, but it sends only one line ...
    Madhu phatak
    Dec 17, 2010 at 3:59 pm
    Dec 20, 2010 at 11:01 am
  • Hello everyone, I've got a problem writing a JCuda program based on Hadoop MapReduce. I use the jcudaUtill. The KernelLauncherSample can be successfully executed on my worker node. However, ...
    He Chen
    Dec 9, 2010 at 10:01 pm
    Dec 10, 2010 at 12:36 am
  • I would like to use Hadoop's Log4j infrastructure to do logging from my map/reduce application. I think I've got everything set up correctly, but I am still unable to specify the logging level I ...
    W.P. McNeill
    Dec 13, 2010 at 6:06 pm
    Dec 23, 2010 at 10:36 pm
  • If you would like MR-1938 patch (see link below), "Ability for having user's classes take precedence over the system classes for tasks' classpath", to be included in CDH3b4 release, please put in a ...
    Roger Smith
    Dec 15, 2010 at 6:44 pm
    Dec 15, 2010 at 7:01 pm
  • Hi, When trying to compare Hadoop against other parallel paradigms, it is important to consider heterogeneous systems. Some may have 100 nodes, each single core. Some may have 100 nodes, with 8 cores ...
    Rob Stewart
    Dec 11, 2010 at 11:10 am
    Dec 13, 2010 at 9:04 pm
  • Hi all, I extended my project path with the hadoop-0.20.2-core.jar file, but I can see that some of the classes I need aren't there, so for example an error I get: " The type ...
    Maha A. Alabduljalil
    Dec 12, 2010 at 12:00 am
    Dec 12, 2010 at 7:20 pm
  • Hi, I am running a MapReduce job to get some emails out of a huge text file. I used to use Hadoop 0.19 and had no issues; now I am using Hadoop 0.20.2, and when I run my hadoop mapreduce ...
    Praveen Bathala
    Dec 10, 2010 at 2:10 am
    Dec 12, 2010 at 2:10 pm
  • Dear all, I am facing the problem below while running Hadoop on VMs. I am using hadoop-0.20.2 with JDK6. My jobtracker log says: 2010-12-06 15:16:06,618 INFO org.apache.hadoop.mapred.JobTracker: ...
    Adarsh Sharma
    Dec 6, 2010 at 11:02 am
    Dec 7, 2010 at 6:20 am
  • I am using MultipleOutputs to split a mapper input into about 20 different files. Adding this split has had an extremely adverse effect on performance. Is MultipleOutputs known for performing slowly? ...
    Matt Tanquary
    Dec 2, 2010 at 5:10 pm
    Jan 13, 2011 at 10:08 pm
  • Hi all, I got to know about an HDFS with RAID implementation from the following documentation: http://wiki.apache.org/hadoop/HDFS-RAID In the documentation, it says you can find the hadoop-*-raid.jar ...
    Matthew John
    Dec 22, 2010 at 4:20 pm
    Jan 11, 2011 at 3:39 am
  • Is setting dfs.replication to 1 sufficient to stop replication? How do I verify that? I have a pseudo cluster running 0.21.0. It seems that the hdfs disk consumption triples the amount of data ...
    Jane Chen
    Dec 29, 2010 at 12:22 am
    Dec 30, 2010 at 1:10 am
  • I have a loop that runs over a large number of iterations (order of 100,000) very quickly. It is nice to do context.setStatus() with an indication of where I am in the loop. Currently I'm only ...
    W.P. McNeill
    Dec 23, 2010 at 7:15 pm
    Dec 23, 2010 at 9:14 pm
  • I know there is a configuration parameter that can be used to specify number of replicas. I wonder whether I can specify different values for some files in my program by using HDFS APIs. Thanks Gerald
    Zhenhua Guo
    Dec 22, 2010 at 8:41 am
    Dec 23, 2010 at 4:33 pm
  • Hi everyone, Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat which is supposed to put each file from the input directory in a SEPARATE split. So the number of Maps is equal to the number ...
    Dec 15, 2010 at 10:14 am
    Dec 16, 2010 at 2:53 am
  • I've got a smallish cluster of 12 nodes up from 6, that we're using to dip our feet into hadoop. One of my users has a few directories in his HDFS home which he was using to test, and which exist, ...
    Seth Lepzelter
    Dec 13, 2010 at 4:52 pm
    Dec 14, 2010 at 5:33 pm
  • Hi all, I have been working with Hadoop 0.20.2 on Linux nodes. Now I want to try the same version with Eclipse on a Windows XP machine. Could someone provide a tutorial/guidelines on how to install ...
    Matthew John
    Dec 14, 2010 at 3:53 am
    Dec 14, 2010 at 4:59 pm
  • Hi, "hadoop" user has some advantages for running Hadoop. For example, if HDFS is mounted as a local file system, then only user "hadoop" has write/delete permissions. Can this privilege be given to ...
    Mark Kerzner
    Dec 9, 2010 at 5:34 am
    Dec 9, 2010 at 12:43 pm
  • Hello there, I'm trying to compile libhdfs, but there are some problems. According to http://wiki.apache.org/hadoop/MountableHDFS I have already installed fuse. With ant compile-c++-libhdfs ...
    Petrucci Andreas
    Dec 7, 2010 at 7:47 pm
    Dec 8, 2010 at 12:07 am
  • We are happy to announce that Cascading 1.2 is now publicly available for download. http://www.cascading.org/2010/12/cascading-12-now-available.html This release features many performance and ...
    Chris K Wensel
    Dec 1, 2010 at 10:43 pm
    Dec 1, 2010 at 10:53 pm
  • Hi, (1) I declared a global variable in my hadoop mainClass which gets initialized in the 'run' function of this mainClass. When I try to access this global static variable from the MapperClass, it ...
    Dec 31, 2010 at 1:28 am
    Dec 31, 2010 at 4:59 am
  • I wrote a script to map the IP's to a rack. The script is as follows. : for i in $* ; do topo=`echo $i | cut -d"." -f1,2,3 | sed 's/\./-/g'` topo=/rack-$topo" " final=$final$topo done echo $final I ...
    Rajgopal Vaithiyanathan
    Dec 28, 2010 at 8:52 pm
    Dec 29, 2010 at 3:20 pm
  • 1) On each server, install the core HBase RPMs: hbase, hbase-native, hbase-master, hbase-regionserver, hbase-zookeeper, hbase-conf-pseudo, hbase-docs. *I do this: yum list | grep cloudera | grep ...
    Mark Kerzner
    Dec 28, 2010 at 6:13 pm
    Dec 28, 2010 at 6:22 pm
  • Hi guys, I am having some inconsistent timing in the web interface. The job finish time as below is 47 secs but the Map & Reduce took significantly longer. I don't think I did anything that could ...
    Dec 27, 2010 at 5:15 am
    Dec 27, 2010 at 8:36 am
  • Let's say I want to ditch an input record the very first time it fails (because I know it is a deterministic data-dependent failure) instead of retrying it the default four times. I have already ...
    Keith Wiley
    Dec 23, 2010 at 9:33 pm
    Dec 25, 2010 at 5:35 pm
  • Hi there, I want to ask if the HDFS API supports reading just a specific block of a file (assuming, of course, that the file exceeds the default block size). For example, is it possible to read/fetch just the first of ...
    Petrucci Andreas
    Dec 16, 2010 at 5:31 pm
    Dec 18, 2010 at 12:47 pm
  • Hi all, I have googled the error below a lot but am unable to find the root cause. I am selecting data from the Hive table website_master, but it results in the error below: Hibernate: select ...
    Adarsh Sharma
    Dec 16, 2010 at 10:09 am
    Dec 17, 2010 at 6:25 am
  • Hi all, I'm a little confused about how to configure hadoop in a heterogeneous cluster. For example, if I have one machine(m1) with a two-core processor, another(m2) with a four-core processor, and ...
    Yu Li
    Dec 16, 2010 at 9:18 am
    Dec 17, 2010 at 1:46 am
  • Hi, does anyone know how to speed up datanode decommissioning, and what configurations relate to decommissioning? How to speed up data transfer from the datanode getting ...
    Dec 16, 2010 at 7:13 am
    Dec 16, 2010 at 7:42 am
  • Hi, I am trying to upgrade Hadoop. As part of this, I have set two environment variables, NEW_HADOOP_INSTALL and OLD_HADOOP_INSTALL. After this, I executed the following command: % ...
    Dec 16, 2010 at 6:05 am
    Dec 16, 2010 at 6:32 am
Group Overview
group: common-user @

115 users for December 2010

  • Harsh J: 29 posts
  • Ted Dunning: 25 posts
  • Edward Choi: 21 posts
  • Adarsh Sharma: 18 posts
  • Maha A. Alabduljalil: 18 posts
  • Peng, Wei: 18 posts
  • James Seigel: 14 posts
  • Mark Kerzner: 12 posts
  • Li ping: 11 posts
  • Sudhir Vallamkondu: 11 posts
  • Keith Wiley: 8 posts
  • Rob Stewart: 8 posts
  • Todd Lipcon: 8 posts
  • Aman: 7 posts
  • Konstantin Boudnik: 7 posts
  • Steve Loughran: 7 posts
  • Matthew John: 6 posts
  • Petrucci Andreas: 6 posts
  • Praveen Bathala: 6 posts
  • Ted Yu: 6 posts