143 discussions - 630 posts

  • Hi - I'd like to create a job that pulls small files from a remote server (using FTP, SCP, etc.) and stores them directly to sequence files on HDFS. Looking at the sequence file API, I don't see an ...
    Scott Whitecross
    Mar 12, 2010 at 1:23 pm
    Mar 17, 2010 at 10:09 pm
  • Hi all, We are seeing the following error in our reducers of a particular job: Error: java.lang.OutOfMemoryError: Java heap space at ...
    Jacob R Rideout
    Mar 6, 2010 at 4:32 pm
    May 9, 2010 at 4:42 am
  • I may be having a setup issue with classpaths, would appreciate some help. I created a jar with all the Sample* classes in contrib/DataJoin. Here is the listing of my samplejoin.jar file: " zip.vim ...
    M B
    Mar 26, 2010 at 10:25 pm
    Mar 30, 2010 at 9:00 am
  • Hello everyone, I'm thinking of using Hadoop as a subject in my master's thesis in Computer Science. I'm supposed to solve some kind of a problem with Hadoop, but can't think of any :)). We have a ...
    Tonci Buljan
    Mar 1, 2010 at 2:02 pm
    Mar 4, 2010 at 5:12 pm
  • I'm confused as to how to run a C++ pipes program on a full HDFS system. First off, I have everything working in pseudo-distributed mode so that's a good start...but full HDFS has no concept of an ...
    Keith Wiley
    Mar 30, 2010 at 4:10 pm
    Mar 31, 2010 at 6:16 pm
  • Hi, I just moved from pseudo distributed hadoop to a four machine full distributed hadoop setup. But, after I start the dfs, there is no live node showing up. If I make master a slave too, then the ...
    William Kang
    Mar 17, 2010 at 6:44 am
    Mar 22, 2010 at 11:36 am
  • Hi all, I'm trying to install Hadoop on a cluster, but I'm getting this error. I'm using java version "1.6.0_17" and hadoop-0.20.1+169.56.tar.gz from Cloudera. It's running in an NFS home shared ...
    Edson Ramiro
    Mar 29, 2010 at 7:01 pm
    Apr 6, 2010 at 7:35 pm
  • Hi all, I am running HDFS in pseudo-distributed mode. Every time I restart the machine, I have to format the namenode, otherwise localhost:50070 won't show up. It is quite annoying to do so ...
    William Kang
    Mar 8, 2010 at 5:29 am
    Mar 10, 2010 at 5:21 am
  • I have been researching ways to handle de-duping data while running a map/reduce program (so as to not re-calculate/re-aggregate data that we have seen before [possibly months before]). The data sets ...
    Joseph Stein
    Mar 25, 2010 at 6:09 pm
    Mar 31, 2010 at 3:36 pm
  • Hi all, I have two questions about HOD. 1. I configured and set up HOD on one cluster and it works fine, but when I finished jobs and deallocated the nodes, I found my jobID can still be seen using ...
    Song Liu
    Mar 15, 2010 at 2:56 pm
    Mar 22, 2010 at 7:34 pm
  • I am considering the following problem: if someone knows the master and ports of a hadoop cluster, is he able to run hadoop fs shell to read/write/update/delete data in the cluster without any ...
    Jiang licht
    Mar 5, 2010 at 9:58 pm
    Mar 8, 2010 at 8:22 pm
  • I am using Ubuntu Linux. I was able to get the standalone Hadoop cluster running and run the wordcount example. Before I start writing Hadoop programs I wanted to compile the wordcount example on my ...
    Varun Thacker
    Mar 4, 2010 at 7:42 pm
    Mar 6, 2010 at 6:25 am
  • Our Hadoop cluster went down last night when the namenode ran out of hard drive space. Trying to restart fails with this exception (see below). Since I don't really care that much about losing a day's ...
    Mike anderson
    Mar 4, 2010 at 4:57 pm
    May 19, 2010 at 9:33 am
  • Dear Hadoopers, I'm trying to find out how and where Hadoop splits a file into blocks and decides to send them to the datanodes. My specific problem: I have two types of data files. One large file is ...
    Yuri K.
    Mar 24, 2010 at 3:24 pm
    Mar 26, 2010 at 6:27 pm
  • Dear All, I'm trying to run tests using MySQL as a kind of datasource, so I thought Cloudera's Sqoop would be a nice project to have in production. However, I'm not using Cloudera's ...
    Utku Can Topçu
    Mar 17, 2010 at 10:59 am
    Mar 19, 2010 at 6:00 pm
  • Hi all, I've noticed swapping for a single terasort job on a small 8-node cluster using hadoop-0.20.1. The swapping doesn't happen repeatably; I can have back-to-back runs of the same job from the ...
    Vasilis Liaskovitis
    Mar 30, 2010 at 5:16 pm
    Apr 2, 2010 at 6:17 pm
  • The question probably sounds silly. It's weird that I got the following issues. Namenode and datanode can start w/o any problem and the hdfs reports healthy. But tasktracker on slaves cannot start. ...
    Jiang licht
    Mar 9, 2010 at 3:25 am
    Mar 10, 2010 at 12:47 am
  • I am considering the basic task of loading data into a Hadoop cluster in this scenario: the Hadoop cluster and the bulk data reside on different boxes, e.g. connected via LAN or WAN. An example to do this is to ...
    Jiang licht
    Mar 2, 2010 at 7:31 am
    Mar 2, 2010 at 10:37 pm
  • Hi, I have a LAN in which the IPs of the machines are changed dynamically by the DHCP server. So for the namenode, jobtracker, master and slave configurations we cannot give the IP. Can the machine ...
    Gokulakannan M
    Mar 26, 2010 at 2:32 pm
    Mar 26, 2010 at 5:34 pm
  • A previous post to core-user mentioned some formula to determine job time. I was wondering if anyone out there is trying to tackle designing a formula that can calculate the job run time of a ...
    Edward Capriolo
    Mar 1, 2010 at 5:27 pm
    Mar 3, 2010 at 6:16 am
  • Hi, Will all key-value pairs of the map output that have the same key be sent to the same reducer task node?
    Cui tony
    Mar 31, 2010 at 1:56 am
    Mar 31, 2010 at 2:54 am
  • Dear All, I have been trying to get HOD working on a cluster running Scyld. But there are some problems. I configured the minimum configurations. 1. I executed the command: $ bin/hod allocate -d ...
    Boyu Zhang
    Mar 22, 2010 at 3:52 pm
    Mar 23, 2010 at 11:42 pm
  • Hi all, I installed Hadoop on three machines: my PC is the namenode and two other PCs are the datanodes. But when I execute bin/start-dfs.sh, it displays these two lines as follows: datanode1: /usr/bin/env: ...
    毛宏
    Mar 23, 2010 at 11:48 am
    Mar 23, 2010 at 3:20 pm
  • I'm getting this error when I try to copy files to the dfs using this command: hadoop@10:/home/ubuntu/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg I tried this to see what might be ...
    Katie legere
    Mar 20, 2010 at 6:33 pm
    Mar 22, 2010 at 5:33 am
  • Hi, Can we pipeline the map output directly into the reduce phase without storing it in the local filesystem (avoiding disk I/O)? If yes, how can that be done? Any help is highly appreciated. Thanks
    Bharath v
    Mar 4, 2010 at 5:01 pm
    Mar 5, 2010 at 1:27 am
  • Hi, We've got a new batch of servers that we're looking to configure our new cluster with. We're anticipating this will be about 5-10 nodes to start and potentially another 15 or so fairly soon after ...
    Paul Ingles
    Mar 2, 2010 at 5:41 pm
    Mar 3, 2010 at 4:49 pm
  • Hello Everybody, I have a small question. I want to know how one would implement divide and conquer algorithms in Hadoop. For example, suppose I want to implement merge sort 100 lines in Hadoop. There ...
    Aa225
    Mar 1, 2010 at 7:28 am
    Mar 1, 2010 at 8:42 am
  • Hi, I am copying certain data from a client machine (which is not part of the cluster) using DFSClient to HDFS. During this process, I am encountering some issues and the error/info logs are going to ...
    Pallavi Palleti
    Mar 30, 2010 at 6:25 am
    Mar 31, 2010 at 6:38 pm
  • Hi fellows, the code segment below adds a shutdown hook to the JVM, but I got a strange exception, java.lang.IllegalStateException: Shutdown in progress at ...
    Silllllence
    Mar 10, 2010 at 5:40 am
    Mar 19, 2010 at 6:32 pm
  • Is there a way to create a symlink in HDFS? And does the LOAD function in Pig follow such a link? Thanks! Michael
    Jiang licht
    Mar 10, 2010 at 12:11 am
    Mar 10, 2010 at 10:04 pm
  • I want to write a script that pulls data (flat files) from a remote machine and pushes that into its hadoop cluster. At the moment, it is done in two steps: 1 - Secure copy the remote files 2 - Put ...
    zenMonkey
    Mar 6, 2010 at 7:23 pm
    Mar 8, 2010 at 12:56 am
  • Hi everyone, I am running a Perl script through streaming, at the beginning of which I print a test identifier to STDERR: print STDERR "whereisthis\n"; However, I can NOT find such "whereisthis" after ...
    Deqiang sun
    Mar 3, 2010 at 12:05 am
    Mar 5, 2010 at 5:53 am
  • Hello Everyone, I want to ask about HBase and Hive. What is the difference between HBase and Hive? And what should be considered when choosing between HBase and Hive? Kind regards
    Fitrah Elly Firdaus
    Mar 3, 2010 at 4:56 pm
    Mar 4, 2010 at 5:41 pm
  • Hi, Was curious if anyone else thought it would be useful to have a separate mail list for discussion/issues specific to Hadoop Streaming? Thanks, Michael
    Michael Kintzer
    Mar 3, 2010 at 8:10 pm
    Mar 3, 2010 at 9:06 pm
  • Hej, I've been checking the API and the internet but I have not found any method for listing the subdirectories of a given directory in HDFS. Can anybody show me how to get the list of subdirectories or ...
    Santiago Pérez
    Mar 30, 2010 at 3:24 pm
    Jun 20, 2010 at 12:12 pm
  • Hi guys, we have some machines with 1 TB disks and some with 100 GB disks. Is there any way to limit the disk usage of datanodes on the machines with smaller disks? Thanks!
    Steven zhuang
    Mar 31, 2010 at 3:12 am
    Apr 2, 2010 at 4:59 pm
  • I am interested in Hadoop usage and in investigating its internal mechanisms. I hope I can contribute to this community and learn more from all of you. Thanks! starlee
    Lishidahappy
    Mar 29, 2010 at 4:58 am
    Apr 1, 2010 at 4:27 am
  • Hi All, I am trying to measure DFS IO performance. I used TestDFSIO from the Hadoop jars. The results were about 100Mbps read and write. I think it should be more than this. Please share some stats to compare ...
    Sagar naik
    Mar 30, 2010 at 10:42 pm
    Apr 1, 2010 at 1:44 am
  • I realized that I made a mistake in my earlier post. So here is the correct one. I have a job ("loadgen") with only 1 input (say) part-00000 of size 1368654 bytes. So when I submit this job, I get ...
    Abhishek sharma
    Mar 25, 2010 at 2:27 am
    Mar 25, 2010 at 4:07 pm
  • Hi, I want to know which Cloudera AMI supports which Hadoop version. For example, ami-2932d440:cloudera-ec2-hadoop-images/cloudera-hadoop-ubuntu-20090602-i386.manifest.xml ami-ed59bf84: ...
    Sonal Goyal
    Mar 14, 2010 at 2:04 pm
    Mar 25, 2010 at 1:58 pm
  • Hi, The wiki's "Powered By" page ( http://wiki.apache.org/hadoop/PoweredBy ) lists dozens of companies using Hadoop in production, some of them for mission-critical operations, but is anyone using it ...
    Marcos Medrado Rubinelli
    Mar 23, 2010 at 11:05 am
    Mar 24, 2010 at 3:35 pm
  • Hey All, Currently, in a project I'm involved in, we're about to make design choices regarding the use of Hadoop as a scalable and distributed data analytics framework. Basically the application would be ...
    Utku Can Topçu
    Mar 22, 2010 at 11:51 am
    Mar 22, 2010 at 10:46 pm
  • Dear All, sorry to bother you again. I overcame the "Uncaught Exception : need more than 2 values to unpack" error by exporting HOD_PYTHON_HOME. But now I have a new error. $ bin/hod allocate -d /home/zhang/cluster ...
    Boyu Zhang
    Mar 22, 2010 at 4:35 pm
    Mar 22, 2010 at 7:35 pm
  • Hi. Our disks are getting full and I have found that it is the trash bin that is filling up. I've tried dfs -expunge but it does not clean /user/hadoop/.Trash. I have lowered the ...
    Marcus Herou
    Mar 15, 2010 at 9:15 am
    Mar 17, 2010 at 3:39 pm
  • Preparing a Hadoop presentation here. For demonstration I start up a 5 machine m1.large cluster in EC2 via the Cloudera scripts ($hadoop-ec2 launch-cluster my-hadoop-cluster 5). Then I sent a 500 MB XML ...
    Reik Schatz
    Mar 17, 2010 at 9:05 am
    Mar 17, 2010 at 2:48 pm
  • Hi all, We are going to hold the second Hive User Group Meeting at 7PM on 3/18/2010 Thursday. The agenda will be: * Hive Tutorial: 20 min * Hive User Case Study: 20 min * New Features and API: 25 min ...
    Zheng Shao
    Mar 2, 2010 at 7:55 pm
    Mar 15, 2010 at 9:00 pm
  • I am learning how the fair scheduler manages jobs so that each job shares resources over time, but I don't know if my understanding is correct or not. My scenario is that I have 3 data nodes and the ...
    Neo Anderson
    Mar 10, 2010 at 3:39 pm
    Mar 10, 2010 at 5:24 pm
  • Is there any way (like the hadoop command line or files) to find the IP addresses of all the cluster nodes (from the master)?
    Prasenjit mukherjee
    Mar 6, 2010 at 3:14 am
    Mar 7, 2010 at 3:56 am
  • Hi, Suppose I need to sort a big file (in GB). How would I accomplish this task using Hadoop? My main problem is how to merge the output of individual reduce phases. Thanks
    Aayush Garg
    Mar 4, 2010 at 10:02 pm
    Mar 6, 2010 at 1:09 am
  • I set up a simple cluster with one master (namenode@50001 and jobtracker@50002) and one slave. The problem is that although namenode/datanode and jobtracker/tasktracker are running, there is no ...
    Jiang licht
    Mar 3, 2010 at 11:39 pm
    Mar 5, 2010 at 3:33 am
Group Overview
period: Mar 2010
group: common-user @
categories: hadoop
discussions: 143
posts: 630
users: 179
website: hadoop.apache.org...
irc: #hadoop

179 users for March 2010

Jiang licht: 29 posts
Ted Yu: 24 posts
Steve Loughran: 20 posts
Gang Luo: 19 posts
Edward Capriolo: 17 posts
Allen Wittenauer: 16 posts
Boyu Zhang: 14 posts
Todd Lipcon: 14 posts
Raymond Jennings III: 12 posts
Edson Ramiro: 11 posts
Song Liu: 11 posts
William Kang: 10 posts
Keith Wiley: 9 posts
Nick Jones: 9 posts
Sonal Goyal: 9 posts
毛宏: 9 posts
Alex Kozlov: 7 posts
Amogh Vasekar: 7 posts
Ed Mazur: 7 posts
Prasenjit mukherjee: 7 posts