Hi Lohit,
I am using 0.16.0. The test scenario was: 37GB of data in files of
various sizes between 15MB and 120MB. I uploaded around 1000 files.
When the bin/hadoop copy command exited, I ran the Java application
which retrieves that info. The way I check all the files is:
1. Open the local directory.
2. Apply a filter for zip files.
3. Call list(), which returns all the zip files in the directory (roughly as in the sketch below).
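A simplified sketch of those three steps (the directory path is just an example from my setup):

import java.io.File;
import java.io.FilenameFilter;

File localDir = new File("/data/zips");              // 1. open the local directory
FilenameFilter zipFilter = new FilenameFilter() {    // 2. filter for zip files
    public boolean accept(File dir, String name) {
        return name.endsWith(".zip");
    }
};
String[] zipFiles = localDir.list(zipFilter);        // 3. list() returns the matching file names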
So when I start iterating through the list, it works fine until I reach
about 1/3 of the file names. Then it starts returning empty matrices,
and then again returns the hostnames for roughly the last 1/4 of the
elements. I cannot tell you the exact numbers right now; I could check
this on Monday.
I have 5 nodes running for my experiment, each node with 20GB for
HDFS. While copying the files I used the web app for monitoring the
HDFS nodes. The node from which I am copying the files is the most
used one (% of HD), although the files are spread across all the
nodes. This one is very loaded.
It is when the copy command finishes that I run my Java application
and get the empty String[][].
I checked the web app again after several minutes (5-10 min) and the
cluster was almost balanced, so data had been moved from this node to
the others in the cluster. When all the nodes had similar percentages
of use (space), I ran the Java app again and it seemed to work.
I am not SURE of this because I couldn't check the output. I will run
a test again next Monday.
But it seems that while the cluster is rebalancing, for the files that
are being reallocated the String[][] fileCacheHints =
fs.getFileCacheHints(...) call cannot return a value. Am I right?
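If that is what is happening, I suppose I could work around it by retrying until the hints come back, something like this sketch (the retry count and sleep time are arbitrary values I picked, not anything from the API; exception handling omitted):

String[][] hints = fs.getFileCacheHints(inFile, 0, 100);
int retries = 0;
while (hints.length == 0 && retries < 5) {   // keep trying while the matrix comes back empty
    Thread.sleep(30 * 1000);                 // give the rebalancing some time to settle
    hints = fs.getFileCacheHints(inFile, 0, 100);
    retries++;
}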
I have 2 more questions. What do the start and end parameters mean?
From byte 0 to byte 100, for instance? The javadoc does not say a word
about them.
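For reference, this is how I am calling it at the moment (I am assuming the second argument is the offset of the first byte and the third argument is the length of the range in bytes, but that is just my guess):

String[][] fileCacheHints = fs.getFileCacheHints(inFile, 0L, 100L);   // bytes 0..99 of the file?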
And why does it return a matrix? I am using replication level 2. For
all the files that returned a value, the matrix just contained one array:
fileCacheHints[0] == {hostNameA, hostNameB}
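This is how I am reading the result right now (assuming the first index is the block and the second index is the replica host, which is just my interpretation):

for (int block = 0; block < fileCacheHints.length; block++) {
    for (int replica = 0; replica < fileCacheHints[block].length; replica++) {
        System.out.println("block " + block + ", replica " + replica + ": "
                           + fileCacheHints[block][replica]);
    }
}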
Thanks
Alfonso
On 20/03/2008, lohit wrote:
I tried to get the location of a file which is 100 bytes, and also the first 100 bytes of a huge file. Both returned me a set of hosts.
This is against trunk.
FileSystem fs = FileSystem.get(conf);
String[][] fileCacheHints = fs.getFileCacheHints(new Path("/user/lohit/test.txt"), 0, 100L);
for (String[] tmp : fileCacheHints) {
    System.out.println("");          // blank line before each row of the matrix
    for (String tmp1 : tmp)
        System.out.print(tmp1);      // print every host name in the row
}
----- Original Message ----
From: lohit <[email protected]>
To: [email protected]
Sent: Thursday, March 20, 2008 11:14:49 AM
Subject: Re: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
Hi Alfonso,
Which version of Hadoop are you using? Yesterday a change was checked into trunk which changes getFileCacheHints.
Thanks,
Lohit
----- Original Message ----
From: Alfonso Olias Sanz <[email protected]>
To: [email protected]; [email protected]
Sent: Wednesday, March 19, 2008 10:51:08 AM
Subject: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
Hi there,
I am trying to get the hostnames where a file is stored:
dfs.getFileCacheHints(inFile, 0, 100);
But for a reason I cannot guess, for some files that are in HDFS the
returned String[][] is empty.
If I list the file using bin/hadoop dfs -ls path | grep fileName, the file appears.
Also, I am able to get the FileStatus with dfs.getFileStatus(inFile);
What I am trying to do is, for a list of files, get the hostnames where
the files are physically stored, roughly as in the sketch below.
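This is a simplified sketch of that loop (the HDFS path prefix and the zipFiles array are just placeholders from my setup; IOException handling omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem dfs = FileSystem.get(new Configuration());
for (String name : zipFiles) {                          // file names from the local directory listing
    Path inFile = new Path("/user/alfonso/" + name);    // placeholder HDFS path
    String[][] hints = dfs.getFileCacheHints(inFile, 0, 100);
    if (hints.length == 0) {
        System.out.println(name + ": no hostnames returned");
    } else {
        System.out.println(name + ": " + java.util.Arrays.toString(hints[0]));
    }
}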
Thanks,
Alfonso