[core] dfs.getFileCacheHints() returns an empty matrix for an existing file
Hi there,

I am trying to get the hostnames where a file is stored:
dfs.getFileCacheHints(inFile, 0, 100);

But for a reason I cannot guess, for some files that are in HDFS the
returned String[][] is empty.

If I list the file using bin/hadoop dfs -ls path | grep fileName, the file appears.

Also, I am able to get the FileStatus: dfs.getFileStatus(inFile);


What I am trying to do is, for a list of files, get the hostnames where
the files are physically stored.
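
For reference, this is roughly what my loop looks like (a simplified sketch; the inputFiles list and paths are illustrative, and dfs comes from FileSystem.get(conf)):

    // Sketch: query block locations for each file in a list (names are made up).
    Configuration conf = new Configuration();
    FileSystem dfs = FileSystem.get(conf);
    for (Path inFile : inputFiles) {
        String[][] hints = dfs.getFileCacheHints(inFile, 0, 100);
        if (hints.length == 0) {
            System.out.println(inFile + ": no hosts returned");
        }
    }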

Thanks
alfonso


  • Lohit at Mar 20, 2008 at 6:15 pm
    Hi Alfonso,

Which version of Hadoop are you using? Yesterday a change was checked into trunk which changes getFileCacheHints.

    Thanks,
    Lohit

  • Lohit at Mar 20, 2008 at 9:21 pm
    I tried to get the locations of a file which is 100 bytes, and also of the first 100 bytes of a huge file. Both returned me a set of hosts.
    This is against trunk.

    FileSystem fs = FileSystem.get(conf);

    String[][] fileCacheHints = fs.getFileCacheHints(new Path("/user/lohit/test.txt"), 0, 100L);
    for (String[] tmp : fileCacheHints) {
        System.out.println("");
        for (String tmp1 : tmp)
            System.out.print(tmp1);
    }


  • Alfonso Olias Sanz at Mar 20, 2008 at 11:09 pm
    Hi Lohit,

    I am using 0.16.0. The test scenario was: 37 GB of data in files of
    several sizes between 15 MB and 120 MB. I uploaded around 1000 files.
    When the bin/hadoop copy command exited, I ran the Java application
    which retrieves that info. The way I check all the files is:
    1. Open the local directory.
    2. Apply a filter for zip files.
    3. Call list(), which returns all the zip files in the directory (see the sketch below).
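
    In code, the listing step is roughly this (a sketch; the directory name is made up):

    File dir = new File("/data/input");                  // local directory with the zip files
    String[] zipNames = dir.list(new FilenameFilter() {  // filter for zip files
        public boolean accept(File d, String name) {
            return name.endsWith(".zip");
        }
    });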

    So when I start iterating through the list, it works fine until I reach
    about 1/3 of the file names. Then it starts returning empty matrices. Then
    it again returns the hostnames for the last 1/4 of all the elements. I
    cannot tell you the exact numbers right now; I could check this on
    Monday.

    I have 5 nodes running for my experiment, each node with 20 GB for
    HDFS. While copying the files I used the web app for monitoring the
    HDFS nodes. The node from where I am copying the files is the one that
    is most used (% of HD), although the files are spread through all the
    nodes. This one is very loaded.

    When the copy command finishes is when I run my Java application
    and I get the empty String[][].

    I checked the web app again after several minutes (5-10 min) and the
    cluster was almost balanced, so data had been moved from this node to
    the others in the cluster. When all the nodes had similar percentages
    of use (space), I ran the Java app again and it seemed to work.

    I am not SURE of this because I couldn't check the output. I will run
    a test again next Monday.

    But it seems that while the cluster is rebalancing, for the files that
    are being reallocated the String[][] fileCacheHints =
    fs.getFileCacheHints(...) call cannot return a value. Am I right?

    I have 2 more questions. What do the start and end parameters
    mean? From byte 0 to byte 100, for instance? The javadoc does
    not say a word about them.

    And why does it return a matrix? I am using replication level 2. For
    all the files that returned a value, the matrix just contained one array:
    fileCacheHints[0][] == {hostNameA, hostNameB}

    Thanks
    Alfonso
  • Lohit at Mar 21, 2008 at 12:30 am

    > So when I start iterating through the list, it works fine until I reach
    > about 1/3 of the file names. Then it starts returning empty matrices. Then
    > it again returns the hostnames for the last 1/4 of all the elements. I
    > cannot tell you the exact numbers right now; I could check this on
    > Monday.

    By any chance do you have zero-byte files?

    As soon as the files are closed, the block location information should be updated, and any calls to getFileCacheHints() would give you back those locations.

    > But it seems that while the cluster is rebalancing, for the files that
    > are being reallocated the String[][] fileCacheHints =
    > fs.getFileCacheHints(...) call cannot return a value. Am I right?

    When you describe this scenario, are you explicitly invoking the rebalancer? Ideally, if you are using hadoop dfs -copyFromLocal or -put, or if a map-reduce job is writing a file onto HDFS and it terminates with success, later invocations of getFileCacheHints on these non-zero files should not return you an empty matrix.
    > I have 2 more questions. What do the start and end parameters
    > mean? From byte 0 to byte 100, for instance? The javadoc does
    > not say a word about them.

    start is the start offset within the file, and the second parameter is the length. In essence you are providing a range within the file and trying to find out the locations of the blocks corresponding to it. I agree that the javadoc should be more descriptive. We could fix this.
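
    For example, to cover an entire file you can pass its length from getFileStatus() (a sketch; fs and inFile are assumed from the earlier snippets):

    FileStatus status = fs.getFileStatus(inFile);
    // The range [0, file length) covers every block of the file.
    String[][] hints = fs.getFileCacheHints(inFile, 0, status.getLen());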
    > And why does it return a matrix? I am using replication level 2. For
    > all the files that returned a value, the matrix just contained one array:
    > fileCacheHints[0][] == {hostNameA, hostNameB}

    A file can have multiple blocks. In the matrix, each row corresponds to one block of the file, and the columns within each row list all the hosts which hold that block (this depends on the number of replicas you have; for a replication factor of 3, you would have 3 columns). See the sketch below.

    Let us know when the file is created and closed, and when your Java app calls getFileCacheHints().
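
    As a sketch of the shape (hostnames are made up; hints as in the earlier snippets):

    // hints[block][replica]: one row per block, one column per replica location.
    // For a 3-block file with replication 2, e.g.:
    //   hints[0] == { "hostA", "hostB" }
    //   hints[1] == { "hostB", "hostC" }
    //   hints[2] == { "hostA", "hostC" }
    for (int b = 0; b < hints.length; b++) {
        System.out.println("block " + b + " -> " + java.util.Arrays.toString(hints[b]));
    }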

  • Alfonso Olias Sanz at Mar 24, 2008 at 11:34 am

    On 21/03/2008, lohit wrote:
    >> So when I start iterating through the list, it works fine until I reach
    >> about 1/3 of the file names. Then it starts returning empty matrices. Then
    >> it again returns the hostnames for the last 1/4 of all the elements.
    >
    > By any chance do you have zero-byte files?

    No, all the files contain data.

    > As soon as the files are closed, the block location information should be updated, and any calls to getFileCacheHints() would give you back those locations.
    >
    >> But it seems that while the cluster is rebalancing, for the files that
    >> are being reallocated the String[][] fileCacheHints =
    >> fs.getFileCacheHints(...) call cannot return a value. Am I right?
    >
    > When you describe this scenario, are you explicitly invoking the rebalancer? Ideally, if you are using hadoop dfs -copyFromLocal or -put, or if a map-reduce job is writing a file onto HDFS and it terminates with success, later invocations of getFileCacheHints on these non-zero files should not return you an empty matrix.

    Yes, I explicitly called the balancer because I want the data
    balanced before I run the application.

    I am running the same test again. I explicitly called the balancer.
    This is the actual output of the log file
    (tail -f /home/aolias/software/Hadoop/hadoop-0.16.0/bin/../logs/hadoop-aolias-balancer-gaiawl03.net4.lan.out):

    Time Stamp                Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
    Mar 24, 2008 12:11:47 PM  0           0 KB                 24.33 MB            787.42 MB
    Mar 24, 2008 12:15:55 PM  1           461.45 MB            2.66 GB             787.42 MB
    Mar 24, 2008 12:23:14 PM  2           761.36 MB            3.53 GB             787.42 MB


    I suppose that while data is being balanced, there is no output for
    those blocks/files. I will run the test twice: once before the
    balancer finishes, and a second time after it finishes balancing the
    cluster.

    Is there any way to query the running balancer from Java in order
    to make the application wait until the system is balanced?
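
    In the meantime, one hypothetical workaround I am considering is to poll until the hints come back non-empty (a sketch, reusing dfs and inFile from before):

    // Poll until block locations are reported for the file.
    // (InterruptedException handling omitted for brevity.)
    String[][] hints = dfs.getFileCacheHints(inFile, 0, 100);
    while (hints.length == 0) {
        Thread.sleep(10000); // wait 10 seconds between checks
        hints = dfs.getFileCacheHints(inFile, 0, 100);
    }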
    >> I have 2 more questions. What do the start and end parameters
    >> mean? From byte 0 to byte 100, for instance? The javadoc does
    >> not say a word about them.
    >
    > start is the start offset within the file, and the second parameter is the length. In essence you are providing a range within the file and trying to find out the locations of the blocks corresponding to it. I agree that the javadoc should be more descriptive. We could fix this.

    Ok, thanks! :)
    >> And why does it return a matrix? I am using replication level 2. For
    >> all the files that returned a value, the matrix just contained one array:
    >> fileCacheHints[0][] == {hostNameA, hostNameB}
    >
    > A file can have multiple blocks. In the matrix, each row corresponds to one block of the file, and the columns within each row list all the hosts which hold that block (this depends on the number of replicas you have; for a replication factor of 3, you would have 3 columns).
    >
    > Let us know when the file is created and closed, and when your Java app calls getFileCacheHints().


    Thanks
    Alfonso
