How to enumerate files in the directories?

Hello, how can one determine the names of the files in a particular Hadoop
directory programmatically?


  • Steve Lewis at Aug 25, 2010 at 4:05 pm
    @Override
    public HDFSFile[] getFiles(String directory) {
        // Shell out to the Hadoop CLI and parse its listing output.
        String result = executeCommand("hadoop fs -ls " + directory);
        String[] items = result.split("\n");
        List<HDFSFile> holder = new ArrayList<HDFSFile>();
        // Start at 1 to skip the "Found N items" header line of the listing.
        for (int i = 1; i < items.length; i++) {
            String item = items[i];
            if (item.length() > MIN__FILE_LENGTH) {
                try {
                    holder.add(new HDFSFile(item));
                }
                catch (Exception e) {
                    // Ignore lines that do not parse as file entries.
                }
            }
        }
        HDFSFile[] ret = new HDFSFile[holder.size()];
        holder.toArray(ret);
        return ret;
    }
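    The executeCommand, HDFSFile, and MIN__FILE_LENGTH pieces above are Steve's own
    helpers and are not shown in the post. For what it's worth, a minimal sketch of
    what an executeCommand along these lines might look like (ProcessBuilder-based;
    the name and error handling are assumptions, not code from this thread):

    // Hypothetical helper: run a shell command and return its stdout as a String.
    // Requires java.io.BufferedReader and java.io.InputStreamReader.
    private String executeCommand(String command) {
        StringBuilder out = new StringBuilder();
        try {
            Process p = new ProcessBuilder(command.split(" ")).start();
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append("\n");
            }
            p.waitFor();
        } catch (Exception e) {
            throw new RuntimeException("Command failed: " + command, e);
        }
        return out.toString();
    }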
    On Wed, Aug 25, 2010 at 12:36 AM, Denim Live wrote:

    Hello, how can one determine the names of the files in a particular hadoop
    directory, programmatically?





    --
    Steven M. Lewis PhD
    Institute for Systems Biology
    Seattle WA
  • Raj V at Aug 25, 2010 at 9:01 pm
    I would use the FileSystem API.

    Here is a quick-and-dirty example:

    import java.io.*;
    import java.util.*;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.FileStatus;

    public class dirc {
        public static void main(String args[]) {
            try {
                String dirname = args[0];
                Configuration conf = new Configuration(true);
                FileSystem fs = FileSystem.get(conf);
                Path path = new Path(dirname);
                // List the immediate children of the directory and print their paths.
                FileStatus fstatus[] = fs.listStatus(path);
                for (FileStatus f : fstatus) {
                    System.out.println(f.getPath().toUri().getPath());
                }
            } catch (IOException e) {
                System.out.println("Usage: dirc <directory>");
                return;
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("Usage: dirc <directory>");
                return;
            }
        }
    }
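    Incidentally, if the default Configuration on your classpath does not point at the
    cluster, you can bind to a specific namenode explicitly (this uses the java.net.URI
    import above; the host and port below are placeholders, not values from this thread):

    // Variation on the FileSystem.get(conf) line above: connect to an explicit
    // namenode URI instead of whatever fs.default.name resolves to locally.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:54310"), conf);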







  • Sudhir Vallamkondu at Aug 25, 2010 at 7:28 pm
    You should use the FileStatus API to access file metadata. See the example below.

    http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html

    Configuration conf = new Configuration(); // picks up the default configuration
    FileSystem fs = FileSystem.get(conf);
    Path dir = new Path("/dir");
    FileStatus[] stats = fs.listStatus(dir);
    for (FileStatus stat : stats) {
        stat.getPath().toUri().getPath(); // gives the file or directory path
        stat.getModificationTime();
        stat.getReplication();
        stat.getBlockSize();
        stat.getOwner();
        stat.getGroup();
        stat.getPermission().toString();
    }
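    If you also need the files inside subdirectories, a small recursive variant along
    the same lines; this is only a sketch building on the same FileStatus API (the
    method name is mine, and it assumes the imports above plus java.io.IOException):

    // Recursively print every file path under dir, using FileSystem.listStatus
    // and FileStatus.isDir from the API shown above.
    public static void listRecursively(FileSystem fs, Path dir) throws IOException {
        for (FileStatus stat : fs.listStatus(dir)) {
            if (stat.isDir()) {
                listRecursively(fs, stat.getPath()); // descend into subdirectories
            } else {
                System.out.println(stat.getPath().toUri().getPath());
            }
        }
    }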


    From: Denim Live <denim.live@yahoo.com>
    Date: Wed, 25 Aug 2010 07:36:11 +0000 (GMT)
    To: <common-user@hadoop.apache.org>
    Subject: How to enumerate files in the directories?

    Hello, how can one determine the names of the files in a particular hadoop
    directory, programmatically?

  • Path2727 at Oct 12, 2010 at 10:56 am
    I think this might be a better answer to your question. I took a lot of the
    code from the HDFS web interface, namely:

    $HADOOP_HOME/hdfs/src/java/org/apache/hadoop/hdfs/server/common/JspHelper.java

    and

    $HADOOP_HOME/hdfs/src/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java


    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.hdfs.DFSClient;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;
    import org.apache.hadoop.hdfs.protocol.DirectoryListing;

    public class FileTest {

        /**
         * @param server the name of the namenode server, e.g. 'namenode.example.com'
         * @param port   the port of the namenode (mine is 54310 right now);
         *               this is the info port, not the other port that the slaves connect to.
         * @param dir    the directory you wish to enumerate; I used '/' in this example.
         */
        public FileTest(String server, String port, String dir) {

            String tDir = validatePath(dir);
            int namenodePort = Integer.parseInt(port);

            if (tDir != null) {
                Configuration conf = new Configuration(true);

                UserGroupInformation ugi = null;
                try {
                    ugi = UserGroupInformation.getCurrentUser();
                } catch (IOException ioe) {
                    ioe.printStackTrace();
                }

                InetSocketAddress inet = new InetSocketAddress(server, namenodePort);

                if (ugi != null && inet != null && conf != null) {
                    try {
                        DFSClient dfs = getDFSClient(ugi, inet, conf);

                        String target = dir;
                        final HdfsFileStatus targetStatus = dfs.getFileInfo(target);

                        if (targetStatus.isDir()) {
                            DirectoryListing thisListing =
                                dfs.listPaths(target, HdfsFileStatus.EMPTY_NAME);
                            if (thisListing == null
                                    || thisListing.getPartialListing().length == 0) {
                                System.out.println("Empty directory");
                            } else {
                                HdfsFileStatus[] files = thisListing.getPartialListing();
                                for (int i = 0; i < files.length; i++) {
                                    if (files[i].isDir()) {
                                        System.out.println(" dir  " + files[i].getLocalName());
                                    } else {
                                        System.out.println(" file " + files[i].getLocalName()
                                            + files[i].getReplication()
                                            + files[i].getBlockSize());
                                    }
                                }
                            }
                        } else {
                            System.out.println("it is not a directory");
                        }
                    } catch (Exception e) { // could be IOException or InterruptedException
                        e.printStackTrace();
                    }
                } else {
                    System.out.println("a requirement is null");
                }
            }
        }

        private static DFSClient getDFSClient(final UserGroupInformation user,
                                              final InetSocketAddress addr,
                                              final Configuration conf)
                throws IOException, InterruptedException {
            return user.doAs(new PrivilegedExceptionAction<DFSClient>() {
                public DFSClient run() throws IOException {
                    return new DFSClient(addr, conf);
                }
            });
        }

        public static String validatePath(String p) {
            return p == null || p.length() == 0 ? null : new Path(p).toUri().getPath();
        }

        public static void main(String[] args) {
            if (args.length == 3 && args[2].contains("/") && args[0].contains(".")) {
                FileTest ft = new FileTest(args[0], args[1], args[2]);
            } else {
                System.out.println("Usage: java FileTest <serverName> <nameNodeInfoPort> <dir>");
                System.out.println("a valid dir must have '/' in the string somewhere");
                System.out.println("a valid server must have '.' in the string somewhere");
            }
        }
    }

  • Path2727 at Oct 12, 2010 at 10:57 am
    Configuration conf = new Configuration(true);
    conf.set("fs.default.name", "hdfs://<namenode>:<port>");

    I noticed that simply adding these lines to the examples in a few of the previous
    posts solved my problem. I was frustrated because I was trying to use those
    examples and they only printed my LOCAL file system. I was running them with
    plain 'java <program_name>' rather than via '$HADOOP_HOME/bin/hadoop jar', so my
    program wasn't loading the Configuration object correctly.


    Just thought I would add that here since it caused me frustration.
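    A related option, if you would rather not hard-code the namenode URI, is to point
    the Configuration at the cluster's own config files; a sketch only, with the conf
    paths as placeholders for wherever your installation actually keeps them:

    // Load the cluster configuration explicitly when running with plain 'java'
    // instead of 'hadoop jar'. Paths are placeholders; adjust to your install.
    Configuration conf = new Configuration(true);
    conf.addResource(new Path("/path/to/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/path/to/hadoop/conf/hdfs-site.xml"));
    FileSystem fs = FileSystem.get(conf); // now resolves HDFS, not the local file system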

Discussion Overview
group: common-user
categories: hadoop
posted: Aug 25, '10 at 7:36a
active: Oct 12, '10 at 10:57a
posts: 6
users: 5
website: hadoop.apache.org...
irc: #hadoop
