We have a compression utility that tries to grab all subdirs of a directory
on HDFS. It makes a call like this:
FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));

and handles files vs dirs accordingly.

We tried to run our utility against a dir containing a computed SOLR shard,
which has files that look like this:
-rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
-rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
-rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
-rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
-rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
-rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
-rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
-rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
-rw-r--r--   2 hadoopuser visible         20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
-rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2


The globStatus call seems only able to pick up those last 2 files; the
several files that start with _ don't register.

I've skimmed the FileSystem and GlobExpander source to see if there's
anything related to this, but didn't see it. Google didn't turn up anything
about underscores. Am I misunderstanding something about the regex patterns
needed to pick these up or unaware of some filename convention in HDFS?


  • Edward Capriolo at Sep 2, 2011 at 9:21 pm

    Files starting with '_' are considered 'hidden' like unix files starting
    with '.'. I did not know that for a very long time because not everyone
    follows this rule or even knows about it.
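
A minimal sketch of the naming convention described above, assuming the rule is simply "name starts with `_` or `.`". That is the rule Hadoop's `FileInputFormat` applies when it lists job input; the thread does not establish that `globStatus` applies it as well. The class and method names are made up for illustration, and the code is plain Java with no Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HiddenNameDemo {
    // Assumed convention: a name is "hidden" if it starts with '_' or '.'.
    public static boolean isHidden(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    // Returns only the names that a hidden-file filter would let through.
    public static List<String> visible(List<String> fileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (!isHidden(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // File names taken from the index directory listed in the question.
        List<String> names = Arrays.asList(
                "_ox.fdt", "_ox.fdx", "_ox.fnm", "_ox.frq", "_ox.nrm",
                "_ox.prx", "_ox.tii", "_ox.tis", "segments.gen", "segments_2");
        System.out.println(visible(names));
    }
}
```

Applied to the file names from the question, only `segments.gen` and `segments_2` pass such a filter, which matches the reported symptom.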
  • Meng Mao at Sep 2, 2011 at 9:38 pm
    Is there a programmatic way to access these hidden files then?
  • Harsh J at Sep 3, 2011 at 3:47 am
    Meng,

    What version of Hadoop are you on? I'm able to use globStatus(Path)
    for '_' listing successfully, with a '*' glob. The same doesn't
    apply to what FsShell's ls utility provides, though (which is odd
    here!).

    Here's my test code which can validate that the listing is indeed
    done: http://pastebin.com/vCbd2wmK

    $ hadoop dfs -ls
    Found 4 items
    drwxr-xr-x   - harshchouraria supergroup          0 2011-09-03 09:09 /user/harshchouraria/_abc
    -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/_def
    drwxr-xr-x   - harshchouraria supergroup          0 2011-09-03 08:10 /user/harshchouraria/abc
    -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/def


    $ hadoop dfs -ls '*'
    -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/_def
    -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/def

    $ # No dir results! ^^

    $ hadoop jar myjar.jar # (My code)
    hdfs://localhost/user/harshchouraria/_abc
    hdfs://localhost/user/harshchouraria/_def
    hdfs://localhost/user/harshchouraria/abc
    hdfs://localhost/user/harshchouraria/def

    I suppose that means globStatus is fine, but the FsShell.ls(…) code
    does something more than a simple glob status, and filters away
    directory results when used with a glob.
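
As a local-filesystem analogue of the test above (not the pastebin code, and not the Hadoop API), the following sketch recreates the same four names in a scratch directory and lists them with the JDK's own `*` glob. It only illustrates that ordinary glob matching gives `_` no special meaning and that files and directories can be told apart per entry. `LocalGlobCheck`, `glob` and `makeFixture` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LocalGlobCheck {
    // Lists the entries of dir that match the glob, tagging each as dir or file.
    public static List<String> glob(Path dir, String pattern) throws IOException {
        List<String> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, pattern)) {
            for (Path p : stream) {
                String kind = Files.isDirectory(p) ? "dir" : "file";
                entries.add(p.getFileName() + " (" + kind + ")");
            }
        }
        Collections.sort(entries); // directory streams have no guaranteed order
        return entries;
    }

    // Builds a temporary directory holding two dirs and two files,
    // using the same names as in the HDFS listing above.
    public static Path makeFixture() throws IOException {
        Path root = Files.createTempDirectory("globcheck");
        Files.createDirectory(root.resolve("_abc"));
        Files.createDirectory(root.resolve("abc"));
        Files.createFile(root.resolve("_def"));
        Files.createFile(root.resolve("def"));
        return root;
    }

    public static void main(String[] args) throws IOException {
        for (String entry : glob(makeFixture(), "*")) {
            System.out.println(entry);
        }
    }
}
```

All four entries come back here, underscore-prefixed or not, so any filtering seen on a cluster would have to come from a layer above plain glob matching.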
  • Meng Mao at Sep 3, 2011 at 6:35 pm
    I get the opposite behavior --

    [this is more or less how I listed the files in the original email]
    hadoop dfs -ls /test/output/solr-20110901165238/part-00000/data/index/*
    -rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
    -rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
    -rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
    -rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
    -rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
    -rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
    -rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
    -rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
    -rw-r--r--   2 hadoopuser visible         20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
    -rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2

    Whereas my globStatus doesn't capture them.

    I thought we were on Cloudera's CDH3, but now I'm not sure. This is what
    version reports:
    $ hadoop version
    Hadoop 0.20.1+169.56
    Subversion -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3
    Compiled by root on Tue Feb 9 13:40:08 EST 2010




  • Harsh J at Sep 3, 2011 at 8:40 pm
    Meng,

    - Moving this discussion to cdh-user@cloudera.org since it may be CDH
    specific at this point. (Link:
    https://groups.google.com/a/cloudera.org/group/cdh-user)
    - I've bcc'd common-user@ for this mail alone.
    - Added you on cc in case you aren't subscribed.

    Reading your version output, that version is CDH2, the older version
    of CDH. Would you be able to upgrade your cluster to CDH3?

    I haven't tried running against your _exact_ version yet, but running
    against the latest CDH2 version of HDFS from
    http://archive.cloudera.com/cdh/2/, I think it still works fine (ditto
    code in the jar again):

    ➜ hadoop-0.20.1+169.127 > bin/hadoop jar ~/globtester.jar
    hdfs://localhost/user/harshchouraria/_abc
    hdfs://localhost/user/harshchouraria/_def
    hdfs://localhost/user/harshchouraria/abc
    hdfs://localhost/user/harshchouraria/def

Discussion Overview
group: common-user
category: hadoop
posted: Sep 2, '11 at 8:05p
active: Sep 3, '11 at 8:40p
posts: 6
users: 3