Greetings,

Over on the Nutch dev list there's been a bit of chatter about
files/folders generated by Hadoop (generally speaking 0.17) such as
'_logs'. Digging through Hadoop a bit revealed that there is a
private hiddenFileFilter in FileInputFormat.java. This is similar to
what I came up with here:
http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg08214.html

I couldn't find any other mention of "hidden" files in the Hadoop
codebase or the wiki and little on the mailing lists.

Is there a defined standard for hidden files or a public interface for
determining file visibility?

Thanks!

-lincoln

--
lincolnritter.com

  • Doug Cutting at Jun 27, 2008 at 5:12 pm

    Lincoln Ritter wrote:
    > Is there a defined standard for hidden files or a public interface for
    > determining file visibility?

    MapReduce's FileInputFormat, and its many subclasses, ignore files and
    directories whose names begin with either "." or "_". However, FsShell's
    'ls' and 'lsr' commands do not currently hide any files, nor do any
    other parts of Hadoop, so far as I can recall.

    Doug
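
As an illustration of the naming rule described in the message above, here is a minimal, self-contained sketch in plain Java. It deliberately does not use Hadoop's classes; `HiddenNames`, `baseName`, and `isHidden` are hypothetical names chosen for this example, and treating only the final path component as significant is an assumption of the sketch.

```java
// Hypothetical helper, not part of Hadoop: mirrors the rule that names
// beginning with "." or "_" are ignored as MapReduce input.
final class HiddenNames {
    private HiddenNames() {}

    /** Returns the final component of a slash-separated path. */
    static String baseName(String path) {
        // Drop trailing slashes so "crawl/_logs/" still yields "_logs".
        int end = path.length();
        while (end > 1 && path.charAt(end - 1) == '/') {
            end--;
        }
        int slash = path.lastIndexOf('/', end - 1);
        return path.substring(slash + 1, end);
    }

    /** True when the final path component starts with "." or "_". */
    static boolean isHidden(String path) {
        String name = baseName(path);
        return name.startsWith(".") || name.startsWith("_");
    }
}
```

Under this sketch, `HiddenNames.isHidden("crawl/_logs/")` is true, while a name such as `my_logs` is not hidden because only the leading character is checked.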
  • Lincoln Ritter at Jun 27, 2008 at 5:23 pm
    Thanks.

    I can see from the private hiddenFileFilter (used by listPaths) that
    '.' and '_' prefixed stuff is considered hidden; I just want to make
    sure that this is "standard".

    I'm working on getting Nutch 0.9 working with Hadoop 0.17 and hidden
    files ("_logs") have been causing some issues. Granted, you can
    configure around this, but I've been looking for other solutions as
    well.

    If the hidden file behavior is well defined, it would be nice to
    provide documentation and a public interface for determining file
    visibility. Seems to me that splitting off 'hiddenFileFilter' into
    its own class or providing an accessor would be sufficient.

    -lincoln

    --
    lincolnritter.com


  • Doug Cutting at Jun 27, 2008 at 6:38 pm

    Lincoln Ritter wrote:
    > I can see from the private hiddenFileFilter (used by listPaths) that
    > '.' and '_' prefixed stuff is considered hidden; I just want to make
    > sure that this is "standard".

    Yes, it is standard for MapReduce input and output directories.

    > I'm working on getting Nutch 0.9 working with Hadoop 0.17 and hidden
    > files ("_logs") have been causing some issues. Granted, you can
    > configure around this, but I've been looking for other solutions as
    > well.
    >
    > If the hidden file behavior is well defined, it would be nice to
    > provide documentation and a public interface for determining file
    > visibility. Seems to me that splitting off 'hiddenFileFilter' into
    > its own class or providing an accessor would be sufficient.

    If Nutch cannot extend FileInputFormat then, yes, we should make this
    filter public. If that's the case, please submit a patch.

    Thanks,

    Doug
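
The proposal discussed in this thread, keeping the filter private but making it reachable through a public accessor, might look roughly like the sketch below. It is written against `java.util.function.Predicate` rather than Hadoop's own filter type so that it runs standalone; `InputFilters`, `getHiddenFileFilter`, and `visible` are hypothetical names and not an existing Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Hadoop code: a filter that stays private to its
// class but is exposed to other code through a public accessor.
final class InputFilters {
    private InputFilters() {}

    // Accepts a file name unless it begins with "_" or ".".
    private static final Predicate<String> HIDDEN_FILE_FILTER =
        name -> !name.startsWith("_") && !name.startsWith(".");

    /** Public accessor so callers outside this class can reuse the rule. */
    public static Predicate<String> getHiddenFileFilter() {
        return HIDDEN_FILE_FILTER;
    }

    /** Returns the names accepted by the filter, in their original order. */
    public static List<String> visible(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (HIDDEN_FILE_FILTER.test(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

With an accessor of this kind, a caller that cannot extend the input format class could still skip entries such as `_logs` when walking an output directory, without duplicating the prefix rule in its own code.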

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jun 26, '08 at 10:35p
active: Jun 27, '08 at 6:38p
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Lincoln Ritter: 2 posts Doug Cutting: 2 posts
