Hi,

I have a question about the practical limit on the number of files per HDFS
directory. (What is the hard limit, by the way?)

What is a practical limit on the number of files in a Hadoop directory such
that glob selection still works efficiently (by efficiently I mean under 30
seconds)?

We have something like a daily log registry where every day is represented
by n files. We then use globs to define the inputs for MR jobs that run over
a certain period of time (jobs usually select no more than a few days),
roughly as in the sketch below.
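
To make that concrete, here is a minimal sketch of the glob-based input
selection, assuming a hypothetical /logs/yyyy/MM/dd/part-* layout (the paths
and job name are illustrative, not our real setup):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class GlobInputExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "daily-log-job");
        // Hypothetical layout: one directory per day, n part files each.
        // The glob is expanded against the NameNode at job submission time.
        FileInputFormat.addInputPath(job,
            new Path("/logs/2011/05/{17,18,19}/part-*"));
        // ... mapper/reducer/output setup elided ...
      }
    }

My understanding is that FileInputFormat expands the glob via directory
listings on the NameNode, so the expansion cost grows with the number of
entries the pattern has to touch.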

So, do you have any heuristics for the number of files at which such an
approach would become a problem?
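
(In case it helps frame an answer: one way I could measure this myself is to
time the glob expansion directly against the NameNode. A minimal sketch,
again with hypothetical paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GlobTiming {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical layout again; globStatus does the same expansion
        // that FileInputFormat performs at submit time.
        Path glob = new Path("/logs/2011/05/*/part-*");
        long start = System.currentTimeMillis();
        FileStatus[] matches = fs.globStatus(glob); // null if no parent exists
        long elapsed = System.currentTimeMillis() - start;
        System.out.printf("%d files matched in %d ms%n",
            matches == null ? 0 : matches.length, elapsed);
      }
    }

But I would still appreciate any rule of thumb from experience.)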

Thank you very much in advance.

-Dmitriy
