I'm looking at storing a large number of files under one directory.
I started breaking the files into subdirectories out of habit (from working on NTFS and the like), but it occurred to me that, from a performance perspective, maybe it doesn't really matter on HDFS.
Does it? Is there some recommended limit on the number of files to store in one directory on hdfs? I'm thinking thousands to millions, so we're not talking about INT_MAX or anything, but a lot.
Or is it only limited by my sanity :) ?
I suppose it comes down to the data structure(s) the namenode uses to track file metadata. But I don't know what those are - I did skim the HDFS architecture document, but didn't see anything conclusive.
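For scale, here's the back-of-envelope I was doing. It assumes the oft-cited figure of roughly 150 bytes of namenode heap per namespace object (file, directory, or block) - that number is an assumption I've seen in Hadoop operations guides, not something I pulled from the architecture doc:

```python
# Rough namenode heap estimate for a flat directory full of files.
# ASSUMPTION: ~150 bytes of heap per namespace object (file, dir, block);
# this is a ballpark figure from operations folklore, not a measured value.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1, num_dirs: int = 1) -> int:
    """Estimate namenode heap used by num_files files (each with
    blocks_per_file blocks) spread over num_dirs directories."""
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT

# A million single-block files in one directory:
print(namenode_heap_bytes(1_000_000))  # 300_000_150 bytes, i.e. ~286 MiB
```

If that's roughly right, even millions of files is a heap-size question for the namenode as a whole, and the directory layout itself wouldn't change the total - which is exactly what I'm hoping someone can confirm or refute.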