Getting information about a very large directory
I ended up hitting the Linux limit of 32,000 subdirectories in a single directory. This caused problems with my PHP scripts and I don't want this to happen again.
A simple solution is to have my PHP scripts check the current subdirectory count before trying to create another subdirectory.
All the ideas I have seen for performing such a check include iterating over the entire directory and counting each folder. Considering my problem is with very large directories, is there a better way to get the number of files / folders it contains?
Bonus question: is there also a non-iterative way to find the disk usage of a directory?
Thanks in advance! Brian
Your best bet is to design your directory layout so that you never have 32,000 files in one directory. In fact, I would consider even 1000 files in a directory too many.
The approach I usually take to this problem is to add levels to the directory hierarchy. The typical way is to take the filenames that you are currently storing in a single directory and use a leading chunk of each name as the subdirectory it goes into. So, if you have a bunch of files like
xyzzy
foo
bar
blah
you can store them like
x/xyzzy
f/foo
b/bar
b/blah
etc. You can extend this to multiple directory levels, or use more than one character per subdirectory name, to trade depth against breadth.
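As a rough illustration of the scheme above, here is a minimal sketch in Python (the same logic ports directly to PHP). The function name sharded_path and its levels and width parameters are my own invention for this example, not part of any library. It only computes the relative path; creating the directories is left to the caller.

```python
from pathlib import PurePosixPath


def sharded_path(name: str, levels: int = 1, width: int = 1) -> str:
    """Return a relative path placing `name` under prefix-based subdirectories.

    levels: how many directory levels to create (hypothetical parameter).
    width:  how many characters of the name each level consumes (hypothetical).
    """
    if not name:
        raise ValueError("name must not be empty")

    parts = []
    for i in range(levels):
        # Take the i-th chunk of `width` characters from the start of the name.
        chunk = name[i * width:(i + 1) * width]
        if not chunk:
            break  # the name is shorter than levels * width; stop early
        parts.append(chunk)

    # The full filename is kept as the last component, e.g. "x/xyzzy".
    return str(PurePosixPath(*parts, name))


if __name__ == "__main__":
    for filename in ["xyzzy", "foo", "bar", "blah"]:
        print(sharded_path(filename))
```

With the defaults this reproduces the layout shown above. Raising levels makes the tree deeper, and raising width makes each level wider. If the names share long common prefixes, hashing the name first and sharding on the hash would spread files more evenly, but that is a separate design choice.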
You will likely receive suggestions to switch to a filesystem that does not have a 32k limit. Personally, even with such a filesystem, I would still use the scheme I suggest here. It is nearly impossible to work effectively with command-line tools in directories containing very large numbers of files (even ls becomes completely unwieldy), and such manual inspection is always necessary during development and debugging, and occasionally during normal operation.