Hi Dermot.

See my comments in line.

Dermot Paikkos wrote:
I am stuck. I have thousands of jpegs in hundreds of folders and I
would like to do a section by section tally of the number and file
sizes of all the files.

I am using File::Find as it is quick and good at recursively
searching dirs. The files are stored in a folder under a letter, then a
section number, then the file. So A100/002 is stored in
/usr2/images/a/100/002.jpg. The 'a' will often be a symlink to a
different disk but that hasn't been a problem.

The problem is my section tallies are either getting zeroed or
accumulating (depending on where I re-initialise the variables). So
instead of getting:

a Size: 6460.38MB Number: 2367
b Size: 8022.31MB Number: 3034
I get:
a size: 6460.38MB number:2367
b size: 0MB number:0
e size: 0MB number:0

Here's what I have done:

============== JPEG_COUNT.PL =======

use File::Find;
Because you're defining your own find() subroutine you should
avoid importing File::Find's version. Change this line to

use File::Find 'finddepth';
#use strict;
Please take the comment off this line. I don't see anything in
your code that it would choke on, and it's enormously useful in
finding bugs.
use diagnostics;

my @dirs = qw(a b e g h m n p r s t v z);
my $images_root = "/usr2/images/";

my @d;
my $total_size = 0;
my $total_num = 0;

# Create an array of all the dirs to check.

foreach my $i (@dirs) {

my $toplevel = "$images_root"."$i"."/";

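Assuming this loop just pushes $toplevel onto @d, you can build the
list in one statement instead (and drop the earlier 'my @d;'):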

my @d = map "$images_root$_/", @dirs;
# Cycle through dirs and get listing.

foreach my $dir (@d) {
my $sect_sz = 0;
my $file_size = 0;
my $sect_size = 0;
my $sect_num = 0;
These variables need to be declared /outside/ the foreach loop
and initialised to zero in the place they are now. Like this:

my $sect_sz;
my $file_size;
my $sect_size;
my $sect_num;

foreach my $dir (@d) {

$sect_sz = 0;
$file_size = 0;
$sect_size = 0;
$sect_num = 0;

finddepth({wanted => \&find, follow=>1},$dir);
This subroutine should really be /outside/ the foreach loop.
I suspect you've put it here so that it can see the four
lexical variables above, but in fact what you've done is
written a closure. The subroutine will hang on to the
/first/ set of variables that it sees while the containing loop
happily goes ahead and allocates a new set each time around.
That's the reason for your results: find() carries on modifying
the same set of values, but no code can see those values after
the first iteration except for find() itself.

Moving the variable declarations outside the loop will fix this
on its own, but you may as well move the subroutine back to its
conventional place at the end of the program.
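
A stripped-down sketch of the same effect, with no File::Find involved, may make it easier to see. The names bump_inner, bump_outer, $count and $total are made up for this illustration and are not from your script.

```perl
use strict;
use warnings;

# Broken: a named sub defined inside the loop body.
my @broken;
foreach my $n (1 .. 3) {
    my $count = 0;                # a new $count each time around
    sub bump_inner { $count++ }   # compiled once, bound to the first $count only
    bump_inner() for 1 .. $n;
    push @broken, $count;         # later iterations read a $count nobody touched
}

# Fixed: one variable declared outside the loop and reset inside it.
my @fixed;
my $total;
foreach my $n (1 .. 3) {
    $total = 0;
    bump_outer() for 1 .. $n;
    push @fixed, $total;
}

sub bump_outer { $total++ }       # sees the single file-level $total

print "broken: @broken\n";        # prints "broken: 1 0 0"
print "fixed:  @fixed\n";         # prints "fixed:  1 2 3"
```

The first tally is right and the rest are zero, which is the same pattern as your a/b/e output.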
sub find {

if ($_ !~ /jpg/) {
You might want to be a little stricter with this regex and use

return unless /\.jpe?g$/;

which checks that the file name ends with .jpg or .jpeg instead
of just containing 'jpg' somewhere. (Use return rather than next here:
the test sits inside a subroutine, not a loop body.)
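
To see the difference between the two patterns, here is a small sketch run against a handful of invented file names (none of them come from your image tree):

```perl
use strict;
use warnings;

# Hypothetical file names, chosen to show where the two patterns disagree.
my @names = qw(002.jpg 003.JPEG notes.jpg.txt jpgs thumb.jpeg);

# Loose test from the original script: 'jpg' anywhere in the name.
my @loose  = grep { /jpg/ } @names;

# Stricter test: the name must end in .jpg or .jpeg (case-sensitive).
my @strict = grep { /\.jpe?g$/ } @names;

print "loose:  @loose\n";    # prints "loose:  002.jpg notes.jpg.txt jpgs"
print "strict: @strict\n";   # prints "strict: 002.jpg thumb.jpeg"
```

Note that neither pattern matches 003.JPEG. If your tree has upper-case extensions, adding the /i modifier would catch those as well.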
my ($sz) = ((stat($_))[7] * 0.000001);
Since a Kbyte is 1024 bytes, a Mbyte is 1024 * 1024 bytes.
You also don't need the brackets, so:

my $sz = (stat)[7] / 1024 / 1024;
$file_size = sprintf("%.2f",$sz);

$sect_sz += $file_size;
$sect_size = sprintf("%.2f",$sect_sz);
You may as well accumulate $sect_size in bytes and then divide
it down once all the files have been seen. You'll get fewer
rounding errors that way as well.
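
As a rough illustration of how much per-file rounding can cost, here is a sketch with made-up numbers: 1000 files of 5000 bytes each. The sizes are invented for the example, not measured from your data.

```perl
use strict;
use warnings;

# Hypothetical data: 1000 small files of 5000 bytes each.
my @sizes = (5000) x 1000;

my $rounded_sum = 0;   # sum of per-file sizes, each rounded to 2 decimals first
my $byte_sum    = 0;   # plain sum in bytes

for my $bytes (@sizes) {
    $rounded_sum += sprintf("%.2f", $bytes / 1024 / 1024);  # each file rounds to 0.00
    $byte_sum    += $bytes;
}

my $early = sprintf("%.2f", $rounded_sum);
my $late  = sprintf("%.2f", $byte_sum / 1024 / 1024);

print "rounded per file: ${early}MB\n";   # prints "rounded per file: 0.00MB"
print "rounded once:     ${late}MB\n";    # prints "rounded once:     4.77MB"
```

Every file is smaller than 0.005MB, so each one rounds to zero and the whole section vanishes from the tally.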
# print "DEBUG: \$_=$_, $sect_size\n";


my ($let) = substr($dir,-2,1);
print "$let size: ${sect_size}MB number:$sect_num\n";



I could side-step this and simply do `du -k /path/to/jpegs` but I
would like to know what I am doing wrong.
I hope that helps. It's the first time I've known someone accidentally
write a closure. The problem is normally that people can't code it
right when they actually want one! ;-)
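
For completeness, here is one way the pieces could fit together. This is only a sketch under a few assumptions: the helper name tally_section is invented, the section directories are taken from the command line rather than built from @dirs, and it uses an anonymous sub as a deliberate closure instead of the file-level variables suggested above.

```perl
use strict;
use warnings;
use File::Find 'finddepth';

# Hypothetical helper: returns (total bytes, file count) for one section.
sub tally_section {
    my ($dir) = @_;
    my ($bytes, $count) = (0, 0);

    finddepth({
        # An anonymous sub is created afresh on every call, so it binds to
        # this call's $bytes and $count rather than to a stale first copy.
        wanted => sub {
            return unless /\.jpe?g$/ && -f $_;   # plain files ending .jpg/.jpeg
            $bytes += (stat _)[7];               # reuse the stat done by -f
            $count++;
        },
        follow => 1,
    }, $dir);

    return ($bytes, $count);
}

# Assumption: directories such as /usr2/images/a/ are passed as arguments.
for my $dir (@ARGV) {
    my ($bytes, $count) = tally_section($dir);
    printf "%s Size: %.2fMB Number: %d\n", $dir, $bytes / 1024 / 1024, $count;
}
```

The totals stay in bytes until the final printf, so the division and rounding happen once per section.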



Discussion Overview
Group: beginners @
Posted: May 14, '04 at 5:35p
Active: May 14, '04 at 8:43p

2 users in discussion

Rob Dixon: 1 post
Dermot Paikkos: 1 post


