Hi Gurus,

I am stuck. I have thousands of jpegs in hundreds of folders and I
would like to do a section by section tally of the number and file
sizes of all the files.

I am using File::Find as it is quick and good at recursively
searching dirs. The files are stored in a folder under a letter, then
a section number, then the file. So A100/002 is stored in
/usr2/images/a/100/002.jpg. The 'a' will often be a symlink to a
different disk, but that hasn't been a problem.

The problem is that my section tallies are either getting zeroed or
accumulating (depending on where I re-initialise the variables). So
instead of getting:

a Size: 6460.38MB Number: 2367
b Size: 8022.31MB Number: 3034
......
I get:
a size: 6460.38MB number:2367
b size: 0MB number:0
e size: 0MB number:0

Here's what I have done:
============== JPEG_COUNT.PL =======
#!/bin/perl

use File::Find;
#use strict;
use diagnostics;

my @dirs = qw(a b e g h m n p r s t v z);
my $images_root = "/usr2/images/";

my @d;
my $total_size = 0;
my $total_num = 0;

# Create an array of all the dirs to check.

foreach my $i (@dirs) {

    my $toplevel = "$images_root"."$i"."/";

    push(@d,$toplevel);
}

# Cycle through dirs and get listing.

foreach my $dir (@d) {
    my $sect_sz = 0;
    my $file_size = 0;
    my $sect_size = 0;
    my $sect_num = 0;

    finddepth({wanted => \&find, follow=>1},$dir);

    sub find {

        if ($_ !~ /jpg/) {
            next;
        }

        ++$sect_num;
        my ($sz) = ((stat($_))[7] * 0.000001);
        $file_size = sprintf("%.2f",$sz);

        $sect_sz += $file_size;
        $sect_size = sprintf("%.2f",$sect_sz);

        # print "DEBUG: \$_=$_, $sect_size\n";

    };

    my ($let) = substr($dir,-2,1);
    print "$let size: ${sect_size}MB number:$sect_num\n";

}

=========================

I could side-step this and simply do `du -k /path/to/jpegs` but I
would like to know what I am doing wrong.

Thanx.
Dp.


  • Rob Dixon at May 14, 2004 at 8:43 pm
    Hi Dermot.

    See my comments in line.

    Dermot Paikkos wrote:

    ============== JPEG_COUNT.PL =======
    #!/bin/perl

    use File::Find;

    Because you're defining your own find() subroutine you should
    avoid importing File::Find's version. Change this line to

    use File::Find 'finddepth';

    #use strict;

    Please take the comment off this line. I don't see anything in
    your code that it would choke on, and it's enormously useful in
    finding bugs.

    use diagnostics;

    my @dirs = qw(a b e g h m n p r s t v z);
    my $images_root = "/usr2/images/";

    my @d;
    my $total_size = 0;
    my $total_num = 0;

    # Create an array of all the dirs to check.

    foreach my $i (@dirs) {

    my $toplevel = "$images_root"."$i"."/";

    push(@d,$toplevel);
    }

    You may prefer this, which replaces both the loop above and the
    earlier 'my @d;' declaration:

    my @d = map "$images_root$_/", @dirs;

    # Cycle through dirs and get listing.

    foreach my $dir (@d) {
    my $sect_sz = 0;
    my $file_size = 0;
    my $sect_size = 0;
    my $sect_num = 0;

    These variables need to be declared /outside/ the foreach loop
    and reset to zero inside it, where the declarations are now.
    Like this:

    my $sect_sz;
    my $file_size;
    my $sect_size;
    my $sect_num;

    foreach my $dir (@d) {

    $sect_sz = 0;
    $file_size = 0;
    $sect_size = 0;
    $sect_num = 0;

    :
    }

    finddepth({wanted => \&find, follow=>1},$dir);

    The find() subroutine below should really be /outside/ the foreach
    loop. I suspect you've put it here so that it can see the four
    lexical variables above, but in fact what you've done is
    written a closure. The subroutine will hang on to the
    /first/ set of variables that it sees while the containing loop
    happily goes ahead and allocates a new set each time around.
    That's the reason for your results: find() carries on modifying
    the same set of values, but no code can see those values after
    the first iteration except for find() itself.

    Moving the variable declarations outside the loop will fix this
    on its own, but you may as well move the subroutine back to its
    conventional place at the end of the program.
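
    A minimal sketch of the effect described above, stripped of
    File::Find. The names bump(), bump_total(), $count and $total are
    made up for the demonstration and are not part of the original
    script:

```perl
use strict;
use warnings;

# A named sub declared inside a loop binds to the FIRST $count only.
my @seen_by_loop;
for my $round (1 .. 3) {
    my $count = 0;               # a fresh variable on every iteration

    bump();                      # always increments the first $count
    push @seen_by_loop, $count;  # later iterations see their own, untouched copy

    sub bump { ++$count }
}
print "declared inside the loop:  @seen_by_loop\n";   # 1 0 0

# Declaring the variable once, outside the loop, and resetting it inside
# means the sub and the loop body share the same variable.
my $total;
my @fixed;
for my $round (1 .. 3) {
    $total = 0;
    bump_total();
    push @fixed, $total;
}
print "declared outside the loop: @fixed\n";          # 1 1 1

sub bump_total { ++$total }
```

    The first line of output has the same shape as the tallies in the
    question: correct for the first section, zero for the rest.
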
    sub find {

    if ($_ !~ /jpg/) {
    next;
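    }

    You might want to be a little stricter with this regex, and to
    leave the subroutine with return rather than next, since next is
    meant for loops and draws an "Exiting subroutine via next"
    warning here:

    return unless /\.jpe?g$/;

    which checks that the file name ends with .jpg or .jpeg instead
    of just containing 'jpg' somewhere.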
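
    To see the difference between the two patterns, here is a small
    comparison. The file names are invented purely to exercise the
    regexes:

```perl
use strict;
use warnings;

# Hypothetical file names, chosen only to show where the patterns differ.
my @names = qw(002.jpg 003.jpeg jpg_notes.txt 004.jpg.bak);

my @loose  = grep { /jpg/ } @names;         # original test: 'jpg' anywhere
my @strict = grep { /\.jpe?g$/ } @names;    # name must end in .jpg or .jpeg

print "loose:  @loose\n";    # 002.jpg jpg_notes.txt 004.jpg.bak
print "strict: @strict\n";   # 002.jpg 003.jpeg
```

    Note that the loose pattern also misses 003.jpeg, because 'jpeg'
    does not contain the string 'jpg'.
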
    ++$sect_num;
    my ($sz) = ((stat($_))[7] * 0.000001);

    Since a Kbyte is 1024 bytes, a Mbyte is 1024 * 1024 bytes.
    You also don't need the brackets, so:

    my $sz = (stat)[7] / 1024 / 1024;

    $file_size = sprintf("%.2f",$sz);

    $sect_sz += $file_size;
    $sect_size = sprintf("%.2f",$sect_sz);

    You may as well accumulate $sect_size in bytes and then divide
    it down once all the files have been seen. You'll get fewer
    rounding errors that way as well.
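
    Putting those suggestions together, one possible shape for the
    tally is sketched below. It is not the only way to do it: the
    helper names tally_jpegs() and format_mb() are invented here, the
    directories come from the command line instead of the fixed list
    in the question, and the anonymous wanted sub is a deliberate
    closure over variables that are fresh on every call:

```perl
use strict;
use warnings;
use File::Find 'finddepth';

# Return (file count, total bytes) for the JPEGs found under $dir.
sub tally_jpegs {
    my ($dir) = @_;
    my ($count, $bytes) = (0, 0);    # new variables on every call

    finddepth({
        follow => 1,                 # follow symlinks, as in the question
        wanted => sub {
            # Skip anything that is not a plain file ending in .jpg/.jpeg.
            return unless /\.jpe?g$/ && -f $_;
            $count++;
            $bytes += -s _;          # size from the stat done by -f above
        },
    }, $dir);

    return ($count, $bytes);
}

# Convert bytes to MB (1 MB = 1024 * 1024 bytes), rounding only once.
sub format_mb {
    my ($bytes) = @_;
    return sprintf '%.2f', $bytes / 1024 / 1024;
}

# In this sketch the directories to scan are given as arguments.
for my $dir (@ARGV) {
    my ($count, $bytes) = tally_jpegs($dir);
    printf "%s size: %sMB number:%d\n", $dir, format_mb($bytes), $count;
}
```

    Run it as, for example, perl jpeg_count.pl /usr2/images/a /usr2/images/b
    to get one line per directory.
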
    # print "DEBUG: \$_=$_, $sect_size\n";


    };

    my ($let) = substr($dir,-2,1);
    print "$let size: ${sect_size}MB number:$sect_num\n";

    }

    =========================

    I could side-step this and simply do `du -k /path/to/jpegs` but I
    would like to know what I am doing wrong.

    I hope that helps. It's the first time I've known someone accidentally
    write a closure. The problem is normally that people can't code it
    right when they actually want one! ;-)

    Cheers,

    Rob

Discussion Overview
group: beginners @
categories: perl
posted: May 14, '04 at 5:35p
active: May 14, '04 at 8:43p
posts: 2
users: 2
website: perl.org
