I've managed to confuse myself thoroughly working on a project.

I hoped to read a list of -f type files into a hash of
File::Find::name and $_.

But I kept coming out with different counts than I get with shell find.

I tried writing a script that tested things various ways, but I can't
really reconcile the differences I see.

I've been tinkering endlessly here, so there may be some claptrap in
the script.

First the output, then the script below it:

deleting ./h1
deleting ./h2
deleting ./ckarr
deleting ./nf
------- -------
./h1:
lines: 631
size: 32768
------- -------
./h2:
lines: 485
size: 24576
------- -------
./ckarr:
lines: 623
size: 20480
------- -------
./nf:
lines: 628
size: 32768
------- -------
------- --------- ---=--- --------- --------
Now from cmd line:
reader > wc -l h1 h2 ckarr nf
647 h1
565 h2
647 ckarr
647 nf
2506 total

------- --------- ---=--- --------- --------

#!/usr/local/bin/perl

use strict;
use warnings;
use File::Find;
use Cwd;

my %h1;
my %h2;
my $targ = shift;
my @ckarr;

my $h1ff = './h1';
my $h2ff = './h2';
my $ckff = './ckarr';
my $nf   = './nf';

# clear out leftovers from earlier runs
for ($h1ff, $h2ff, $ckff, $nf) {
    if (-f $_) {
        print "deleting $_\n";
        unlink $_ or die "Can't unlink $_: $!";
    }
}

open my $nfh, '>', $nf or die "Cannot open $nf: $!";

find({
    wanted => sub {
        my $dir = getcwd;
        if (-f $dir . '/' . $_) {
            print $nfh "V $File::Find::name K $_\n";
            $h1{$File::Find::name} = $_;
            $h2{$_} = $File::Find::name;
            push @ckarr, $File::Find::name;
        }
    },
    no_chdir => 0,
}, $targ);

open my $h1fh, '>', $h1ff or die "Can't open $h1ff: $!";
open my $h2fh, '>', $h2ff or die "Cannot open $h2ff: $!";
open my $ckfh, '>', $ckff or die "Cannot open $ckff: $!";

foreach my $key (keys %h1) {
    print $h1fh "V $h1{$key} K $key\n";
}

foreach my $key (keys %h2) {
    print $h2fh "V $h2{$key} K $key\n";
}

for (@ckarr) {
    print $ckfh "$_\n";
}

# count lines and sizes of the files just written
print " ------- -------\n";
for ($h1ff, $h2ff, $ckff, $nf) {
    if (-f $_) {
        my $file = $_;
        my $sz = (stat($_))[7];
        open my $fh, '<', $_ or die "Can't open $_: $!";
        my $lines;
        while (<$fh>) {
            $lines = $.;
        }
        print " $file:\nlines: $lines\nsize: $sz\n";
        close $fh;
    }
    print " ------- -------\n";
}

close $h1fh;
close $h2fh;
close $ckfh;


  • Shawn H Corey at Apr 25, 2010 at 8:14 pm

    Harry Putnam wrote:
    [snip]
    > foreach my $key (keys %h1){
    >     print $h1fh "V $h1{$key} K $key\n";
    > }
    >
    > foreach my $key (keys %h2){
    >     print $h2fh "V $h2{$key} K $key\n";
    > }
    >
    > for(@ckarr){
    >     print $ckfh "$_\n";
    > }
    close $h1fh or die "could not close $h1ff: $!\n";
    close $h2fh or die "could not close $h2ff: $!\n";
    close $ckfh or die "could not close $ckff: $!\n";
    > print " ------- -------\n";
    > for($h1ff,$h2ff,$ckff,$nf){
    >     if (-f $_){
    >         my $file = $_;
    >         my $sz = (stat($_))[7];
    >         open my $fh,'<', $_ or die "Can't open $_: $!";
    >         my $lines;
    >         while (<$fh>) {
    >             $lines = $.;
    >         }
    >         print " $file:\nlines: $lines\nsize: $sz\n";
    >         close $fh;
    >     }
    >     print " ------- -------\n";
    > }
    Output is buffered. The files have to be closed for the last lines to
    be printed to the file.
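
    A minimal sketch of the effect (the file name here is made up):

    use strict;
    use warnings;

    my $file = 'counted.txt';    # hypothetical scratch file
    open my $out, '>', $file or die "Cannot open $file: $!";
    print $out "line $_\n" for 1 .. 1000;

    # Counting *before* closing the write handle: the tail of the output
    # may still sit in perl's output buffer, so the count can come up
    # short -- the same mismatch as above.
    open my $in, '<', $file or die "Cannot open $file: $!";
    1 while <$in>;
    print "before close: $. lines\n";
    close $in;

    close $out or die "Cannot close $file: $!";    # flushes the buffer

    open $in, '<', $file or die "Cannot open $file: $!";
    1 while <$in>;
    print "after close: $. lines\n";
    close $in;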


    --
    Just my 0.00000002 million dollars worth,
    Shawn

    Programming is as much about organization and communication
    as it is about coding.

    I like Perl; it's the only language where you can bless your
    thingy.

    Eliminate software piracy: use only FLOSS.
  • Harry Putnam at Apr 25, 2010 at 9:37 pm
    Shawn H Corey writes:
    > close $h1fh or die "could not close $h1ff: $!\n";
    > close $h2fh or die "could not close $h2ff: $!\n";
    > close $ckfh or die "could not close $ckff: $!\n";
    >
    > Output is buffered. The files have to be closed for the last lines to
    > be printed to the file.
    I was pretty sure it would be some basic tenet I'd overlooked. That
    seems to happen a lot for me.

    Thanks for your time; that made all the difference. After moving the
    close calls, the line counts started to match up.


    I do have another question that was only in the background of my first
    post.

    Is there a canonical way to read a bunch of -f type files into a hash?

    I want the end name `$_' on one side and full name `File::Find::name'
    on the other...

    What happens is the keys start slaughtering each other if you get it
    the wrong way round... and even when it ends up right, I wonder
    whether there may still be some chance of names canceling.

    Doing it like this:

    $h1{$File::Find::name} = $_

    So far, that has agreed with the count I see from `wc -l'. I'd like
    to know for sure, though, whether that is a reliable way to do it?

    And is there some kind of handy way to turn a hash into a scalar,
    like File::Slurp can do with a file's contents?

    What I'm after is a way to grep the list of full names using the
    endnames of a similar but not identical list, in order to discover
    which names are in the longer list, but not the shorter list.

    Writing it to a file is one way, and it seems likely to be the better
    way really, since the lists can be pretty long.

    I wondered if this can all be done in the script with hashes somehow.

    Here is roughly what I've tried with already-compiled lists in disk
    files.

    So far, I've looped over the longer list one line at a time, checking
    for each endname in the shorter list read in as one slurped string.

    Just briefly, something like this (the code is made up to show the
    goal; it is not working code [yet]):

    use strict;
    use warnings;
    use File::Slurp;
    use File::Find;
    use Cwd;

    my $file = './file';
    my $var  = 'gnus';

    [...]

    my @longar    = read_file($longarfile);
    my $shortlist = read_file($shortfile);

    open my $gclfh, '>', $file or die "Can't open <$file>: $!";

    for (@longar) {
        chomp;
        ## isolate just the endname for our purposes
        my ($endname) = $_ =~ m/.*\/([^\/]+)$/;

        my $ematch;

        ## use \b instead of $ since we're matching inside a slurped
        ## string; if there's no match even on the endname, write to file
        if (($ematch) = $shortlist =~ m/(.*$endname)\b/) {
            print "$_ MATCHES $ematch\n";
        } else {
            (my $adjusted_line = $_) =~ s/^\.*\/*$var\///;
            print "NO MATCH ON $_, writing to <$file>\n";
            print $gclfh "$adjusted_line\n";
        }
    }

    close $gclfh;
  • Jim Gibson at Apr 25, 2010 at 11:46 pm

    At 4:37 PM -0500 4/25/10, Harry Putnam wrote:
    > I do have another question that was only in the background of my first
    > post.
    >
    > Is there a canonical way to read a bunch of -f type files into a hash?
    I take it you mean add the file names to a hash, not the file contents.
    > I want the end name `$_' on one side and full name `File::Find::name'
    > on the other...
    The "end name" is called the "file name". What comes before the file
    name is referred to as the "directory" or "directory path". The whole
    string is referred to as the "path" or "full path".
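
    In code, the core module File::Basename splits a path along exactly
    those lines; a small illustration with a made-up path:

    use File::Basename qw(basename dirname);

    my $path = './gnus/mail/inbox';    # hypothetical full path
    print basename($path), "\n";       # "inbox" -- the file name
    print dirname($path), "\n";        # "./gnus/mail" -- the directory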
    > What happens is the keys start slaughtering each other if you get it
    > the wrong way round... and even when it ends up right, I wonder
    > whether there may still be some chance of names canceling.
    Hash keys must be unique. If you are worried about key collision (two
    keys the same), always test whether a key already exists before
    inserting it into a hash.
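
    For instance, with the %h2 hash from your script (a sketch reusing
    your variable names):

    if (exists $h2{$_}) {
        warn "collision: '$_' already maps to $h2{$_}\n";
    } else {
        $h2{$_} = $File::Find::name;
    }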
    > Doing it like this:
    >
    >     $h1{$File::Find::name} = $_
    >
    > So far, that has agreed with the count I see from `wc -l'. I'd like
    > to know for sure, though, whether that is a reliable way to do it?

    That is the reliable way to generate a hash of all files in a
    directory tree. Since full paths must be unique on a system (else how
    could the operating system find the file?), a full path specification
    must be unique. The reverse (inverse, obverse?) is not true: because
    of links and aliases, two full path strings could refer to the same
    file.
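
    A minimal sketch of that approach (the starting directory is an
    assumption):

    use strict;
    use warnings;
    use File::Find;

    my $top = shift || '.';    # hypothetical starting directory
    my %path_to_name;

    find(sub {
        return unless -f;                       # plain files only
        $path_to_name{$File::Find::name} = $_;  # full path => file name
    }, $top);

    printf "%d files\n", scalar keys %path_to_name;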
    > And is there some kind of handy way to turn a hash into a scalar,
    > like File::Slurp can do with a file's contents?
    Arrays can be transformed to scalars by the join function.
    File::Slurp can either return the contents of a file as a single
    scalar or as an array, one line per array element. It doesn't really
    turn an array into a scalar.
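
    For example:

    my @lines  = ("one\n", "two\n", "three\n");
    my $joined = join '', @lines;    # one scalar holding all three lines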
    > What I'm after is a way to grep the list of full names using the
    > endnames of a similar but not identical list, in order to discover
    > which names are in the longer list, but not the shorter list.

    Hashes are the best data structure to use for this purpose.
    > Writing it to a file is one way, and it seems likely to be the better
    > way really, since the lists can be pretty long.
    >
    > I wondered if this can all be done in the script with hashes somehow.

    I suggest you try implementing an algorithm using hashes. Your method
    (looking for substrings in a string containing all file names) is
    needlessly inefficient and prone to error.
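
    A sketch of the hash-based version (the sample paths are made up):

    use strict;
    use warnings;
    use File::Basename qw(basename);

    my @long  = qw(./gnus/a/x ./gnus/a/y ./gnus/b/z);    # hypothetical
    my @short = qw(./other/x ./other/w);                 # hypothetical

    # Index the shorter list by end name for constant-time lookups.
    my %in_short = map { basename($_) => 1 } @short;

    # Full names from the longer list whose end name is missing from the
    # shorter list.
    print "$_\n" for grep { !$in_short{ basename($_) } } @long;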

    --
    Jim Gibson
    Jim@Gibson.org
