FAQ
Have quite a lot of trouble getting my pea brain around working even
with simple hashes.

What I hope to do is compare hashes in a few different ways.

determine what is in one and not in the other for example.

These example hashes are supposed to represent file names.

The hashes are created by making the key the full path and file name,
and the value just the end filename

/some/path/name => name

I'm looking to compare hashes on the last elements
(in square brackets) /x/y/[z] .. not trying to match the full file
name, only the last part.
key                       value
/some/path/name        => name
/some/other/path/name  => name

would be a match

The idea is to determine what is one hash but not the other in terms
of values ... as above.

It was recommended to me to read perldoc perllol.

After about the 4th or 5th paragraph, I'm already seeing red. A little
further along and I'm hopelessly confused .. and that is with a
simpler data structure... arrays.

It's going to take a while for me to understand these ideas, I'm afraid.

I've been shown at least one example of using `exists' for something
like this, but I'm not really understanding it.

So I've devised a couple of example hashes and run each through a double
`foreach' loop to get the information I'm after.

But I have the nagging feeling there is some easier and less labor
intensive way to do this.

In the case below... I'm spinning through each hash one full loop for
each line of the other hash.... lots of spinning involved.

It appears to work, in so far as extracting what I wanted to know.

Output looks like:
,----
No match: (%h1) f1 ne anything in (%h2)
No match: (%h1) fa ne anything in (%h2)
Match: (%h1) fb eq (%h2) fb
Match: (%h1) fb eq (%h2) fb
Match: (%h1) f2 eq (%h2) f2
----- ---=--- ----- ---=--- -----
No match: (%h2) fc ne anything in (%h1)
Match: (%h2) fb eq (%h1) fb
No match: (%h2) fd ne anything in (%h1)
Match: (%h2) fb eq (%h1) fb
Match: (%h2) f2 eq (%h1) f2
`----

But, is there an easier way?
------- --------- ---=--- --------- --------

#!/usr/local/bin/perl

use strict;
use warnings;

my %h1 = (
    './b/f1'       => 'f1',
    './b/c/fa'     => 'fa',
    './b/l/c/f2'   => 'f2',
    './b/g/f/r/fb' => 'fb'
);

my %h2 = (
    './b/fb'       => 'fb',
    './b/c/fd'     => 'fd',
    './b/l/c/f2'   => 'f2',
    './b/g/f/r/fc' => 'fc',
    './b/g/h/r/fb' => 'fb'
);

## Trot all of %h2 through for every line of %h1
foreach my $h1val ( values %h1 ) {
    my $hit = 0;
    foreach my $h2val ( values %h2 ) {
        if ( $h1val eq $h2val ) {
            $hit++;
            printf "%-15s %s %s %s\n", "Match: (%h1)", $h1val, " eq (%h2)", $h2val;
        }
    }
    if ( !$hit ) {
        printf "%-15s %s %s %s\n", "No match: (%h1)", $h1val, " ne ", "anything in h2";
    }
}

print "----- ---=--- ----- ---=--- -----\n";

## Trot all of %h1 through for every line of %h2
foreach my $h2val ( values %h2 ) {
    my $hit = 0;
    foreach my $h1val ( values %h1 ) {
        if ( $h2val eq $h1val ) {
            $hit++;
            print "Match: (%h2) $h2val eq (%h1) $h1val\n";
        }
    }
    if ( !$hit ) {
        print "No match: (%h2) $h2val ne anything in %h1\n";
    }
}


  • Shawn H Corey at May 2, 2010 at 5:31 pm

    Harry Putnam wrote:
    But, is there an easier way?
    Invert both hashes and find the keys in both inverses.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Data::Dumper;

    # Make Data::Dumper pretty
    $Data::Dumper::Sortkeys = 1;
    $Data::Dumper::Indent   = 1;

    # Set maximum depth for Data::Dumper, zero means unlimited
    $Data::Dumper::Maxdepth = 0;

    my %h1 = (
        './b/f1'       => 'f1',
        './b/c/fa'     => 'fa',
        './b/l/c/f2'   => 'f2',
        './b/g/f/r/fb' => 'fb'
    );

    my %h2 = (
        './b/fb'       => 'fb',
        './b/c/fd'     => 'fd',
        './b/l/c/f2'   => 'f2',
        './b/g/f/r/fc' => 'fc',
        './b/g/h/r/fb' => 'fb'
    );

    my %inverse_h1 = invert( \%h1 );
    my %inverse_h2 = invert( \%h2 );

    # print 'h1: ', Dumper \%h1, \%inverse_h1;
    # print 'h2: ', Dumper \%h2, \%inverse_h2;

    for my $name ( keys %inverse_h1 ){
        if( exists $inverse_h2{$name} ){
            print "$name exists in both hashes:\n",
                Data::Dumper->Dump( [ $inverse_h1{$name}, $inverse_h2{$name} ],
                                    [ 'h1', 'h2' ] ),
                "\n";
        }
    }

    sub invert {
        my $h = shift @_;
        my %inv = ();

        while( my ( $k, $v ) = each %{ $h } ){
            push @{ $inv{$v} }, $k;
        }
        return %inv;
    }

    __END__


    --
    Just my 0.00000002 million dollars worth,
    Shawn

    Programming is as much about organization and communication
    as it is about coding.

    I like Perl; it's the only language where you can bless your
    thingy.

    Eliminate software piracy: use only FLOSS.
  • Harry Putnam at May 2, 2010 at 10:33 pm

    Shawn H Corey writes:

    Harry Putnam wrote:
    But, is there an easier way?
    Invert both hashes and find the keys in both inverses.
    [...]

    Thanks for the nice working script... Lots to learn there.

    But not sure how to get at the information I asked about with it.
    Maybe because there was a typo in my post.

    Harry wrote:
    The idea is to determine what is [in] one hash but not the other in terms
    of values ... as above.

    Maybe once we have what is in them both, as your script does.
    Then maybe delete those elements from both h1 and h2, leaving what is
    in one but not the other?

    But no, that won't work completely because some of the names are
    removed in the inversion process. So I guess I'm still not quite
    seeing how to use this to get at what is in one, and not the other.

    But it is a lot nicer way to find what they have in common.
  • Shawn H Corey at May 3, 2010 at 12:43 pm

    Harry Putnam wrote:
    Shawn H Corey <shawnhcorey@gmail.com> writes:
    Harry Putnam wrote:
    But, is there an easier way?
    Invert both hashes and find the keys in both inverses.
    [...]

    Thanks for the nice working script... Lots to learn there.

    But not sure how to get at the information I asked about with it.
    Maybe because there was a type in my post.

    Harry wrote:
    The idea is to determine what is [in] one hash but not the other in terms
    of values ... as above.

    Harry Putnam wrote:
    Have quite a lot of trouble getting my pea brain around working even
    with simple hashes.

    What I hope to do is compare hashes in a few different ways.

    If you want to find those in one but not the other, change the if:

    for my $name ( keys %inverse_h1 ){
        if( exists $inverse_h2{$name} ){
            # print "$name exists in both hashes:\n",
            #     Data::Dumper->Dump( [ $inverse_h1{$name}, $inverse_h2{$name} ],
            #                         [ 'h1', 'h2' ] ),
            #     "\n";
        }else{
            print "$name exists in only h1\n";
        }
    }


    --
    Just my 0.00000002 million dollars worth,
    Shawn

    Programming is as much about organization and communication
    as it is about coding.

    I like Perl; it's the only language where you can bless your
    thingy.

    Eliminate software piracy: use only FLOSS.
  • Harry Putnam at May 3, 2010 at 3:33 pm
    Shawn H Corey writes:


    [...]
    If you want to find those in one but not the other, change the if:

    for my $name ( keys %inverse_h1 ){
        if( exists $inverse_h2{$name} ){
            # print "$name exists in both hashes:\n",
            #     Data::Dumper->Dump( [ $inverse_h1{$name}, $inverse_h2{$name} ],
            #                         [ 'h1', 'h2' ] ),
            #     "\n";
        }else{
            print "$name exists in only h1\n";
        }
    }
    I guess you didn't mean to leave Data::Dumper[...] commented?

    With that uncommented, yes it does just that. Again thanks for the
    practical code.
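
    The same test run the other way round would list the names that are
    only in %h2. A minimal sketch of that (untested), reusing the two
    inverted hashes from Shawn's script above:

        for my $name ( keys %inverse_h2 ){
            if( not exists $inverse_h1{$name} ){
                print "$name exists in only h2\n";
            }
        }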
  • Harry Putnam at May 11, 2010 at 8:52 pm

    Shawn H Corey writes:

    Harry Putnam wrote:
    But, is there an easier way?
    Invert both hashes and find the keys in both inverses.
    Shawn, hoping to pester you once more about this topic.

    first:

    Hashes involved are built like this (Using File::Find nomenclature):

    (NOT CODE... Just description)
    use File::Find;

    %d1h is made up like this:

        this is key                this is value
        $File::Find::name      =   $_
        # ./dir1/sub/fname     =   fname

    %d2h is made up the same way:

        this is key                this is value
        $File::Find::name      =   $_
        # ./dir2/what/sub/fname =  fname

    Also keeping in mind there are many thousands of lines in both hashes.

    There will be many, many ways the path part of those names will differ,
    and only some will actually match on the ends (the values, in hash
    terms) like the two above. Many more will be different, but the
    objective here is to find those that match.

    I've found, after many hours of tinkering and bugging the heck out of
    patient posters here, what I think you were trying to tell me in this
    thread. I wasn't capable yet of understanding it all. I'm still not,
    but quite a lot more has now finally wormed into my pea brain.

    So cutting to the chase, I try to take advantage of your inversion
    code. It seems to work well, but I'm not confident enough to know
    if there are possible hidden gotchas someone with more experience
    might see right off the bat.

    ------- --------- ---=--- --------- --------
    Some selective output first:

    [...]

    d1 ./dir1/etc/images/gnus/exit-summ.xpm
    d2 (1) ./dir2/etc/images/gnus/exit-summ.xpm

    d1 ./dir1/etc/images/gnus/reply.xpm
    d2 (1) ./dir2/etc/images/mail/reply.xpm
    d2 (2) ./dir2/etc/images/gnus/reply.xpm

    d1 ./dir1/etc/images/gnus/README
    d2 (1) ./dir2/src/m/README
    d2 (2) ./dir2/etc/e/README
    [...]
    d2 (47) ./dir2/doc/lispintro/README

    d1 ./dir1/lisp/gnus-util.el
    d2 (1) ./dir2/lisp/gnus/gnus-util.el

    [...]

    ------- 8< snip ---------- 8< snip ---------- 8<snip -------

    #!/usr/local/bin/perl

    use strict;
    use warnings;
    use File::Find;
    #use diagnostics;

    my %d1h;
    my %d2h;
    my $d1tag = 'd1';
    my $d2tag = 'd2';

    ## Make sure we are fed two directory names
    ( my ( $d1, $d2 ) = @ARGV ) == 2
        or die "\nUsage: $0 ./dir1 ./dir2\n";

    ## Make sure incoming directory names exist
    for ( $d1, $d2 ){
        ( -d ) or die "<$_> cannot be found on the file system";
    }

    ## Build the hashes

    find sub {
        return unless -f;
        $d1h{ $File::Find::name } = $_;
    }, $d1;

    find sub {
        return unless -f;
        $d2h{ $File::Find::name } = $_;
    }, $d2;

    ## Invert one hash; it needs to be the second one on the cmd line
    my %inv_d2h = invert( \%d2h );

    sub invert {
        my $h = shift @_;
        my %inv = ();

        while( my ( $k, $v ) = each %{ $h } ){
            push @{ $inv{$v} }, $k;
        }
        return %inv;
    }

    ## Could have used `values' of %d1h here: for $value (values %d1h)
    ## and avoided things like `@{ $inv_d2h{ $d1h{ $key } } }',
    ## which would then be `@{ $inv_d2h{ $value } }', but would
    ## not then have such ready access to the `keys', which are
    ## needed here too, and will be needed later on (not shown here).
    ## For now we just print to show how it works.

    foreach my $key ( keys %d1h ){
        if( exists $inv_d2h{ $d1h{ $key } } ){
            print " $d1tag $key\n";

            ## separate counter to keep (my) confusion down
            my $matchcnt = 0;
            for ( @{ $inv_d2h{ $d1h{ $key } } } ) {
                print " $d2tag (" . ++$matchcnt . ") $_\n";
            }
            print "\n";
        }
    }
  • Jim Gibson at May 11, 2010 at 10:13 pm
    On 5/11/10 Tue May 11, 2010 1:52 PM, "Harry Putnam" <reader@newsguy.com>
    scribbled:
    Shawn H Corey <shawnhcorey@gmail.com> writes:
    Harry Putnam wrote:
    But, is there an easier way?
    Invert both hashes and find the keys in both inverses.
    Shawn, hoping to pester you once more about this topic.
    It is not fair to single out Shawn for help. Just post your question and
    hope for a response.
    %d1h is made up like this

        this is key                this is value
        $File::Find::name      =   $_
    The use of the equal sign '=' in the above makes it look like you are
    assigning a value to $File::Find::name. It is better to stick to Perl
    syntax:

    $d1h{$File::Find::name} = $_;

    for example:

    $d1h{'./dir1/sub/fname'} = 'fname';
    So cutting to the chase, I try to take advantage of your inversion
    code. It seems to work well, but I'm not confident enough to know
    if there are possible hidden gotchas someone with more experience
    might see right off the bat.

    ------- --------- ---=--- --------- --------
    Some selective output first:

    [...]

    d1 ./dir1/etc/images/gnus/exit-summ.xpm
    d2 (1) ./dir2/etc/images/gnus/exit-summ.xpm

    d1 ./dir1/etc/images/gnus/reply.xpm
    d2 (1) ./dir2/etc/images/mail/reply.xpm
    d2 (2) ./dir2/etc/images/gnus/reply.xpm

    d1 ./dir1/etc/images/gnus/README
    d2 (1) ./dir2/src/m/README
    d2 (2) ./dir2/etc/e/README
    [...]
    d2 (47) ./dir2/doc/lispintro/README

    d1 ./dir1/lisp/gnus-util.el
    d2 (1) ./dir2/lisp/gnus/gnus-util.el
    You can use the Unix find command (if you are on Unix) to find the files
    with a certain name:

    find dir1 -name README

    etc. to check the results of your program.

    Your program below looks fine. I see no obvious defects. You can generate
    the inverted hashes in the find routines (see below). You can use the
    inverted hashes only (see below).
    #!/usr/local/bin/perl

    use strict;
    use warnings;
    use File::Find;
    #use diagnostics;

    my %d1h;
    my %d2h;
    my $d1tag = 'd1';
    my $d2tag = 'd2';

    ## Make sure we are fed two directory names
    ( my ( $d1, $d2 ) = @ARGV ) == 2
        or die "\nUsage: $0 ./dir1 ./dir2\n";

    ## Make sure incoming directory names exist
    for ( $d1, $d2 ){
        ( -d ) or die "<$_> cannot be found on the file system";
    }

    ## Build the hashes
    my( %inv_d1h, %inv_d2h );
    find sub {
        return unless -f;
        $d1h{ $File::Find::name } = $_;
        push( @{ $inv_d1h{$_} }, $File::Find::name );
    }, $d1;

    find sub {
        return unless -f;
        $d2h{ $File::Find::name } = $_;
        push( @{ $inv_d2h{$_} }, $File::Find::name );
    }, $d2;

    ## Invert one hash; it needs to be the second one on the cmd line
    my %inv_d2h = invert( \%d2h );

    sub invert {
        my $h = shift @_;
        my %inv = ();

        while( my ( $k, $v ) = each %{ $h } ){
            push @{ $inv{$v} }, $k;
        }
        return %inv;
    }

    ## Could have used `values' of %d1h here: for $value (values %d1h)
    ## and avoided things like `@{ $inv_d2h{ $d1h{ $key } } }',
    ## which would then be `@{ $inv_d2h{ $value } }', but would
    ## not then have such ready access to the `keys', which are
    ## needed here too, and will be needed later on (not shown here).
    ## For now we just print to show how it works.

    foreach my $key ( keys %d1h ){
        if( exists $inv_d2h{ $d1h{ $key } } ){
            print " $d1tag $key\n";

            ## separate counter to keep (my) confusion down
            my $matchcnt = 0;
            for ( @{ $inv_d2h{ $d1h{ $key } } } ) {
                print " $d2tag (" . ++$matchcnt . ") $_\n";
            }
            print "\n";
        }
    }
    Using only "inverted" hashes (untested):

    for my $file ( sort keys %inv_d1h ) {
        if( exists $inv_d2h{$file} ) {
            print "Duplicate file names found: ", scalar @{ $inv_d1h{$file} },
                " in $d1 and ", scalar @{ $inv_d2h{$file} }, " in $d2\n";
            print "\n$d1:\n ", join( "\n ", @{ $inv_d1h{$file} } ), "\n";
            print "\n$d2:\n ", join( "\n ", @{ $inv_d2h{$file} } ), "\n";
        }
    }
  • Harry Putnam at May 12, 2010 at 4:12 am
    Jim Gibson writes:

    Harry wrote:
    Shawn, hoping to pester you once more about this topic.
    Jim G responded:
    It is not fair to single out Shawn for help. Just post your question
    and hope for a response.
    Just a manner of speaking, but you're right, it does appear to be a
    little off the wall. It wasn't my intent to elicit responses from
    only Shawn... It was just that he was the one who authored the nifty
    inversion subroutine.

    [...]

    Jim wrote:
    Using only "inverted" hashes (untested):

    for my $file ( sort keys %inv_d1h ) {
        if( exists $inv_d2h{$file} ) {
            print "Duplicate file names found: ", scalar @{ $inv_d1h{$file} },
                " in $d1 and ", scalar @{ $inv_d2h{$file} }, " in $d2\n";
            print "\n$d1:\n ", join( "\n ", @{ $inv_d1h{$file} } ), "\n";
            print "\n$d2:\n ", join( "\n ", @{ $inv_d2h{$file} } ), "\n";
        }
    }
    Haven't gotten to try this yet... but it made me wonder right off why
    I'd want to do that. I'm on my way out the door in a moment but
    curious now... about your reasoning.

    Also eager to put that into a real script and try it out. You've made
    a neat job of it.

    Is there some gain in doing it with both hashes inverted? Does it
    simplify things in some way, or is it mainly an example of another way
    to go at it?

    It doesn't seem to take much time at all to do the inversion; it seems
    more or less instantaneous in fact. That surprised me a bit, when one
    of those hashes has something like 4000 lines.

    I noticed quite a marked gain comparing my original approach to the
    code you were responding to in your message above.

    Originally went something like this (not actual code):

    foreach my $keyd1 ( keys %d1h ){
        my @matches;
        foreach my $keyd2 ( keys %d2h ){
            if( $d1h{ $keyd1 } eq $d2h{ $keyd2 } ){
                push @matches, $d2h{ $keyd2 };
            }
        }
        ## process @matches
        dispatch_table( $d1h{ $keyd1 }, @matches );
    }

    That really puts the whammy on resources since it marches some 4000
    possible matches for each line of %d1h (which is nearly 2000 lines)
    through the gauntlet. 8,000,000 or so comparisons in the actual event.
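
    For comparison, a rough sketch (untested) of the single-pass version
    a lookup hash makes possible: build a lookup of the end filenames in
    %d2h once, then walk %d1h once, so the work is roughly 2000 + 4000
    steps instead of 2000 * 4000:

        ## build a lookup of the end filenames in %d2h, one pass
        my %seen_in_d2;
        $seen_in_d2{$_}++ for values %d2h;

        ## one pass over %d1h, constant-time check per line
        foreach my $keyd1 ( keys %d1h ){
            next unless $seen_in_d2{ $d1h{ $keyd1 } };
            print "match: $keyd1 ($d1h{ $keyd1 })\n";
        }

    The inverted %inv_d2h does the same single pass but also remembers
    which d2 paths carried each end name, which is why it is the better
    fit here.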
  • Shawn H Corey at May 11, 2010 at 11:31 pm

    On 10-05-11 04:52 PM, Harry Putnam wrote:
    Some selective output first:

    [...]

    d1 ./dir1/etc/images/gnus/exit-summ.xpm
    d2 (1) ./dir2/etc/images/gnus/exit-summ.xpm

    d1 ./dir1/etc/images/gnus/reply.xpm
    d2 (1) ./dir2/etc/images/mail/reply.xpm
    d2 (2) ./dir2/etc/images/gnus/reply.xpm

    d1 ./dir1/etc/images/gnus/README
    d2 (1) ./dir2/src/m/README
    d2 (2) ./dir2/etc/e/README
    [...]
    d2 (47) ./dir2/doc/lispintro/README

    d1 ./dir1/lisp/gnus-util.el
    d2 (1) ./dir2/lisp/gnus/gnus-util.el

    [...]
    If you want the output only in that format, construct your internal
    representation to make it easy to process.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Data::Dumper;

    # Make Data::Dumper pretty
    $Data::Dumper::Sortkeys = 1;
    $Data::Dumper::Indent   = 1;

    # Set maximum depth for Data::Dumper, zero means unlimited
    local $Data::Dumper::Maxdepth = 0;

    use File::Find;

    my %Paths_for = ();
    my $Base_dir;
    my $Table_format = "%-20s %10s %s\n";

    sub wanted {
        if( -f ){
            push @{ $Paths_for{$_}{$Base_dir} }, $File::Find::name;
        }
    }

    for my $dir ( @ARGV ){
        $Base_dir = $dir;
        find( \&wanted, $Base_dir );
    }

    # print '%Paths_for: ', Dumper \%Paths_for;

    for my $fname ( sort keys %Paths_for ){
        my $count = '';
        for my $base_dir ( sort keys %{ $Paths_for{$fname} } ){
            for my $path ( @{ $Paths_for{$fname}{$base_dir} } ){
                my $enclosed_count = '';
                $enclosed_count = "($count)" if $count;
                printf $Table_format, $base_dir, $enclosed_count, $path;
                $count++;
            }
        }
        print "\n";
    }

    __END__


    --
    Just my 0.00000002 million dollars worth,
    Shawn

    Programming is as much about organization and communication
    as it is about coding.

    I like Perl; it's the only language where you can bless your
    thingy.

    Eliminate software piracy: use only FLOSS.
  • Harry Putnam at May 12, 2010 at 4:19 am
    Shawn H Corey writes:

    Oh nice... thanks. Hope I can get to try this out later tonight... I
    have to go out for a while and can't get to it right now though.

    The main `for loop' near the end, and really, all of it, looks to be
    highly portable like the inversion code was.... I think that little
    inversion sub function could be dropped in about anywhere in code
    using hashes and just work with no fuss.
  • Dr.Ruud at May 2, 2010 at 11:39 pm

    Harry Putnam wrote:

    What I hope to do is compare hashes in a few different ways.

    determine what is in one and not in the other for example.

    These example hashes are supposed to represent file names.

    The hashes are created by making the key the full path and file name,
    and the value just the end filename
    my %h1 = (
        './b/f1'       => 'f1',
        './b/c/fa'     => 'fa',
        './b/l/c/f2'   => 'f2',
        './b/g/f/r/fb' => 'fb'
    );

    my %h2 = (
        './b/fb'       => 'fb',
        './b/c/fd'     => 'fd',
        './b/l/c/f2'   => 'f2',
        './b/g/f/r/fc' => 'fc',
        './b/g/h/r/fb' => 'fb'
    );

    I think you want to find filenames that exist in multiple paths.

    So you could create a hash like this:

    (
    'f1' => [ 'p1', 'p4' ],
    'f2' => [ 'p1', 'p2', 'p4' ],
    ...
    )

    (read f as filename and p as pathname)

    In that way, you would have, per filename, a list of all paths.



    An example that shows you the commands that exist more than once in your
    path:

    perl -Mstrict -MData::Dumper -MFile::Basename=basename,dirname -wle'
        my %h;
        for my $dir ( split ":", $ENV{PATH} ) {
            push @{ $h{ basename($_) } }, dirname($_) for glob "$dir/*";
        }
        @{ $h{ $_ } } < 2 and delete $h{ $_ } for keys %h;
        print Dumper( \%h );
    ' | less


    Similar without File::Basename

    perl -Mstrict -MData::Dumper -wle'
        my %h;
        for my $dir ( split ":", $ENV{PATH} ) {
            m{(.*/)(.*)} and push @{ $h{ $2 } }, $1 for glob "$dir/*";
        }
        @{ $h{ $_ } } < 2 and delete $h{ $_ } for keys %h;
        print Dumper( \%h );
    ' | less


    --
    Ruud
  • Harry Putnam at May 3, 2010 at 4:35 pm
    "Dr.Ruud" <rvtol+usenet@isolution.nl> writes:

    [...] snipped poorly written example
    I think you want to find filenames that exist in multiple paths.
    As John K. has noted... my example was misleading. If you were to
    prepend a different root directory to each list of filenames in my
    example it would be much more like what I'm trying to figure out.

    But your guess still stands. It just doesn't quite get to the scope of
    the problem. Yes, files that exist on multiple paths, but there are
    also many matched names that are not actually the same file.

    Its actually a little more complex than that... since some of the
    matches are the right file but do have slight differences that will
    show in the sizes. Not so many like that... but I have seen a few so
    far.

    The matched files that do exist on different paths will eventually be
    run through a dispatch table to... er, be dispatched.

    In my full script that problem is dealt with in a dispatch table
    where one of the choices is to print the sizes of all matches,
    allowing the user to see at a glance which file is likely to be the
    real match... and another prepares (an external) diff if there is
    still any doubt.

    So, many (most) of the matched files need to go through human hands
    for final actions.

    Some things the user is likely to know at a glance if presented with a
    full filename from hash1 and several matches from hash2. Whereas it
    would be quite hard (for me) to code an automated solution. Hence the
    dispatch table

    I'm not familiar enough with Data::Dumper and its output. All the
    brackets and stuff are confusing. In your examples... not so much.
    In fact not at all. (I'm talking about the output here, but in the
    code it is confusing... I'm not sharp enough to just follow it
    without serious study.)

    So I'll need to figure out how to fix Data::Dumper output so matches
    can be listed and the list run through a dispatch table.

    In every case where there is a match, no matter how many, they'll need
    to be listed as numbered choices where typing a number will select the one
    that is useful, and typing a letter representing a sub function will
    dispatch those two files (one from h1, one from h2) as needed.

    I haven't worked out how the number gets passed into the function yet,
    so I have tried just listing the files, and in some cases the user can
    paste a filename into a function through a menu. (A rough sketch of
    one way to wire that up appears at the end of this post.)

    (But that is a different subject)

    I have written such a table, where all sub functions are a single
    letter:
    `sub A {blah}'
    etc.

    Currently I'm feeding it by collecting into an array, for each end
    filename from hash1, all of the matching files in hash2.

    (That array contains the same kind of information as your examples
    display.)

    Then I print a numbered list and the menu of the dispatch table and
    its possible functions (snippets of both below).

    ------- --------- ---=--- --------- --------

    my %dispt = (
        A => sub {
            print "File to add (enter name) > ";
            chomp( my $answer = <STDIN> );
            if ( ! -f $answer ) {
                print "<$answer> not found on file system\n";
                print "No action taken\n";
            } else {
                print A($answer) . "\n";
            }
        },
        D => sub { print D(@matches) . "\n"; },
        [...]

        Z => sub { print Z(@matches) . "\n"; },
        [...]
    ------- --------- ---=--- --------- --------
    while ( $cnt == 1 ) {
        print "press A to add to `add' list\n",
              "press D for external diff line to copy paste\n",
              [...]
              "press Z to see file sizes\n",
        [...]
    ------- --------- ---=--- --------- --------

    I'm doing that without involving Data::Dumper.

    Do you think it would be better done using Data::Dumper?

    First I'd have to learn how to use Data::Dumper

    Both you and Shawn have shown its great utility.
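
    A rough sketch (untested) of one way the number could get passed
    along: a digit picks an entry out of @matches and stashes it, and a
    letter hands that stashed pick to the dispatch sub. The $selected
    variable and the prompt wording are made up here, and the %dispt
    subs would need to read the selection from @_ (e.g. `my ($file) = @_;'):

        my $selected;    # the match the user last picked by number
        while (1) {
            for my $i ( 0 .. $#matches ) {
                printf "%3d) %s\n", $i + 1, $matches[$i];
            }
            print "number = select, letter = dispatch, q = quit > ";
            chomp( my $answer = <STDIN> );
            last if lc($answer) eq 'q';
            if ( $answer =~ /^\d+$/ and $answer >= 1 and $answer <= @matches ) {
                $selected = $matches[ $answer - 1 ];
                print "selected <$selected>\n";
            }
            elsif ( exists $dispt{ uc $answer } ) {
                $dispt{ uc $answer }->($selected);   # hand the pick to the sub
            }
            else {
                print "Unrecognized choice <$answer>\n";
            }
        }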
  • Jim Gibson at May 3, 2010 at 5:02 pm
    On 5/3/10 Mon May 3, 2010 9:35 AM, "Harry Putnam" <reader@newsguy.com>
    scribbled:

    [long problem description snipped]
    I'm doing that without involving Data::Dumper.

    Do you think it would be better done using Data::Dumper?

    First I'd have to learn how to use Data::Dumper
    The usual purpose of using Data::Dumper (DD) is to print out a complex data
    structure to see what is there. This is only for debugging purposes. Once
    you have a program working, you can delete all references to DD. However, it
    is usually better to comment out the lines that use DD or use a conditional
    flag to disable them, since you may need them in the future if you discover
    a problem.

    I don't think anybody is saying use Data::Dumper to implement your algorithm
    or simplify your program. (There is another reason to use Data::Dumper -- to
    store a data structure in a data file and reconstruct it in a later
    invocation of a program, but that is not being suggested here, and there are
    other modules that do that better.)

    So create your hashes and implement your testing algorithms without using
    Data::Dumper. Then use it to print your hashes if you are confused about
    what they contain.

    There is a package variable $Data::Dumper::Indent that you can set to change
    how Data::Dumper formats its output. The default value is 2. I sometimes use
    the following to get a more compact output:

    $Data::Dumper::Indent = 1;

    See the Data::Dumper documentation for other options that affect the output.
  • Harry Putnam at May 3, 2010 at 6:55 pm

    Jim Gibson writes:

    The usual purpose of using Data::Dumper (DD) is to print out a complex data
    What an excellent synopsis of how DD fits into stuff like this.
    Thanks a lot...

    I was on the verge of starting to really pound away trying to learn DD
    and how to use it. I mean making it the guts of the program... not as
    a debugger.

    At least now I don't have the feeling I've had it completely wrong so far.
  • Uri Guttman at May 3, 2010 at 5:15 pm
    "HP" == Harry Putnam writes:
    HP> Its actually a little more complex than that... since some of the
    HP> matches are the right file but do have slight differences that will
    HP> show in the sizes. Not so many like that... but I have seen a few so
    HP> far.

    this is what has been bothering me here. you haven't yet spit out a
    proper problem specification. as i kept saying comparing dir trees is
    tricky and you kept showing incomplete examples which now all seem to be
    wrong as you just made a major change in the 'spec'. now duplicate
    filenames could actually be different files vs just copies in different
    places. this is a whole other mess of fish.

    so learn this before you get in deeper. always have a proper
    specification in ENGLISH before you design or code. a major change like
    this can cause a complete redesign of all your previous work.

    uri

    --
    Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
  • Harry Putnam at May 3, 2010 at 6:51 pm

    "Uri Guttman" <uri@StemSystems.com> writes:

    this is what has been bothering me here. you haven't yet spit out a
    proper problem specification. as i kept saying comparing dir trees is
    tricky and you kept showing incomplete examples which now all seem to be
    wrong as you just made a major change in the 'spec'. now duplicate
    filenames could actually be different files vs just copies in different
    places. this is a whole other mess of fish.
    None of what I've shown contradicts the full program... quit karping
    about it. Sloppy... yes, I have been. I wonder if that might be
    because I don't really have a good idea of what I'm doing or how to do
    it. What a surprise... that's why its called perl.beginners.
    so learn this before you get in deeper. always have a proper
    specification in ENGLISH before you design or code. a major change like
    this can cause a complete redesign of all your previous work.
    This is nonsense and absolutely wrong. Please don't give such poor
    advice to anyone else.

    Not a major change at all.. I've gotten much much farther along with
    what has been posted and responded to.

    I hope no one takes you seriously with that. Like many before me, I'm
    working this out as I go along. In the beginning I didn't know what a
    full program would look like. I took some time with the examples I
    tried to raise here ... much more than I should admit to. This stuff
    comes hard to me.

    Other helpful posters have shown clear working examples... most of
    them now appear in this program in one way or another.

    I've learned so much here that now I can probably put it fully in
    words. Maybe even ones you'd approve of.

    But could I have started there.... not a chance. So please engage
    some shred of common sense before routinely posting constant karping
    and even seriously wrong headed advice like this.
  • Dr.Ruud at May 3, 2010 at 7:20 pm

    Harry Putnam wrote:

    But could I have started there.... not a chance. So please engage
    some shred of common sense before routinely posting constant karping
    and even seriously wrong headed advice like this.
    You are very wrong here. Just put in a sentence what you really try to
    achieve. Not that I think I will spend more time replying to your
    postings, because I detest all the blabber in them.

    --
    Ruud
  • Harry Putnam at May 5, 2010 at 2:00 am

    "Dr.Ruud" <rvtol+usenet@isolution.nl> writes:

    Harry Putnam wrote:
    But could I have started there.... not a chance. So please engage
    some shred of common sense before routinely posting constant karping
    and even seriously wrong headed advice like this.
    You are very wrong here. Just put in a sentence what you really try to
    achieve. Not that I think I will spend more time replying to your
    postings, because I detest all the blabber in them.
    That's the beauty of usenet eh?

    Thanks for the time you did put into answers... they were always
    helpful.
  • Uri Guttman at May 3, 2010 at 7:56 pm
    "HP" == Harry Putnam writes:
    HP> "Uri Guttman" <uri@StemSystems.com> writes:
    this is what has been bothering me here. you haven't yet spit out a
    proper problem specification. as i kept saying comparing dir trees is
    tricky and you kept showing incomplete examples which now all seem to be
    wrong as you just made a major change in the 'spec'. now duplicate
    filenames could actually be different files vs just copies in different
    places. this is a whole other mess of fish.
    HP> None of what I've shown contradicts the full program... quit karping
    HP> about it. Sloppy... yes, I have been. I wonder if that might be
    HP> because I don't really have a good idea of what I'm doing or how to do
    HP> it. What a surprise... that's why its called perl.beginners.

    true about your perl but this is more than that. you haven't described
    what the issues or goals are yet. how can you (or us) tell if the
    program is correct unless you have something to which to compare the
    results? that is what the goal is. you have changed it recently when you
    realized you had duplicate filenames with different paths and they were
    different files. you never stated what the actual purpose of this
    was. this is also called the XY problem (google it).
    so learn this before you get in deeper. always have a proper
    specification in ENGLISH before you design or code. a major change like
    this can cause a complete redesign of all your previous work.
    HP> This is nonsense and absolutely wrong. Please don't give such poor
    HP> advice to anyone else.

    nope. been doing this for 35 years and it is solid advice. you can't do
    a proper program unless you have a proper goal which is what the
    specification is.

    HP> Not a major change at all.. I've gotten much much farther along with
    HP> what has been posted and responded to.

    it is a major change. it affects how the dir tree comparison is to be
    done which is the heart of the application.

    HP> I hope no one takes you seriously with that. Like many before me, I'm
    HP> working this out as I go along. In the beginning I didn't know what a
    HP> full program would look like. I took some time with the examples I
    HP> tried to raise here ... much more than I should admit to. This stuff
    HP> comes hard to me.

    this is not about programming but understanding your problem and
    needs. that is the goal. do you drive around aimlessly until you
    actually go by a store which may have what you want?

    HP> Other helpful posters have shown clear working examples... most of
    HP> them now appear in this program in one way or another.

    in one way or the others. because they never got the full picture. some
    of the code has been thrown away which is a waste of everyone's
    time. think about that.

    HP> I've learned so much here that now I can probably put it fully in
    HP> words. Maybe even ones you'd approve of.

    sure but there are better ways to learn too. writing up a goal is one of
    them. even if it does change (and they do) they give you and others a
    target to shoot at. your specs have been very fluid and not clear as i
    keep saying.

    HP> But could I have started there.... not a chance. So please engage
    HP> some shred of common sense before routinely posting constant karping
    HP> and even seriously wrong headed advice like this.

    wrong again. you could easily have written up a short goal document.

    i have some directory trees which may have duplicate filenames and those
    files could be the same or actual different files. i want to scan the
    two trees, compare them and locate common filenames in both. then i want
    users to be able to choose from menu what to do with the files when dups
    are found.

    that is close to what you are doing. was that too hard to write?

    you are really closing your eyes here and just driving around. sure you
    have learned some coding skills but you aren't seeing the bigger
    picture. this will keep biting you until you discover it yourself with
    more experience. that deep experience is what i am offering here and you
    are refusing it. note that others aren't refuting what i am saying, they
    all work with goal documents at some time or other.

    uri

    --
    Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
  • Philip Potter at May 3, 2010 at 8:19 pm

    On 3 May 2010 20:56, Uri Guttman wrote:
    "HP" == Harry Putnam <reader@newsguy.com> writes:
    HP> "Uri Guttman" <uri@StemSystems.com> writes:
    this is what has been bothering me here. you haven't yet spit out a
    proper problem specification. as i kept saying comparing dir trees is
    tricky and you kept showing incomplete examples which now all seem to be
    wrong as you just made a major change in the 'spec'. now duplicate
    filenames could actually be different files vs just copies in different
    places. this is a whole other mess of fish.
    HP> None of what I've shown contradicts the full program... quit karping
    HP> about it.  Sloppy... yes, I have been.  I wonder if that might be
    HP> because I don't really have a good idea of what I'm doing or how to do
    HP> it.   What a surprise... that's why its called perl.beginners. [snip]
    you could easily have written up a short goal document.

    i have some directory trees which may have duplicate filenames and those
    files could be the same or actual different files. i want to scan the
    two trees, compare them and locate common filenames in both. then i want
    users to be able to choose from menu what to do with the files when dups
    are found.

    that is close to what you are doing. was that too hard to write?

    you are really closing your eyes here and just driving around. sure you
    have learned some coding skills but you aren't seeing the bigger
    picture. this will keep biting you until you discover it yourself with
    more experience. that deep experience is what i am offering here and you
    are refusing it. note that others aren't refuting what i am saying, they
    all work with goal documents at some time or other.
    I have to support what Uri says here. You shouldn't start writing
    *any* code at all until you know what you want it to do. If you don't
    know what you want to do, you're groping in the dark. If you ask for
    help, people can't give you the help you need because they don't know
    what you want to do either.

    Every software engineering method incorporates this principle, usually
    as a step called "requirements analysis". It's a fancy way of saying
    "what do I want to do?"

    Phil
  • Harry Putnam at May 3, 2010 at 9:06 pm
    Philip Potter writes:

    [...]

    Both you and Uri are right to a degree. I have to respect Uris'
    experience, but in fact I have presented goals at every step in this
    thread. Uri just doesn't want to recognize them.

    1) How to find what files are in one tree but not the other.
    I've received a few good techniques to do that... goal accomplished.

    2) Ditto but in reverse. Same thing applies.

    3) How to find all the matches. Again, responses supplied some good code
    that could be edited to my purposes.

    4) How to set up a dispatch table... involved in other threads more than
    here but still the same project. (Again, good solid posts, more
    than one by Uri himself as I recall - goal accomplished)

    5) How to use Data::Dumper for this kind of stuff - goal accomplished
    I have to support what Uri says here. You shouldn't start writing
    *any* code at all until you know what you want it to do. If you don't
    know what you want to do, you're groping in the dark. If you ask for
    help, people can't give you the help you need because they don't know
    what you want to do either.
    I have done that at every step... and people have been thoroughly
    capable of giving sound help. Uri is talking through his hat.
    Every software engineering method incorporates this principle, usually
    as a step called "requirements analysis". It's a fancy way of saying
    "what do I want to do?"
    That is the kind of `always true' thing one might say. I forgot what
    the term is but it means its fairly meaningless and mainly sounds
    good. But none the less true.

    About driving around not knowing where I'm going. I'd wager a full
    pension check that everyone here has done that.... and maybe not so
    long ago.

    Uri asked would I do that.... A resounding YES. I'd first check a map
    or maybe make some calls (that is, research a bit) and then if I still
    wasn't sure... I'd go have a look, and try to find what ever it is I
    was after. That's exactly what most people would do if they really
    wanted to get somewhere or find something.

    Concerning people having wasted their time.... where has that
    happened? Literally every response has been put to work here. If not
    on this exact program then another.

    I really hope none of the responders feel they have wasted their time.
    If they do, that is MY fault... and likely at least partially because
    I didn't understand the information given me.

    I'll say for the record.... I've received very professional and
    thorough help here... Also, a truly amazing level of patience has been
    shown to me.

    I haven't always been able to understand the information
    presented... But I often keep these threads and refer back to them
    when I do begin to understand a little better.
  • Philip Potter at May 4, 2010 at 7:42 am

    On 3 May 2010 22:06, Harry Putnam wrote:
    Philip Potter <philip.g.potter@gmail.com> writes:

    [...]

    Both you and Uri are right to a degree.  I have to respect Uris'
    experience, but in fact I have presented goals at every step in this
    thread.  Uri just doesn't want to recognize them.

    1) How to find what files are in one tree but not the other.
    I've received a few good techniques to do that... goal accomplished.

    2) Ditto but in reverse. Same thing applies.

    3) How to find all the matches. Again, responses supplied some good code
    that could be edited to my purposes.

    4) How to set up a dispatch table... involved in other threads more than
    here but still the same project. (Again, good solid posts, more
    than one by Uri himself as I recall - goal accomplished)

    5) How to use Data::Dumper for this kind of stuff - goal accomplished
    Ok yes, that's pretty reasonable; I agree no message has gone wasted.

    But I'm not sure how much closer you are to solving your problem. If
    you're trying to identify duplicate files in a filesystem, I would
    think there's a better way of doing it than what you've got. At first
    you were only trying to find duplicate filenames, but now you say you
    care about whether they are actually the same file. But you still
    haven't explained *why* you are doing this comparison in this thread.
    [You might have done elsewhere, but I don't read every thread.]

    It comes down to this: are you happy learning Perl haphazard and
    piecemeal? Or do you want to learn Perl by actually getting things
    done in Perl?

    If you want to actually get something done, you should state what it
    is. Then we can help you with the design as well as the
    implementation.
    I have to support what Uri says here. You shouldn't start writing
    *any* code at all until you know what you want it to do. If you don't
    know what you want to do, you're groping in the dark. If you ask for
    help, people can't give you the help you need because they don't know
    what you want to do either.
    I have done that at every step... and people have been thoroughly
    capable of giving sound help.  Uri is talking through  his hat.
    I still don't know what your overarching goal is here. You're looking
    to find files with the same name but you also care that they are the
    same file.. but why? What is the overall purpose?
    Every software engineering method incorporates this principle, usually
    as a step called "requirements analysis". It's a fancy way of saying
    "what do I want to do?"
    That is the kind of `always true' thing one might say.  I forgot what
    the term is but it means its fairly meaningless and mainly sounds
    good. But none the less true.
    A tautology? No, it's not. I'm saying you can either do requirements
    analysis or you can skip it, but if you skip it you haven't got a
    criterion for success. You won't know what you're trying to do or when
    you've done it. If you do requirements analysis, you define what you
    want to do up front. You know what you're trying to do and you'll
    know when you've done it.
    About driving around not knowing where I'm going.  I'd wager a full
    pension check that everyone here has done that.... and maybe not so
    long ago.
    I have done it recently. I didn't enjoy it. I prefer to get things done.

    [In particular, I'm working with MooseX::Method::Signatures, which is
    great when it works, but its cryptic error messages can leave you
    struggling to find the one-liner which fixes your code. I'd rather
    just find it and fix it than grope around for 10 minutes per bug.]

    [If you're meaning literal driving, then no, because I haven't got my
    license yet :P ]

    Phil
  • Akhthar Parvez K at May 4, 2010 at 1:16 pm

    On Tuesday 04 May 2010, Philip Potter wrote:
    On 3 May 2010 22:06, Harry Putnam wrote:
    That is the kind of `always true' thing one might say.  I forgot what
    the term is but it means its fairly meaningless and mainly sounds
    good. But none the less true.
    A tautology? No, it's not. I'm saying you can either do requirements
    analysis or you can skip it, but if you skip it you haven't got a
    criterion for success. You won't know what you're trying to do or when
    you've done it. If you do requirements analysis, you define what you
    want to do up front. You know what you're trying to do and you'll
    know when you've done it.
    I think it's necessary to have a clear idea of what should be done and how. It applies to each program, whether small or big. It consists of preparing a design and writing a flow chart (not necessarily on paper, but in our mind). This would help us get the program done in a faster and more proper way. I thought everyone would be doing the same. Is that not the case?

    Even though I think it's not necessary to post the goal of one's program here, I think he should state clearly what he wants to do, as it will help both parties (those who seek help and those who help) and eventually the resolution will be faster.

    --
    Regards,
    Akhthar Parvez K
    http://Tips.SysAdminGUIDE.COM
    UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity - Dennis Ritchie
  • Harry Putnam at May 4, 2010 at 2:20 pm

    Philip Potter writes:
    haven't explained *why* you are doing this comparison in this thread.
    [You might have done elsewhere, but I don't read every thread.]
    Uri actually did (most of) it for me at one point in a recent post on
    this thread. Message-ID: <87mxwgbvyo.fsf@quad.sysarch.com> on
    gmane.comp.lang.perl.beginners

    This is probably overly verbose... as I am not very good at explaining
    the full project simply.

    Project: Merge two directory structures that contain:

    Many files in one and not the other.
    But different things have to happen in each structure's case.

    1)
    A larger structure has many files not contained in a smaller
    structure. Those files need to disappear.

    2)
    The smaller structure has many files not present in the larger
    structure. Those files need to remain.

    NOTE: Those two chores are in hand... due to excellent examples
    supplied here.

    3) Of the remaining files:
    Many of them have the same final (end name) but different
    paths... besides the obvious difference of root path.

    b1/some/path/file (from larger hierarch)
    b2/path/file (from smaller hierarch)

    But also there are many cases where the end name is the same but
    the files themselves are different. (Those are handled through a
    dispatch table of functions. One being to check the files size
    and another supplies command line to diff the files.). In fact
    usually the size is enough to make a choice.

    So, if they are the same file then the file from the larger
    structure will be moved to a new base and over the path of the
    smaller structure. That sounds confusing, but an illustration will
    show it's not, really:

    b1/some/path/file
    b2/path/file

    b1 file will be moved to
    b3/path/file

    Overwriting the file from b2.

    The final confusion is that all this is not really being done, but is
    generating a list of cmds that will do the complete job.

    What will happen in the end is all of b2 will be left and all
    identical files from b1 will overwrite their twin by being moved to
    the twin's address. Whatever is in b1 that isn't in b2 at all will be
    deleted.


    In step 1. rm (b1) [ files not in b2]

    step 2) add (b2) [ files not in b1 ]

    step 3) mv matching b1 over their twin in b2

    The equivalent cmds will be carried out inside a Distributed
    Versioning System called `git'.

    I'm not familiar yet with git. At least not much, but I can generate a
    list of cmds to be carried out by that program.
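
    A rough sketch (untested) of how that command list might get printed;
    the three containers (@only_in_b1, @only_in_b2, %b1_to_b2) are made-up
    names for the three groups described in steps 1-3 above, and the
    literal command words are only placeholders for whatever git/shell
    commands end up doing the real work:

        ## step 1: files in b1 but not in b2 get removed
        print "rm $_\n" for @only_in_b1;

        ## step 2: files in b2 but not in b1 get added/kept
        print "add $_\n" for @only_in_b2;

        ## step 3: matching b1 files move over their twin in b2
        while ( my ( $b1_file, $b2_twin ) = each %b1_to_b2 ) {
            print "mv $b1_file $b2_twin\n";
        }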

    ------- --------- ---=--- --------- --------
    I now have those chores listed above in hand due to the helpful
    responses here. The full program is nearly complete.
    Just having some trouble getting the dispatch table boiled down to its
    most usable incarnation.

    If I strike out on that or need a nudge... I'll probably post
    something on a new thread. But again it will be about the part
    concerning dispatch tables. Not the whole project.

    I will be much more careful with any examples of what I need. The
    sharp eyed experts here will get sick of me pretty soon if I don't
    give better examples. I'll get sick of me too....
  • Philip Potter at May 4, 2010 at 2:45 pm

    On 4 May 2010 15:19, Harry Putnam wrote:
    Philip Potter <philip.g.potter@gmail.com> writes:
    haven't explained *why* you are doing this comparison in this thread.
    [You might have done elsewhere, but I don't read every thread.]
    Uri actually did (most of) it for me at one point in a recent post on
    this thread.  Message-ID: <87mxwgbvyo.fsf@quad.sysarch.com> on
    gmane.comp.lang.perl.beginners

    This is probably overly verbose... as I am not very good at explaining
    the full project simply.

    Project: Merge two directory structures that contain:

    Many files in one and not the other.
    But different things have to happen in each structure's case.

    1)
    A larger structure has many files not contained in a smaller
    structure. Those files need to disappear.

    2)
    The smaller structure has many files not present in the larger
    structure. Those files need to remain.

    NOTE: Those two chores are in hand... due to excellent examples
    supplied here.

    3) Of the remaining files:
    Many of them have the same final (end name) but different
    paths... besides the obvious difference of root path.

    b1/some/path/file  (from larger hierarch)
    b2/path/file       (from smaller hierarch)

    But also there are many cases where the end name is the same but
    the files themselves are different.  (Those are handled through a
    dispatch table of functions.  One being to check the files size
    and another supplies command line to diff the files.).  In fact
    usually the size is enough to make a choice.

    So, if they are the same file then the file from the larger
    structure will be moved to a new base and over the path of the
    smaller structure. That sounds confusing, but an illustration will
    show it's not, really:

    b1/some/path/file
    b2/path/file

    b1 file will be moved to
    b3/path/file

    Overwriting the file from b2.

    The final confusion is that all this is not really being done, but is
    generating a list of cmds that will do the complete job.

    What will happen in the end is all of b2 will be left and all
    identical files from b1 will overwrite their twin by being moved to
    the twin's address.  Whatever is in b1 that isn't in b2 at all will be
    deleted.
    Thank you Harry for writing this. Can I ask some questions?

    You have path P1 and path P2. P1 has more files than P2, but P2 may
    contain files that P1 doesn't.

    You want to create P3, which is a merge of P1 and P2. But if a file is
    in P1 and not P2, that file gets ignored. Doesn't this mean P3 is
    identical to P2?

    If two files are identical, you want to copy the P1 version over the
    P2 version. If the files are identical, what effect will this copy
    have?
    In step 1.  rm (b1) [ files not in b2]
    remove from b1 those files which are in b1 but not in b2. ok.
    step 2)  add (b2) [ files not in b1 ]
    add to b2 those files which are in b2 but not in b1.
    --> Aren't they already in b2?
    step 3)  mv matching b1 over their twin in b2
    move identical files from b1 over their twins in b2.
    --> If they are identical, why bother moving them?
    The equivalent cmds will be carried out inside a Distributed
    Versioning System called `git'.

    I'm not familiar yet with git.  At least not much, but I can generate a
    list of cmds to be carried out by that program.
    "git merge" probably does what you want in one command.

    Phil
  • Harry Putnam at May 3, 2010 at 9:11 pm

    "Uri Guttman" <uri@StemSystems.com> writes:

    nope. been doing this for 35 years and it is solid advice. you can't do
    a proper program unless you have a proper goal which is what the
    specification is.
    Thank you Uncle Uri. I guess I just don't yet know how to make use of
    all you present as help.

    Some of it looks suspiciously like hair splitting and karping of the
    first order.
  • Uri Guttman at May 3, 2010 at 9:52 pm
    "HP" == Harry Putnam writes:
    HP> "Uri Guttman" <uri@StemSystems.com> writes:
    nope. been doing this for 35 years and it is solid advice. you can't do
    a proper program unless you have a proper goal which is what the
    specification is.
    HP> Thank you Uncle Uri. I guess I just don't yet know how to make use of
    HP> all you present as help.

    if you can't understand something, then ask about it. we can't read your
    mind, only your rants! :)

    HP> Some of it looks suspiciously like hair splitting and karping of the
    HP> first order.

    what you think is hair splitting, we think of as moving mountains. this
    is what experience in developing projects (big and small) tells us. you
    came here to learn perl. there is much more to programming than learning
    a particular language. in fact most programming skills and knowledge is
    language independent and that is also important to know.

    uri

    --
    Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
  • Bob McConnell at May 4, 2010 at 12:46 pm
    From: Uri Guttman
    "HP" == Harry Putnam <reader@newsguy.com> writes:
    HP> "Uri Guttman" <uri@StemSystems.com> writes:
    nope. been doing this for 35 years and it is solid advice. you
    can't do
    a proper program unless you have a proper goal which is what the
    specification is.
    HP> Some of it looks suspiciously like hair splitting and karping of the
    HP> first order.

    what you think is hair splitting, we think of as moving mountains. this
    is what experience in developing projects (big and small) tells us. you
    came here to learn perl. there is much more to programming than learning
    a particular language. in fact most programming skills and knowledge is
    language independent and that is also important to know.
    This is sounding more and more like an argument between waterfall and
    agile managers about the best methodology for developing applications.
    In waterfall you always started with a locked down requirements
    document, but in agile we never do. The best we can get is the product
    manager's interpretation of what she heard the client describe. That
    usually changes as soon as she sees the first prototype.

    Harry has said he is just beginning to learn the language. As a result,
    I would expect his short range goals to be adjusted as he learns what is
    possible and what it takes to accomplish it. That does require some
    'driving around' to get an idea of the lay of the land and the paths
    available to get from here to there. It is also called experimenting
    with the tool set, or working through the exercises at the end of the
    chapter. As long as he is learning, what difference does it make what
    his final destination is? Do any of us know what that will be when we
    start playing in a new sandbox?

    Bob McConnell

    ----
    I know that you think you understand what you thought you heard me say.
    But I don't believe you realize that what I said is not what I meant.
  • Philip Potter at May 4, 2010 at 1:02 pm

    On 4 May 2010 13:45, Bob McConnell wrote:
    From: Uri Guttman
    "HP" == Harry Putnam <reader@newsguy.com> writes:
    HP> "Uri Guttman" <uri@StemSystems.com> writes:
    nope. been doing this for 35 years and it is solid advice. you
    can't do
    a proper program unless you have a proper goal which is what the
    specification is.
    HP> Some of it looks suspiciously like hair splitting and karping of the
    HP> first order.

    what you think is hair splitting, we think of as moving mountains. this
    is what experience in developing projects (big and small) tells us. you
    came here to learn perl. there is much more to programming than learning
    a particular language. in fact most programming skills and knowledge are
    language independent and that is also important to know.
    This is sounding more and more like an argument between waterfall and
    agile managers about the best methodology for developing applications.
    In waterfall you always started with a locked down requirements
    document, but in agile we never do. The best we can get is the product
    manager's interpretation of what she heard the client describe. That
    usually changes as soon as she sees the first prototype.

    Harry has said he is just beginning to learn the language. As a result,
    I would expect his short range goals to be adjusted as he learns what is
    possible and what it takes to accomplish it. That does require some
    'driving around' to get an idea of the lay of the land and the paths
    available to get from here to there. It is also called experimenting
    with the tool set, or working through the exercises at the end of the
    chapter. As long as he is learning, what difference does it make what
    his final destination is? Do any of us know what that will be when we
    start playing in a new sandbox?
    You have to start with *some* goal. Even in agile, you start with
    stories to work out in what direction you are headed. You formalise
    your requirements into tests and then you start coding. Yes, you
    revise your stories, requirements and tests as you learn more about
    the design and the technology, but you still need to start from
    somewhere.

    Agile is a more nuanced approach than having everything planned out
    before you begin coding, but it's still the same underlying model of
    [requirements] -> [design] -> [code], just repeated daily ad nauseam.

    I do agree that there is a continuum here, with "totally uninformed
    undirected exploring and experimentation" on one end, and "totally
    preplanned careful single-minded coding" on the other. Waterfall is
    totally preplanned, agile is somewhere in the middle. I don't think
    anyone advocates having *no plan* at all and guessing everywhere.

    Phil
  • Bob McConnell at May 4, 2010 at 2:14 pm
    From: Philip Potter
    On 4 May 2010 13:45, Bob McConnell wrote:
    From: Uri Guttman
    "HP" == Harry Putnam <reader@newsguy.com> writes:
    HP> "Uri Guttman" <uri@StemSystems.com> writes:
    nope. been doing this for 35 years and it is solid advice. you
    can't do
    a proper program unless you have a proper goal which is what the
    specification is.
    HP> Some of it looks suspiciously like hair splitting and karping of the
    HP> first order.

    what you think is hair splitting, we think of as moving mountains. this
    is what experience in developing projects (big and small) tells us. you
    came here to learn perl. there is much more to programming than learning
    a particular language. in fact most programming skills and knowledge are
    language independent and that is also important to know.
    This is sounding more and more like an argument between waterfall and
    agile managers about the best methodology for developing applications.
    In waterfall you always started with a locked down requirements
    document, but in agile we never do. The best we can get is the product
    manager's interpretation of what she heard the client describe. That
    usually changes as soon as she sees the first prototype.

    Harry has said he is just beginning to learn the language. As a result,
    I would expect his short range goals to be adjusted as he learns what is
    possible and what it takes to accomplish it. That does require some
    'driving around' to get an idea of the lay of the land and the paths
    available to get from here to there. It is also called experimenting
    with the tool set, or working through the exercises at the end of the
    chapter. As long as he is learning, what difference does it make what
    his final destination is? Do any of us know what that will be when we
    start playing in a new sandbox?
    You have to start with *some* goal. Even in agile, you start with
    stories to work out in what direction you are headed. You formalise
    your requirements into tests and then you start coding. Yes, you
    revise your stories, requirements and tests as you learn more about
    the design and the technology, but you still need to start from
    somewhere.

    Agile is a more nuanced approach than having everything planned out
    before you begin coding, but it's still the same underlying model of
    [requirements] -> [design] -> [code], just repeated daily ad nauseam.

    I do agree that there is a continuum here, with "totally uninformed
    undirected exploring and experimentation" on one end, and "totally
    preplanned careful single-minded coding" on the other. Waterfall is
    totally preplanned, agile is somewhere in the middle. I don't think
    anyone advocates having *no plan* at all and guessing everywhere.
    Your version of agile sounds a *lot* more formal than ours. We are nowhere near the middle, but much closer to the 'no plan' end of the spectrum. The story we get is often a one-liner with more wish than reality. I have thrown out several weeks of work over the past year because what the PM gave us to work from wasn't what she or the client meant.

    We have one developer who has described the situation like this: "Management asked me to build a box. But as soon as that box was done, they looked at it and said it was supposed to have round corners. So we rounded the corners and then they said it was supposed to have a lid. So we built the lid and then they said the lid should have hinges..." He implemented one PDA application from scratch three times before they were happy with it, changing SDKs twice and the target OS once.

    Yes, we have a real problem getting management to give us useful requirements before we start a project, or even part way through it when we show them prototypes. We try to pry more details out of them, and do come close most of the time, but it is about as easy as pulling hen's teeth.

    The PDA developer and I are now waiting to see which of us retires first. He has less than 400 days left. If I decide to wait until the 25th anniversary of my date of hire, I have 518 days. But I will be eligible for early retirement in 273 days.

    Bob McConnell
  • Harry Putnam at May 4, 2010 at 2:30 pm

    Philip Potter writes:

    You have to start with *some* goal. Even in agile, you start with
    stories to work out in what direction you are headed. You formalise
    your requirements into tests and then you start coding. Yes, you
    revise your stories, requirements and tests as you learn more about
    the design and the technology, but you still need to start from
    somewhere.
    Here we are again... good heavens Philip, haven't I shown a number of
    goals? You continue to sound as if I've never uttered a word about
    what I wanted to do..... That is just baloney.

    Of course what you say above is right (I don't mean just in
    agile.. which I know nothing whatever about).

    I guess I don't see a need to keep repeating the same mantra.. when
    that is exactly (close enough anyway) what has happened in this thread.

    I may have presented poor examples or asked the wrong or dumb
    questions but at root... what you continue to repeat is pretty much
    what happened here.
  • Harry Putnam at May 4, 2010 at 2:25 pm

    "Bob McConnell" <rvm@CBORD.com> writes:

    I would expect his short range goals to be adjusted as he learns what is
    possible and what it takes to accomplish it. That does require some
    Thank you sir for a decent summary of how this has gone so far.
  • Dr.Ruud at May 3, 2010 at 6:47 pm

    Harry Putnam wrote:

    Yes, files that exist on multiple paths, but there are also
    many matched names that are not actually the same file.
    Then use md5, or a similar tool.

    --
    Ruud
  • Dermot at May 4, 2010 at 8:08 pm

    On 3 May 2010 19:47, Dr.Ruud wrote:
    Harry Putnam wrote:
    Yes, files that exist on multiple paths, but there are also
    many matched names that are not actually the same file.
    Then use md5, or a similar tool.

    Seconded. If you want to find duplicate files you will need to use MD5
    or a SHA tool.
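
    For instance, a minimal sketch using the core Digest::MD5 module (the
    two paths here are only placeholders for a pair of candidate twins):

    use strict;
    use warnings;
    use Digest::MD5;

    # Return the MD5 hex digest of a file's contents.
    sub md5_of {
        my ($file) = @_;
        open my $fh, '<', $file or die "Cannot open '$file': $!";
        binmode $fh;
        return Digest::MD5->new->addfile($fh)->hexdigest;
    }

    # Hypothetical candidate twins found by matching end names.
    my ($old, $new) = ('base1/some/path/file', 'base2/path/file');

    if ( md5_of($old) eq md5_of($new) ) {
        print "$old and $new have identical contents\n";
    }

    Digest::SHA can be used the same way if a SHA digest is preferred.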
    Dp.
  • John W. Krahn at May 3, 2010 at 1:44 am

    Harry Putnam wrote:

    my %h1 = (
    './b/f1' => 'f1',
    './b/c/fa' => 'fa',
    './b/l/c/f2' => 'f2',
    './b/g/f/r/fb' => 'fb'
    );


    my %h2 = (
    './b/fb' => 'fb',
    './b/c/fd' => 'fd',
    './b/l/c/f2' => 'f2',
    './b/g/f/r/fc' => 'fc',
    './b/g/h/r/fb' => 'fb'
    );
    In your previous example you used two different paths, so why does this
    example have only one path and why do './b/l/c/f2' and './b/g/f/r/fb'
    show up in both hashes but the other values do not?




    John
    --
    The programmer is fighting against the two most
    destructive forces in the universe: entropy and
    human stupidity. -- Damian Conway
  • Harry Putnam at May 3, 2010 at 3:18 pm

    "John W. Krahn" <jwkrahn@shaw.ca> writes:

    Harry Putnam wrote:
    my %h1 = (
    './b/f1' => 'f1',
    './b/c/fa' => 'fa',
    './b/l/c/f2' => 'f2',
    './b/g/f/r/fb' => 'fb'
    );


    my %h2 = (
    './b/fb' => 'fb',
    './b/c/fd' => 'fd',
    './b/l/c/f2' => 'f2',
    './b/g/f/r/fc' => 'fc',
    './b/g/h/r/fb' => 'fb'
    );
    In your previous example you used two different paths, so why does
    this example have only one path and why do './b/l/c/f2' and
    './b/g/f/r/fb' show up in both hashes but the other values do not?
    About the single base path... it was just an oversight... the result of
    copy-pasting the first hash I typed out before I started fiddling with
    them to get the problem a little more realistic... and never thought
    to change the base path.

    Your observation is right on the money though.. as in the real case
    there are definitely two different root paths. In fact it's really the
    only reason to need two hashes.

    Your second question is, again, down to a poor observation on my part...
    but it doesn't really change the task.

    In fact, the base path is actually discarded in the final result.

    base1/some/path/file
    base2/path/file

    Will actually be moved like this (roughly):

    move:
    base1/some/path/file
    (minus the existing root)

    to this path:
    base2/path/file
    (minus the existing root)

    So: some/path/file

    to: path/file

    In here:
    newbase/path/file
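
    A minimal sketch of that rewrite, assuming the roots are plain leading
    strings ('base1', 'base2' and 'newbase' are just the placeholder names
    from the example above):

    my $src_key = 'base1/some/path/file';   # key from the larger tree's hash
    my $dst_key = 'base2/path/file';        # key of its twin in the other hash
    my $newbase = 'newbase';                # root of the merged tree being built

    (my $dst_rel = $dst_key) =~ s{^base2/}{};   # strip the existing root: path/file

    print "mv $src_key $newbase/$dst_rel\n";    # one line of the command list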


    So doesn't the chore I tried to give a simple example of remain the same?

    That is, to identify which end file names are present in either hash
    regardless of the rest of the path?

    The example task was just to find the matching end names.

    The task I'm after really has 2 major components... and a third
    component that depends on those two for its input.

    The actual project is a merging of two file structures that have many
    similar files, many identical files, and many files in one but not the other.

    One hierarchy is vastly larger than the other. Any files in it, but
    not in the smaller one, can be deleted.

    Any files in the smaller one and not the larger need to remain in the
    final product.

    The problem gets more complicated with what is left.

    Many are identical except for the path, and I don't mean just the root
    path.

    So the files in the larger structure that are identical to files in
    the smaller structure... will be moved to overwrite their twins under
    yet another root.

    That is, we are building up a third hierarchy that contains a merger
    of the 2 hashed structures in a specific way.

    That isn't actually quite right either... really I'm trying to create
    a list of cmds that will create such a merged structure.

    No files are actually manipulated (yet); I'm just drawing up a list of
    things to do to create a merged structure.

    Looking back at what I wrote above... it's confusing as heck... sorry
    I'm not better at explaining what I'm trying to do.

    ------- --------- ---=--- --------- --------

    Of course, first generate the hashes (in terms of File::Find):

    $h1{$File::Find::name} = $_;
    $h2{$File::Find::name} = $_;

    so that the keys are $File::Find::name and the values are $_ (the end
    filename).
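
    A minimal sketch of building one of those hashes with File::Find (the
    top directory './b' is just a placeholder):

    use strict;
    use warnings;
    use File::Find;

    my %h1;
    find( sub {
        return unless -f;                # only plain files
        $h1{$File::Find::name} = $_;     # full path => end filename
    }, './b' );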

    1) find all files (end of path filenames) that are in one
    and not the other

    2) Ditto the other way round

    3) Find all that match on end name (but maybe not path) and dispatch
    them by means of a dispatch table, in one of 3 basic ways:
    remove
    add
    overwrite twin

    Part 3, I've actually got something of a handle on.. from other
    helpful posts here regarding `dispatch tables'.
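
    For what it's worth, a minimal dispatch-table sketch along those lines
    (the action names and the print statements are only stand-ins for the
    real commands):

    my %dispatch = (
        remove    => sub { my ($path)       = @_; print "rm $path\n" },
        add       => sub { my ($path)       = @_; print "cp $path newbase/\n" },
        overwrite => sub { my ($src, $twin) = @_; print "mv $src $twin\n" },
    );

    $dispatch{remove}->('./b/c/fa');    # pick an action by name at run time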

    I'm looking for better ways to get 1 and 2 done with minimal effort.
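
    One lighter-weight possibility, sketched against the example %h1 and %h2
    above: invert each hash so the end names become keys, and a single pass
    then gives 1 and 2 without the nested loops (this assumes an exact
    string match on the end names):

    my %in1 = map { $_ => 1 } values %h1;   # end names seen in %h1
    my %in2 = map { $_ => 1 } values %h2;   # end names seen in %h2

    my @only_in_h1 = grep { !$in2{$_} } keys %in1;   # 1) in %h1 only
    my @only_in_h2 = grep { !$in1{$_} } keys %in2;   # 2) in %h2 only
    my @in_both    = grep {  $in2{$_} } keys %in1;   # candidates for 3)

    print "only in h1: @only_in_h1\n";
    print "only in h2: @only_in_h2\n";
    print "in both:    @in_both\n";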

    I actually am able to get the needed results now... I was asking just
    for better ways to do it and in the course of things provided somewhat
    inaccurate examples as you've noted.
