FAQ
I have multiple CSV files (3 for this example) with
identical record layouts except that each Server Name may not be in
all of the files. Also the CSV files might not be sorted by Server
Name.

File A
Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.75,95.87,66.67,68.13
wsomdavpra03,90.39,94,65.77,68.51
wsomddvfxa01,39.22,92.19,82.59,88.25

File B
Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,34.78,100,55.1,67.6
wsomdavpra03,69.04,98.55,84.07,89.73
wsomddvfxa01,92.44,97.54,67.72,71.69
wsompapgtw05,48.77,96.9,92.1,93.55

File C
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.13,98.11,68.95,73.47
wsomdavpra03,68.85,97.56,76.35,98.23
wsomddvfxa01,46.97,96.29,88.23,94.02
wsompapgtw05,30.66,93.74,39.89,71.35

What I am trying to do is for each Server (in Column
1) I want to get the Avg CPU (in Column 2) from each
of the 3 files. In the files above this would produce


File OUT
WSOMQAVPRA05,93.75,34.78,93.13
wsomdavpra03,90.39,69.04,68.85
wsomddvfxa01,39.22,92.44,46.97
wsompapgtw05,0,48.77,30.66


Notice in File OUT for Server Name (wsompapgtw05),
since wsompapgtw05 does not appear in File A, the
value is replaced with '0'. How can I get perl to
place a '0' in the output file when a particular
server name appears in at least 1 of the input files,
but not in all of the input files?

The perl code I've written up to this point does not handle the
missing server and Avg.CPU value. I think that I need to add
code to first build a hash which contains the server names in a
key by going through all the files - 3 for this example - and then
iterating over that list of servers in a while loop, inserting the
value
found in the respective file or a '0'. You will see in my code, which
follows I have created 1 hash. I am not sure in perl how I might
attempt to build this additional hash with a Server Name as key
and then iterating over that list. My code so far is:

#!/usr/bin/perl

use strict;
use warnings;
###########################################
# Create File with Average CPU Numbers
###########################################
my %resultacpu;

for my $file ("FileA","FileB","FileC") {

open (my $fh,"<",$file) or die "Can't open file $file: $!";
<$fh>; # skip header line

while (my $line = <$fh>) {
my ($server,$cpua) = (split(",",$line))[0,1];
push @{$resultacpu{$server}},$cpua;
}

close $fh or die "Can't close file $file: $!";
}

open (my $nfh,">","OUT") or die "Can't open result file: $!";

for my $server (sort keys %resultacpu) {
print $nfh $server.",".join(",",@{$resultacpu{$server}})."\n";
}
close $nfh or die "Can't close result file: $!";


I would appreciate the assistance of any perl coders who might
be able to help me with this problem

Search Discussions

  • Wagner, David --- Senior Programmer Analyst --- WGO at Jul 18, 2007 at 4:28 pm

    -----Original Message-----
    From: country
    Sent: Wednesday, July 18, 2007 06:00
    To: beginners@perl.org
    Cc: RVokulich@Pershing.com
    Subject: Combining Records From Multiple Files based on
    Common Key Values

    I have multiple CSV files (3 for this example) with
    identical record layouts except that each Server Name may not be in
    all of the files. Also the CSV files might not be sorted by Server
    Name.

    File A
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.75,95.87,66.67,68.13
    wsomdavpra03,90.39,94,65.77,68.51
    wsomddvfxa01,39.22,92.19,82.59,88.25

    File B
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,34.78,100,55.1,67.6
    wsomdavpra03,69.04,98.55,84.07,89.73
    wsomddvfxa01,92.44,97.54,67.72,71.69
    wsompapgtw05,48.77,96.9,92.1,93.55

    File C
    Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.13,98.11,68.95,73.47
    wsomdavpra03,68.85,97.56,76.35,98.23
    wsomddvfxa01,46.97,96.29,88.23,94.02
    wsompapgtw05,30.66,93.74,39.89,71.35

    What I am trying to do is for each Server (in Column
    1) I want to get the Avg CPU (in Column 2) from each
    of the 3 files. In the files above this would produce


    File OUT
    WSOMQAVPRA05,93.75,34.78,93.13
    wsomdavpra03,90.39,69.04,68.85
    wsomddvfxa01,39.22,92.44,46.97
    wsompapgtw05,0,48.77,30.66


    Notice in File OUT for Server Name (wsompapgtw05),
    since wsompapgtw05 does not appear in File A, the
    value is replaced with '0'. How can I get perl to
    place a '0' in the output file when a particular
    server name appears in at least 1 of the input files,
    but not in all of the input files?

    The perl code I've written up to this point does not handle the
    missing server and Avg.CPU value. I think that I need to add
    code to first build a hash which contains the server names in a
    key by going through all the files - 3 for this example - and then
    iterating over that list of servers in a while loop, inserting the
    value
    found in the respective file or a '0'. You will see in my code, which
    follows I have created 1 hash. I am not sure in perl how I might
    attempt to build this additional hash with a Server Name as key
    and then iterating over that list. My code so far is:
    no need to read thru the files multiple. Build the hash and one
    way is to use an index and as you open the next file, add to the index.
    Now when doing the print, you only need to check the array element and
    if not defined, then zero is used.

    Wags ;)
    David R Wagner
    Senior Programmer Analyst
    FedEx Freight
    1.408.323.4225x2224 TEL
    1.408.323.4449 FAX
    http://fedex.com/us

    #!/usr/bin/perl

    use strict;
    use warnings;
    ###########################################
    # Create File with Average CPU Numbers
    ###########################################
    my %resultacpu;

    for my $file ("FileA","FileB","FileC") {

    open (my $fh,"<",$file) or die "Can't open file $file: $!";
    <$fh>; # skip header line

    while (my $line = <$fh>) {
    my ($server,$cpua) = (split(",",$line))[0,1];
    push @{$resultacpu{$server}},$cpua;
    }

    close $fh or die "Can't close file $file: $!";
    }

    open (my $nfh,">","OUT") or die "Can't open result file: $!";

    for my $server (sort keys %resultacpu) {
    print $nfh $server.",".join(",",@{$resultacpu{$server}})."\n";
    }
    close $nfh or die "Can't close result file: $!";


    I would appreciate the assistance of any perl coders who might
    be able to help me with this problem


    --
    To unsubscribe, e-mail: beginners-unsubscribe@perl.org
    For additional commands, e-mail: beginners-help@perl.org
    http://learn.perl.org/

    **********************************************************************
    This message contains information that is confidential and proprietary to FedEx Freight or its affiliates. It is intended only for the recipient named and for the express purpose(s) described therein. Any other use is prohibited.
    **********************************************************************
  • Rob Dixon at Jul 18, 2007 at 4:42 pm

    country wrote:
    I have multiple CSV files (3 for this example) with
    identical record layouts except that each Server Name may not be in
    all of the files. Also the CSV files might not be sorted by Server
    Name.

    File A
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.75,95.87,66.67,68.13
    wsomdavpra03,90.39,94,65.77,68.51
    wsomddvfxa01,39.22,92.19,82.59,88.25

    File B
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,34.78,100,55.1,67.6
    wsomdavpra03,69.04,98.55,84.07,89.73
    wsomddvfxa01,92.44,97.54,67.72,71.69
    wsompapgtw05,48.77,96.9,92.1,93.55

    File C
    Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.13,98.11,68.95,73.47
    wsomdavpra03,68.85,97.56,76.35,98.23
    wsomddvfxa01,46.97,96.29,88.23,94.02
    wsompapgtw05,30.66,93.74,39.89,71.35

    What I am trying to do is for each Server (in Column
    1) I want to get the Avg CPU (in Column 2) from each
    of the 3 files. In the files above this would produce


    File OUT
    WSOMQAVPRA05,93.75,34.78,93.13
    wsomdavpra03,90.39,69.04,68.85
    wsomddvfxa01,39.22,92.44,46.97
    wsompapgtw05,0,48.77,30.66


    Notice in File OUT for Server Name (wsompapgtw05),
    since wsompapgtw05 does not appear in File A, the
    value is replaced with '0'. How can I get perl to
    place a '0' in the output file when a particular
    server name appears in at least 1 of the input files,
    but not in all of the input files?

    The perl code I've written up to this point does not handle the
    missing server and Avg.CPU value. I think that I need to add
    code to first build a hash which contains the server names in a
    key by going through all the files - 3 for this example - and then
    iterating over that list of servers in a while loop, inserting the
    value
    found in the respective file or a '0'. You will see in my code, which
    follows I have created 1 hash. I am not sure in perl how I might
    attempt to build this additional hash with a Server Name as key
    and then iterating over that list. My code so far is:

    #!/usr/bin/perl

    use strict;
    use warnings;
    ###########################################
    # Create File with Average CPU Numbers
    ###########################################
    my %resultacpu;

    for my $file ("FileA","FileB","FileC") {

    open (my $fh,"<",$file) or die "Can't open file $file: $!";
    <$fh>; # skip header line

    while (my $line = <$fh>) {
    my ($server,$cpua) = (split(",",$line))[0,1];
    push @{$resultacpu{$server}},$cpua;
    }

    close $fh or die "Can't close file $file: $!";
    }

    open (my $nfh,">","OUT") or die "Can't open result file: $!";

    for my $server (sort keys %resultacpu) {
    print $nfh $server.",".join(",",@{$resultacpu{$server}})."\n";
    }
    close $nfh or die "Can't close result file: $!";


    I would appreciate the assistance of any perl coders who might
    be able to help me with this problem
    Something like this? I think the only thing that needs explaining is that
    the 'next if grep' line rejects the header line by ensuring that all of the
    data fields after the server consist of only numeric and decimal point
    characters.

    HTH,

    Rob



    use strict;
    use warnings;

    local @ARGV = qw/ fileB.txt fileB.txt fileC.txt /;

    my %usage;

    while (<>) {
    chomp;
    my ($server, @data) = split /,/;
    next if grep /[^0-9.]/, @data;
    push @{$usage{$server}}, $data[0];
    }

    foreach my $server (sort keys %usage) {

    my $data = $usage{$server};
    print join ',', $server, @$data;
    print "\n";
    }

    **OUTPUT**

    WSOMQAVPRA05,34.78,34.78,93.13
    wsomdavpra03,69.04,69.04,68.85
    wsomddvfxa01,92.44,92.44,46.97
    wsompapgtw05,48.77,48.77,30.66
  • Rob Dixon at Jul 18, 2007 at 6:44 pm

    Rob Dixon wrote:
    country wrote:
    I have multiple CSV files (3 for this example) with
    identical record layouts except that each Server Name may not be in
    all of the files. Also the CSV files might not be sorted by Server
    Name.

    File A
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.75,95.87,66.67,68.13
    wsomdavpra03,90.39,94,65.77,68.51
    wsomddvfxa01,39.22,92.19,82.59,88.25

    File B
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,34.78,100,55.1,67.6
    wsomdavpra03,69.04,98.55,84.07,89.73
    wsomddvfxa01,92.44,97.54,67.72,71.69
    wsompapgtw05,48.77,96.9,92.1,93.55

    File C
    Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.13,98.11,68.95,73.47
    wsomdavpra03,68.85,97.56,76.35,98.23
    wsomddvfxa01,46.97,96.29,88.23,94.02
    wsompapgtw05,30.66,93.74,39.89,71.35

    What I am trying to do is for each Server (in Column
    1) I want to get the Avg CPU (in Column 2) from each
    of the 3 files. In the files above this would produce


    File OUT
    WSOMQAVPRA05,93.75,34.78,93.13
    wsomdavpra03,90.39,69.04,68.85
    wsomddvfxa01,39.22,92.44,46.97
    wsompapgtw05,0,48.77,30.66


    Notice in File OUT for Server Name (wsompapgtw05),
    since wsompapgtw05 does not appear in File A, the
    value is replaced with '0'. How can I get perl to
    place a '0' in the output file when a particular
    server name appears in at least 1 of the input files,
    but not in all of the input files?

    The perl code I've written up to this point does not handle the
    missing server and Avg.CPU value. I think that I need to add
    code to first build a hash which contains the server names in a
    key by going through all the files - 3 for this example - and then
    iterating over that list of servers in a while loop, inserting the
    value
    found in the respective file or a '0'. You will see in my code, which
    follows I have created 1 hash. I am not sure in perl how I might
    attempt to build this additional hash with a Server Name as key
    and then iterating over that list. My code so far is:

    #!/usr/bin/perl

    use strict;
    use warnings;
    ###########################################
    # Create File with Average CPU Numbers
    ###########################################
    my %resultacpu;

    for my $file ("FileA","FileB","FileC") {

    open (my $fh,"<",$file) or die "Can't open file $file: $!";
    <$fh>; # skip header line

    while (my $line = <$fh>) {
    my ($server,$cpua) = (split(",",$line))[0,1];
    push @{$resultacpu{$server}},$cpua;
    }

    close $fh or die "Can't close file $file: $!";
    }

    open (my $nfh,">","OUT") or die "Can't open result file: $!";

    for my $server (sort keys %resultacpu) {
    print $nfh $server.",".join(",",@{$resultacpu{$server}})."\n";
    }
    close $nfh or die "Can't close result file: $!";


    I would appreciate the assistance of any perl coders who might
    be able to help me with this problem
    Something like this? I think the only thing that needs explaining is that
    the 'next if grep' line rejects the header line by ensuring that all of the
    data fields after the server consist of only numeric and decimal point
    characters.
    [snip old solution]

    My apologies. There was a bug in my previous post, and besides it didn't
    address the problem of inserting zeroes where files didn't have any
    information for a given server. This will do the trick, and I hope it
    helps.

    Rob



    use strict;
    use warnings;

    local @ARGV = qw/ fileA.txt fileB.txt fileC.txt /;

    my %usage;
    my $files = 0;

    while (<>) {

    chomp;
    my ($server, @data) = split /,/;
    next if grep /[^0-9.]/, @data;

    $usage{$server} = [] unless exists $usage{$server};
    push @{$usage{$server}}, 0 while @{$usage{$server}} < $files;
    push @{$usage{$server}}, $data[0];
    }
    continue {
    $files++ if eof;
    }

    foreach my $server (keys %usage) {
    my $usage = $usage{$server};
    push @$usage, 0 while @$usage < $files;
    }

    foreach my $server (sort keys %usage) {

    my $data = $usage{$server};
    print join ',', $server, @$data;
    print "\n";
    }

    **OUTPUT**

    WSOMQAVPRA05,93.75,34.78,93.13
    wsomdavpra03,90.39,69.04,68.85
    wsomddvfxa01,39.22,92.44,46.97
    wsompapgtw05,0,48.77,30.66
  • Mr. Shawn H. Corey at Jul 18, 2007 at 4:54 pm

    country wrote:
    #!/usr/bin/perl

    use strict;
    use warnings;
    ###########################################
    # Create File with Average CPU Numbers
    ###########################################
    my %resultacpu;
    my @files = qw( FileA FileB FileC );
    for my $file ("FileA","FileB","FileC") {
    for my $i ( 0 .. $#files ) {
    my $file = $files[$i];
    open (my $fh,"<",$file) or die "Can't open file $file: $!";
    <$fh>; # skip header line

    while (my $line = <$fh>) {
    my ($server,$cpua) = (split(",",$line))[0,1];
    push @{$resultacpu{$server}},$cpua;
    $resultacpu{$server}[$i] = $cpua;
    }

    close $fh or die "Can't close file $file: $!";
    }

    open (my $nfh,">","OUT") or die "Can't open result file: $!";

    for my $server (sort keys %resultacpu) {
    print $nfh $server.",".join(",",@{$resultacpu{$server}})."\n";
    my @list = ();
    for my $i ( 0 .. $#files ) { # Yes that's $#files
    push @list, $resultacpu{$server}[$i] || 0;
    }
    print $nfh join( ',', $server, @list ), "\n";
    # For print, comma is faster than dot
    }
    close $nfh or die "Can't close result file: $!";


    I would appreciate the assistance of any perl coders who might
    be able to help me with this problem
    --
    Just my 0.00000002 million dollars worth,
    Shawn

    "For the things we have to learn before we can do them, we learn by
    doing them."
    Aristotle
  • John W. Krahn at Jul 18, 2007 at 6:14 pm

    country wrote:
    I have multiple CSV files (3 for this example) with
    identical record layouts except that each Server Name may not be in
    all of the files. Also the CSV files might not be sorted by Server
    Name.

    File A
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.75,95.87,66.67,68.13
    wsomdavpra03,90.39,94,65.77,68.51
    wsomddvfxa01,39.22,92.19,82.59,88.25

    File B
    Server Name,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,34.78,100,55.1,67.6
    wsomdavpra03,69.04,98.55,84.07,89.73
    wsomddvfxa01,92.44,97.54,67.72,71.69
    wsompapgtw05,48.77,96.9,92.1,93.55

    File C
    Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
    WSOMQAVPRA05,93.13,98.11,68.95,73.47
    wsomdavpra03,68.85,97.56,76.35,98.23
    wsomddvfxa01,46.97,96.29,88.23,94.02
    wsompapgtw05,30.66,93.74,39.89,71.35

    What I am trying to do is for each Server (in Column
    1) I want to get the Avg CPU (in Column 2) from each
    of the 3 files. In the files above this would produce


    File OUT
    WSOMQAVPRA05,93.75,34.78,93.13
    wsomdavpra03,90.39,69.04,68.85
    wsomddvfxa01,39.22,92.44,46.97
    wsompapgtw05,0,48.77,30.66


    Notice in File OUT for Server Name (wsompapgtw05),
    since wsompapgtw05 does not appear in File A, the
    value is replaced with '0'. How can I get perl to
    place a '0' in the output file when a particular
    server name appears in at least 1 of the input files,
    but not in all of the input files?
    This appears to do what you want:

    @ARGV = glob 'File[ABC]';

    my %resultacpu;
    my $column = 0;

    while ( <> ) {
    next if $. == 1; # skip header line
    my ( $server, $cpua ) = split /,/;
    $resultacpu{ $server }[ $column ] = $cpua;
    if ( eof ) {
    $column++;
    close ARGV;
    }
    }

    open my $nfh, '>', 'OUT' or die "Can't open 'OUT' $!";

    for my $server ( sort keys %resultacpu ) {
    print $nfh join( ',', map $_ || 0, $server, @{ $resultacpu{ $server } }
    ), "\n";
    }

    close $nfh or die "Can't close 'OUT' $!";




    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
  • Rob Dixon at Jul 18, 2007 at 6:50 pm

    John W. Krahn wrote:
    country wrote:
    [snip]
    Notice in File OUT for Server Name (wsompapgtw05),
    since wsompapgtw05 does not appear in File A, the
    value is replaced with '0'. How can I get perl to
    place a '0' in the output file when a particular
    server name appears in at least 1 of the input files,
    but not in all of the input files?
    This appears to do what you want:

    @ARGV = glob 'File[ABC]';

    my %resultacpu;
    my $column = 0;

    while ( <> ) {
    next if $. == 1; # skip header line
    my ( $server, $cpua ) = split /,/;
    $resultacpu{ $server }[ $column ] = $cpua;
    if ( eof ) {
    $column++;
    close ARGV;
    }
    }

    open my $nfh, '>', 'OUT' or die "Can't open 'OUT' $!";

    for my $server ( sort keys %resultacpu ) {
    print $nfh join( ',', map $_ || 0, $server, @{ $resultacpu{ $server
    } } ), "\n";
    }

    close $nfh or die "Can't close 'OUT' $!";
    John,

    Your solution has the same problem that made my final one frustratingly
    ugly: if a server name is missing from the last file in the list there
    will be no corresponding trailing zero in the output file for that server.

    Rob
  • John W. Krahn at Jul 18, 2007 at 7:08 pm

    Rob Dixon wrote:
    John W. Krahn wrote:
    country wrote:
    [snip]
    Notice in File OUT for Server Name (wsompapgtw05),
    since wsompapgtw05 does not appear in File A, the
    value is replaced with '0'. How can I get perl to
    place a '0' in the output file when a particular
    server name appears in at least 1 of the input files,
    but not in all of the input files?
    This appears to do what you want:

    @ARGV = glob 'File[ABC]';

    my %resultacpu;
    my $column = 0;

    while ( <> ) {
    next if $. == 1; # skip header line
    my ( $server, $cpua ) = split /,/;
    $resultacpu{ $server }[ $column ] = $cpua;
    if ( eof ) {
    $column++;
    close ARGV;
    }
    }

    open my $nfh, '>', 'OUT' or die "Can't open 'OUT' $!";

    for my $server ( sort keys %resultacpu ) {
    print $nfh join( ',', map $_ || 0, $server, @{ $resultacpu{
    $server } } ), "\n";
    }

    close $nfh or die "Can't close 'OUT' $!";
    John,

    Your solution has the same problem that made my final one frustratingly
    ugly: if a server name is missing from the last file in the list there
    will be no corresponding trailing zero in the output file for that server.
    Good catch Rob. That can be fixed with a couple of changes:

    $column--;
    for my $server ( sort keys %resultacpu ) {
    print join( ',', map $_ || 0, $server, @{ $resultacpu{ $server } }[ 0 ..
    $column ] ), "\n";
    }



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
  • Chas Owens at Jul 18, 2007 at 7:44 pm
    On 7/18/07, Rob Dixon wrote:
    snip
    Your solution has the same problem that made my final one frustratingly
    ugly: if a server name is missing from the last file in the list there
    will be no corresponding trailing zero in the output file for that server.
    snip

    The answer is to use a HoH instead of a HoA. This lets you use a hash
    slice to ensure that every file is represented. Files that don't have
    a given server will return undef and a simple map will turn the undef
    values into 0s. You can also shave away some of the code by
    integrating the split and the check into one regex.

    #!/usr/bin/perl

    use strict;
    use warnings;

    local @ARGV = qw/ fileA.txt fileB.txt fileC.txt /;

    my @files = @ARGV;
    my %usage;
    while (<>) {
    next unless /(\w+),([\d.]+)(?:,|$)/;
    $usage{$1}{$ARGV} = $2;
    }

    for my $k (sort keys %usage) {
    print join(",", $k, map {$_ or 0} @{$usage{$k}}{@files}), "\n";
    }

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupbeginners @
categoriesperl
postedJul 18, '07 at 1:01p
activeJul 18, '07 at 7:44p
posts9
users6
websiteperl.org

People

Translate

site design / logo © 2022 Grokbase