Hi there!

I'm fairly new to Perl and need some help to accomplish a (simple?) task.

I extract strings from some logfiles, namely an IP address and a byte count, by
using regexes. I use a hash to store each IP address and its associated bytes.
First I packed all logs into one temporary file, but it was getting too big.
Now I'm having problems merging the hashes from the single logfiles into
one hash for all. Either I get a hash for each of the files or just one
for the last processed file.

Thanks in advance

Folker Naumann

  • JupiterHost.Net at Oct 6, 2004 at 8:58 pm

    Folker Naumann wrote:
    > Hi there!

    Hello,

    > I'm fairly new to Perl and need some help to accomplish a (simple?) task.
    >
    > I extract strings from some logfiles, namely an IP address and a byte
    > count, by using regexes. I use a hash to store each IP address and its
    > associated bytes.

    Sounds like you have the hard part done :)

    > First I packed all logs into one temporary file, but it was getting too big.

    Instead of creating a new file that has each file in it (doubling the
    space and memory used), process them one at a time:

    my %ipbytes = ();

    for my $file (@logfiles) {
        open LOG, $file or die "Can't open $file: $!";
        while (<LOG>) {
            # or however you get the ip and bytes from a line
            my ($ip, $bytes) = split /:/, $_;
            # may want to make sure $bytes is numeric to avoid possible errors
            $ipbytes{$ip} += $bytes;
        }
        close LOG;
    }


    HTH :)

    Lee.M - JupiterHost.Net
  • Folker Naumann at Oct 6, 2004 at 9:58 pm

    Hi,

    I should have included my code in the beginning, because I've done this
    already. I know that in this case a hash is generated for every file.
    But when I do the printing outside the foreach loop, just the hash of the
    last file is printed. I'm aware of that problem but could not figure
    out how to create only one hash for all files.

    Thanks

    Folker Naumann

    ----------------------------------------------------------
    (...)
    foreach $file (@sortlist){

        open(LOG,$file) or die "Can't open $file: $!\n";
        @lines = <LOG>;
        foreach my $logline (reverse(@lines)) {

            #Search for Host-IP-Adress and bytes
            if( $logline =~ / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/ ){
                if($ipload{$1}) {$ipload{$1}+=$2}
                else {$ipload{$1}=$2}
            }
        }

        #Close log file
        close(LOG) or die "Can't close $file: $!\n";

        #Print hash sorted by Host-IP-Adress
        foreach $ip ( map { $_->[0] } sort { $a->[1] <=> $b->[1] }
                      map { [ $_, (/(\d+)$/)[0] ] } keys %ipload) {
            print "$ip = $ipload{$ip}\n";
        }

    ----------------------------------------------------------
  • JupiterHost.Net at Oct 6, 2004 at 10:06 pm
    Always always always:

    use strict;
    use warnings;

    > (...)
    > foreach $file (@sortlist){

    my @sortlist = ...
    my %ipload = ();
    foreach my $file (@sortlist) {

    > open(LOG,$file) or die "Can't open $file: $!\n";
    > @lines = <LOG>;

    my @lines = <LOG>;

    > foreach my $logline (reverse(@lines)) {
    >
    > #Search for Host-IP-Adress and bytes
    > if( $logline =~ / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/ ){
    > if($ipload{$1}) {$ipload{$1}+=$2}
    > else {$ipload{$1}=$2}

    Why not just

    $ipload{$1} += $2;

    instead of the if/else?

    > }
    > }
    >
    > #Close log file
  • Folker Naumann at Oct 6, 2004 at 10:19 pm

    JupiterHost.Net wrote:
    > Always always always:
    >
    > use strict;
    > use warnings;

    Sorry, I just used (...) to indicate that I left out some lines of code,
    including use strict, use warnings and all initialisations.

    >> foreach my $logline (reverse(@lines)) {
    >>
    >> #Search for Host-IP-Adress and bytes
    >> if( $logline =~ / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/ ){
    >> if($ipload{$1}) {$ipload{$1}+=$2}
    >> else {$ipload{$1}=$2}
    >
    > Why not just
    > $ipload{$1} += $2;
    > instead of the if/else?

    I thought it would be safer, because some IP addresses have multiple
    log file entries. But I guess it's not necessary.

    Thanks

    Folker Naumann
  • JupiterHost.Net at Oct 6, 2004 at 10:39 pm

    >> Always always always:
    >>
    >> use strict;
    >> use warnings;
    >
    > Sorry, I just used (...) to indicate that I left out some lines of code,
    > including use strict, use warnings and all initialisations.

    Then how did the code you posted work? None of it was declared with
    the scope it should have had (i.e. any of the places I added 'my' to in
    my previous email).

    >>> foreach my $logline (reverse(@lines)) {
    >>>
    >>> #Search for Host-IP-Adress and bytes
    >>> if( $logline =~ / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/ ){
    >>> if($ipload{$1}) {$ipload{$1}+=$2}
    >>> else {$ipload{$1}=$2}
    >>
    >> Why not just
    >> $ipload{$1} += $2;
    >> instead of the if/else?
    >
    > I thought it would be safer, because some IP addresses have multiple
    > log file entries. But I guess it's not necessary.

    Don't you want the total for each IP from all the files?

    I'm still not sure how you were putting all the log files into one and
    still "already did it that way", like you said in the previous email.
  • JupiterHost.Net at Oct 6, 2004 at 10:48 pm
    Try this script out (replacing the fake filenames with actual ones)
    and see if it will help you sort out what's happening:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use Data::Dumper;

    my %ipload = ();
    my @sortlist = qw(file1 file2);

    for my $file (@sortlist) {
        open LOG, $file or die "Open $file: $!";
        while (<LOG>) {
            # Pull the IP address and byte count out of each matching line
            my ($ip, $bytes) =
                $_ =~ m/(\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/;
            $ipload{$ip} += $bytes if $ip && $bytes;
        }
        close LOG;
    }

    # %ipload is declared outside the loop, so it holds totals for all files
    print Dumper \%ipload;

    HTH :) - Lee.M - JupiterHost.Net
  • Charles K. Clarkson at Oct 6, 2004 at 9:36 pm
    Folker Naumann wrote:

    : I'm fairly new to Perl and need some help to accomplish a
    : (simple?) task.
    :
    : I extract strings from some logfiles, namely an IP address
    : and a byte count, by using regexes. I use a hash to store
    : each IP address and its associated bytes. First I packed
    : all logs into one temporary file, but it was getting too big.
    : Now I'm having problems merging the hashes from the
    : single logfiles into one hash for all. Either I get a
    : hash for each of the files or just one for the last
    : processed file.


    Show us your code.


    Charles K. Clarkson
    --
    Mobile Homes Specialist
    254 968-8328
  • John W. Krahn at Oct 7, 2004 at 6:43 am

    Folker Naumann wrote:
    > Hi there!

    Hello,

    > I'm fairly new to Perl and need some help to accomplish a (simple?) task.
    >
    > I extract strings from some logfiles, namely an IP address and a byte
    > count, by using regexes. I use a hash to store each IP address and its
    > associated bytes. First I packed all logs into one temporary file, but
    > it was getting too big. Now I'm having problems merging the hashes from
    > the single logfiles into one hash for all. Either I get a hash for each
    > of the files or just one for the last processed file.
    >
    > I should have included my code in the beginning, because I've done this
    > already. I know that in this case a hash is generated for every file.
    > But when I do the printing outside the foreach loop, just the hash of the
    > last file is printed. I'm aware of that problem but could not figure
    > out how to create only one hash for all files.

    You may be able to do this by using a tied hash which will actually store the
    hash's contents in a file.

    perldoc DB_File
    perldoc AnyDBM_File
    perldoc perldbmfilter
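
    A minimal sketch of the tied-hash idea, assuming SDBM_File (which ships
    with Perl) swapped in for DB_File, and a temporary file name that is
    purely hypothetical. The sample addresses and byte counts are made up.
    The hash is used like an ordinary one, but its contents are stored on
    disk:

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core module, used here instead of DB_File
use File::Temp qw(tempdir);

# Hypothetical location for the on-disk hash
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/ipload";

tie my %ipload, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
    or die "Can't tie $path: $!";

# Made-up values standing in for what the log regex would capture
$ipload{'192.168.0.1'} += 100;
$ipload{'192.168.0.2'} += 50;
$ipload{'192.168.0.1'} += 25;

print "$_ = $ipload{$_}\n" for sort keys %ipload;

untie %ipload;
```

    SDBM_File limits the size of each key/value pair, which is not a
    problem for short entries like these; DB_File is tied in much the same
    way if it is installed.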

    > ----------------------------------------------------------
    > (...)
    > foreach $file (@sortlist){
    >
    > open(LOG,$file) or die "Can't open $file: $!\n";
    > @lines = <LOG>;
    > foreach my $logline (reverse(@lines)) {

    You could use File::ReadBackwards (which is a lot more efficient) if you
    really need to do this; however, there is no point, as you are storing the
    data in a hash, which will not preserve the input order.

    > #Search for Host-IP-Adress and bytes
    > if( $logline =~ / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/ ){
    > if($ipload{$1}) {$ipload{$1}+=$2}
    > else {$ipload{$1}=$2}

    You don't need the if test as perl will do the right thing when $ipload{$1}
    doesn't exist (autovivification.) You can compress the IP address quite a bit
    by using Socket::inet_aton() which will also confirm that it is a valid IP
    address.
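
    A small illustration of both points, using a made-up address and byte
    counts: += on a key that does not exist yet simply starts from zero,
    and inet_aton turns the dotted text into a 4-byte string that
    inet_ntoa can turn back.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

my %ipload;

# No existence check needed: the missing entry starts out undefined,
# and += treats that as 0 without a warning.
$ipload{'192.168.0.7'} += 300;
$ipload{'192.168.0.7'} += 200;
print "total: $ipload{'192.168.0.7'}\n";

# inet_aton packs a dotted quad into a 4-byte string
my $packed = inet_aton('192.168.0.7');
print "packed length: ", length($packed), "\n";

# inet_ntoa converts the packed form back to text
print "round trip: ", inet_ntoa($packed), "\n";
```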

    > }
    > }
    >
    > #Close log file
    > close(LOG) or die "Can't close $file: $!\n";
    >
    > #Print hash sorted by Host-IP-Adress
    > foreach $ip ( map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_,
    > (/(\d+)$/)[0] ] } keys %ipload) {

    You don't need the list slice because without the /g (global) option the
    expression can only match once. Your comment says you are sorting by IP
    address but your code says you are only sorting by the last octet in the
    address. Did you intend to sort by the complete IP address?
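
    If sorting by the complete address were wanted, one possible sketch
    (with made-up addresses from several subnets) is to compare the packed
    forms returned by inet_aton, since their byte order matches the
    numeric order of the octets:

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Made-up addresses from more than one subnet
my @ips = qw(192.168.1.2 10.0.0.200 192.168.0.10 192.168.0.9);

# Compare the packed 4-byte strings; this takes every octet into account
my @sorted = sort { inet_aton($a) cmp inet_aton($b) } @ips;

print "$_\n" for @sorted;
```

    This calls inet_aton on every comparison, which is fine for a short
    list; for long lists the packed value could be computed once per
    address first.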

    > print "$ip = $ipload{$ip}\n";
    > }
    > ----------------------------------------------------------

    This may work as it doesn't slurp the whole file(s) into memory:

    use warnings;
    use strict;
    use Socket;

    my %ipload;
    { local @ARGV = @sortlist;

        while ( <> ) {
            next unless / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/;
            my $ip = inet_aton( $1 ) or do {
                warn "$1 is an invalid IP address.\n";
                next;
                };
            $ipload{ $1 } += $2
        }
    }

    #Print hash sorted by Host-IP-Adress
    for ( sort keys %ipload ) {
        my $ip = inet_ntoa( $_ );
        print "$ip = $ipload{$_}\n";
    }

    __END__



    John
    --
    use Perl;
    program
    fulfillment
  • Folker Naumann at Oct 7, 2004 at 10:24 am

    John W. Krahn wrote:
    > You may be able to do this by using a tied hash which will actually
    > store the hash's contents in a file.
    >
    > perldoc DB_File
    > perldoc AnyDBM_File
    > perldoc perldbmfilter

    Tied hashes look fairly complicated to me, but I'll give them a try ;)

    >> #Print hash sorted by Host-IP-Adress
    >> foreach $ip ( map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_,
    >> (/(\d+)$/)[0] ] } keys %ipload) {
    >
    > You don't need the list slice because without the /g (global) option the
    > expression can only match once. Your comment says you are sorting by IP
    > address but your code says you are only sorting by the last octet in the
    > address. Did you intend to sort by the complete IP address?

    I have to admit that I'm not fully comfortable with the Schwartzian
    Transform, but it does what I want. Because all addresses belong to
    only one subnet, I just need to sort by the last octet and I get:

    192.168.0.1
    192.168.0.2
    ...
    192.168.0.255
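
    For reference, the Schwartzian Transform used for that sort can be
    read bottom to top, as in this sketch with made-up addresses from a
    single subnet: pair each address with its last octet, sort on that
    number, then throw the number away again.

```perl
use strict;
use warnings;

# Made-up addresses, all from one subnet
my @ips = qw(192.168.0.20 192.168.0.3 192.168.0.100);

my @sorted =
    map  { $_->[0] }                # 3. keep only the original address
    sort { $a->[1] <=> $b->[1] }    # 2. compare the precomputed numbers
    map  { [ $_, /(\d+)$/ ] }       # 1. pair each address with its last octet
    @ips;

print "$_\n" for @sorted;
```

    The benefit over a plain sort block is that the regex runs once per
    address instead of once per comparison.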

    Your script (the one that avoids slurping the whole files into memory)
    gives me "Bad arg length for Socket::inet_ntoa, length is 13, should be 4
    at line..."

    Thanks

    Folker Naumann
  • John W. Krahn at Oct 7, 2004 at 3:09 pm

    Folker Naumann wrote:
    > This gives me "Bad arg length for Socket::inet_ntoa, length is 13,
    > should be 4 at line..."

    Sorry, the line:

    $ipload{ $1 } += $2

    should be:

    $ipload{ $ip } += $2
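
    A short sketch of why the change matters, using made-up address and
    byte pairs in place of the regex captures: once the hash is keyed on
    the packed value from inet_aton, every key is exactly 4 bytes long,
    which is what inet_ntoa expects. A dotted-text key (13 characters in
    the error above) is not.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

my %ipload;

# Made-up (address, bytes) pairs standing in for $1 and $2
my @pairs = ( [ '192.168.0.12', 40 ], [ '192.168.0.3', 10 ], [ '192.168.0.12', 2 ] );

for my $pair (@pairs) {
    my ( $addr, $bytes ) = @$pair;
    my $ip = inet_aton($addr) or next;    # 4-byte packed form
    $ipload{$ip} += $bytes;               # key on the packed form, not the text
}

for ( sort keys %ipload ) {
    # Each key is 4 bytes, so inet_ntoa accepts it
    print inet_ntoa($_), " = $ipload{$_}\n";
}
```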



    John
    --
    use Perl;
    program
    fulfillment

Discussion Overview
group: beginners @
categories: perl
posted: Oct 6, '04 at 8:45p
active: Oct 7, '04 at 3:09p
posts: 11
users: 4
website: perl.org
