FAQ
I have this code that looks through a series of files and, for each,
counts the number of unique IP addresses seen over the past 20 minutes,
then rewrites the file contents.

It seems pretty simple - each of the 5 $site files has at most 100 or
so IP entries in it - many files have 0-5 rows of data at any time. So
all in all, the 5 files are small.

Yet, as I monitor my server CPU and memory usage, top shows this as
a #1 offender often. It runs out of cron every 10 minutes all day
long.

Can anyone help me determine why this is so bad in terms of CPU and
memory according to "top"?

Thanks!


#!/usr/local/bin/perl

my $DEBUG=0;

my $HOME=qq(/home/site);
use Date::Manip;

use CGI;
my $cgi = new CGI;

my $then=&ParseDate("20 minutes ago");

# Get list of all sites for which IP data is being logged
my @todo=`ls $HOME/www/cgi-bin/js-ip.*`;

while (my $site=shift(@todo)) {
    chomp $site;
    $site=~s/.*js-ip.//g;

    my $ipfile=qq($HOME/www/cgi-bin/js-ip.$site);
    open (JF, "$ipfile");
    my @data=<JF>;
    close JF;

    my %haveip=();
    my $new="";
    my $online=0;
    # Look at the IP data and count the number of unique IPs
    # Save only those which have appeared in the past 20 minutes
    while (my $row=shift(@data)) {
        chomp $row;
        my ($date, $ip)=split("-", $row);
        my $date1=&ParseDate($date);
        my $cmp=&Date_Cmp($date1, $then);
        if ($cmp>-1 && ! $haveip{$ip}) {
            $new.=qq($date-$ip\n);
            $haveip{$ip}=1;
            $online++;
        }
    }
    # The array is empty already, but what the hey
    @data=();

    # Write over the file with the latest IP data
    open (JF1, ">$ipfile");
    print JF1 qq($new);
    close JF1;

    # Write the online count for this site to file for display
    open (JC, ">$HOME/www/cgi-bin/online-now.$site");
    print JC qq($online);
    close JC;
}


  • Sureshkumar M (HCL Financial Services) at Nov 22, 2008 at 12:08 pm
    Hi All,

    I want to find the strings in the file that contain a date.

    Please help me match them. Below is my program, and it's not
    returning anything.

    #!/usr/bin/perl
    open(DATA,"i")||die "Unable to open the file";
    while(<DATA>)
    {
    if($_=~/(\d{2})([\W])\1\2\1]/)
    {
    print $_;
    }
    }
    close(DATA);
    exit 0;

    Input file:

    $cat i
    15-06-79
    05-06-1981
    12-11-9
    13-10-89
    19-10-20009
    1-10-0002
    02-03-2008
    03-nov-2008
    $

    Output should be:-

    15-06-79
    05-06-1981
    13-10-89
    02-03-2008
    03-nov-2008

  • Dermot at Nov 22, 2008 at 2:16 pm
    2008/11/22 Sureshkumar M (HCL Financial Services) <Sureshkumar.M@hcl.in>:
    > Hi All,

    Hi

    > #!/usr/bin/perl

    # Always use these, particularly when things aren't working as expected.
    use strict;
    use warnings;

    > open(DATA,"i")||die "Unable to open the file";
    > while(<DATA>)
    > {
    > if($_=~/(\d{2})([\W])\1\2\1]/)

    I could be wrong, but I don't think \w will match a hyphen "-", so
    the test will fail.
    This works for me:

    if ($_=~/\d{1,2}-(\d{2}|\w{3})-\d+/)

    HTH,
    Dp.
  • John W. Krahn at Nov 22, 2008 at 2:31 pm

    Dermot wrote:
    > 2008/11/22 Sureshkumar M (HCL Financial Services) <Sureshkumar.M@hcl.in>:
    >> #!/usr/bin/perl
    > # Always use these, particularly when things aren't working as expected.
    > use strict;
    > use warnings;
    >> open(DATA,"i")||die "Unable to open the file";
    >> while(<DATA>)
    >> {
    >> if($_=~/(\d{2})([\W])\1\2\1]/)
    > I could be wrong, but I don't think \w will match a hyphen "-", so
    > the test will fail.

    \w won't match '-', but \W will.

    > This works for me:
    >
    > if ($_=~/\d{1,2}-(\d{2}|\w{3})-\d+/)

    The OP wanted to match only two-digit day values. \w will also match
    digits, so that will match a three-digit month number. The OP didn't
    want to match year values with one or five digits, but that will match them.



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
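
    To make the point above concrete, here is a small sketch (not from the
    thread) that runs the looser, unanchored pattern over the sample input
    from the original question. The helper name matching_lines is made up
    for illustration.

    ```perl
    use strict;
    use warnings;

    # Sample input lines from the original question.
    my @lines = qw(15-06-79 05-06-1981 12-11-9 13-10-89
                   19-10-20009 1-10-0002 02-03-2008 03-nov-2008);

    # Hypothetical helper: return the candidates that a compiled pattern accepts.
    sub matching_lines {
        my ($pattern, @candidates) = @_;
        return grep { $_ =~ $pattern } @candidates;
    }

    # The unanchored pattern from Dermot's reply.
    my $loose = qr/\d{1,2}-(\d{2}|\w{3})-\d+/;

    my @accepted = matching_lines($loose, @lines);
    print "$_\n" for @accepted;
    printf "%d of %d lines accepted\n", scalar @accepted, scalar @lines;
    ```

    Every sample line is accepted, including the three the OP wanted
    rejected (12-11-9, 19-10-20009 and 1-10-0002), because nothing anchors
    the pattern or bounds the number of day and year digits.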
  • Sftriman at Nov 22, 2008 at 7:05 pm

    On Nov 22, 6:16 am, paik...@googlemail.com (Dermot) wrote:
    > 2008/11/22 Sureshkumar M (HCL Financial Services) <Sureshkuma...@hcl.in>:
    >> Hi All,
    > Hi
    >
    >> #!/usr/bin/perl
    > # Always use these, particularly when things aren't working as expected.
    > use strict;
    > use warnings;
    >> open(DATA,"i")||die "Unable to open the file";
    >> while(<DATA>)
    >> {
    >> if($_=~/(\d{2})([\W])\1\2\1]/)
    > I could be wrong, but I don't think \w will match a hyphen "-", so
    > the test will fail.
    > This works for me:
    >
    > if ($_=~/\d{1,2}-(\d{2}|\w{3})-\d+/)
    >
    > HTH,
    > Dp.

    Thanks for the reply. I added strict and warnings and, thankfully,
    there were no messages.

    The while loop on the file handle makes sense - I should do that.

    What are you referring to in the if part? I see I have an unescaped
    hyphen, which I will make \- in the regexp compare. But what is the
    compare you are writing?

    Also, I ran the script many times just now - it runs so fast, I can't
    see why it's causing the CPU surge:

    <Q>$ time proc-js*pl
    0.29s real 0.24s user 0.04s system
    <Q>$ time proc-js*pl
    0.33s real 0.28s user 0.03s system
    <Q>$ time proc-js*pl
    0.39s real 0.34s user 0.05s system

    David
  • Dermot at Nov 24, 2008 at 3:06 pm

    2008/11/22 sftriman <dalyea@gmail.com>:
    > On Nov 22, 6:16 am, paik...@googlemail.com (Dermot) wrote:
    >> 2008/11/22 Sureshkumar M (HCL Financial Services) <Sureshkuma...@hcl.in>:
    >> I could be wrong, but I don't think \w will match a hyphen "-", so
    >> the test will fail.
    >> This works for me:
    >>
    >> if ($_=~/\d{1,2}-(\d{2}|\w{3})-\d+/)
    >>
    >> HTH,
    >> Dp.

    Oops, yes, John, that's correct. I didn't scroll down to the bit where
    it said "Output should be".

    > Thanks for the reply. I added strict and warnings and, thankfully,
    > there were no messages.
    >
    > The while loop on the file handle makes sense - I should do that.
    >
    > What are you referring to in the if part? I see I have an unescaped
    > hyphen, which I will make \- in the regexp compare.

    Who escaped a hyphen? There is no need to escape a hyphen. You escape
    meta-characters, and outside a character class a hyphen isn't one.

    > But what is the compare you are writing?

    My regex was incorrect, as John pointed out. I was looking for
    1 or 2 digits, a hyphen, a 2-digit number or a 3-character string,
    a hyphen, and one or more digits.

    You only want two-digit day values and a 2-4 digit year value, so
    /\d{2}-(\d{2}|\w{3})-\d{2,4}/ would be the regex I would use.

    Have a look at perldoc perlretut

    > Also, I ran the script many times just now - it runs so fast, I can't
    > see why it's causing the CPU surge:
    >
    > <Q>$ time proc-js*pl
    > 0.29s real 0.24s user 0.04s system
    > <Q>$ time proc-js*pl
    > 0.33s real 0.28s user 0.03s system
    > <Q>$ time proc-js*pl
    > 0.39s real 0.34s user 0.05s system

    Can't help with that. :-/

    Dp.
  • John W. Krahn at Nov 22, 2008 at 2:24 pm

    Sureshkumar M (HCL Financial Services) wrote:
    > Hi All,

    Hello,

    > I want to find the strings in the file that contain a date.
    > Please help me match them. Below is my program, and it's not
    > returning anything.
    >
    > #!/usr/bin/perl

    use warnings;
    use strict;

    > open(DATA,"i")||die "Unable to open the file";

    You should include the $! variable in the error message so you know
    *why* it failed to open.

    > while(<DATA>)
    > {
    > if($_=~/(\d{2})([\W])\1\2\1]/)

    That regular expression says: match two digits anywhere in the string
    and store the result in $1, followed by a non-word character stored
    in $2, followed by the contents of the first capturing
    parentheses, followed by the contents of the second capturing
    parentheses, followed by the contents of the first capturing
    parentheses, followed by a ']' character.

    So that will match, for example, '15-15-15]'.

    > {
    > print $_;
    > }
    > }
    > close(DATA);
    > exit 0;
    >
    > Input file:
    >
    > $cat i
    > 15-06-79
    > 05-06-1981
    > 12-11-9
    > 13-10-89
    > 19-10-20009
    > 1-10-0002
    > 02-03-2008
    > 03-nov-2008
    > $
    >
    > Output should be:-
    >
    > 15-06-79
    > 05-06-1981
    > 13-10-89
    > 02-03-2008
    > 03-nov-2008

    It looks like you want something like:

    /^\d\d\D(?:\d\d|[a-zA-Z]{3})\D(?:\d\d|\d{4})$/




    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
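
    As a quick check, the anchored pattern suggested in this reply can be
    run over the sample input from the question. This is only a minimal
    sketch; the helper name date_lines is invented for the example, and the
    input is held in a list rather than read from the file "i".

    ```perl
    use strict;
    use warnings;

    # Anchored pattern from the reply: two-digit day, one non-digit separator,
    # a two-digit or three-letter month, a separator, a two- or four-digit year.
    my $date_re = qr/^\d\d\D(?:\d\d|[a-zA-Z]{3})\D(?:\d\d|\d{4})$/;

    # Hypothetical helper: keep only the candidates that look like dates.
    sub date_lines {
        my @candidates = @_;
        return grep { $_ =~ $date_re } @candidates;
    }

    # Sample input lines from the original question.
    my @input = qw(15-06-79 05-06-1981 12-11-9 13-10-89
                   19-10-20009 1-10-0002 02-03-2008 03-nov-2008);

    my @dates = date_lines(@input);
    print "$_\n" for @dates;
    ```

    This prints the five lines listed under "Output should be" and skips
    12-11-9, 19-10-20009 and 1-10-0002. The ^ and $ anchors do the real
    work: without them, the pattern could match a substring of a longer
    run of digits.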
  • Mr. Shawn H. Corey at Nov 22, 2008 at 12:46 pm

    On Fri, 2008-11-21 at 23:51 -0800, sftriman wrote:
    > I have this code that looks through a series of files and, for each,
    > counts the number of unique IP addresses seen over the past 20 minutes,
    > then rewrites the file contents.
    >
    > It seems pretty simple - each of the 5 $site files has at most 100 or
    > so IP entries in it - many files have 0-5 rows of data at any time. So
    > all in all, the 5 files are small.
    >
    > Yet, as I monitor my server CPU and memory usage, top shows this as
    > a #1 offender often. It runs out of cron every 10 minutes all day
    > long.
    >
    > Can anyone help me determine why this is so bad in terms of CPU and
    > memory according to "top"?

    Add a nice(1) command to your cron(8) entry. See `man nice` for details.
    This will lower the priority of your script, which gives it fewer CPU
    cycles than your normal applications.

    > Thanks!
    >
    > #!/usr/local/bin/perl

    use strict;
    use warnings;

    > my $DEBUG=0;
    >
    > my $HOME=qq(/home/site);
    > use Date::Manip;
    >
    > use CGI;
    > my $cgi = new CGI;

    You don't use $cgi in the rest of your code. Remove the above two
    lines.

    > my $then=&ParseDate("20 minutes ago");
    >
    > # Get list of all sites for which IP data is being logged
    > my @todo=`ls $HOME/www/cgi-bin/js-ip.*`;

    To make your code system independent, use glob(). See `perldoc -f glob`
    for details.

    > while (my $site=shift(@todo)) {
    >     chomp $site;
    >     $site=~s/.*js-ip.//g;
    >
    >     my $ipfile=qq($HOME/www/cgi-bin/js-ip.$site);
    >     open (JF, "$ipfile");
    >     my @data=<JF>;
    >     close JF;

    You are slurping in the entire file. This requires memory. Use the
    more conventional open-while-close procedure.

    >     my %haveip=();
    >     my $new="";
    >     my $online=0;
    >     # Look at the IP data and count the number of unique IPs
    >     # Save only those which have appeared in the past 20 minutes
    >     while (my $row=shift(@data)) {
    >         chomp $row;
    >         my ($date, $ip)=split("-", $row);
    >         my $date1=&ParseDate($date);
    >         my $cmp=&Date_Cmp($date1, $then);
    >         if ($cmp>-1 && ! $haveip{$ip}) {
    >             $new.=qq($date-$ip\n);

    Open your output file before the loop and print to it one line at a
    time. This will save more memory.

    >             $haveip{$ip}=1;
    >             $online++;
    >         }
    >     }
    >     # The array is empty already, but what the hey
    >     @data=();
    >
    >     # Write over the file with the latest IP data
    >     open (JF1, ">$ipfile");
    >     print JF1 qq($new);
    >     close JF1;
    >
    >     # Write the online count for this site to file for display
    >     open (JC, ">$HOME/www/cgi-bin/online-now.$site");
    >     print JC qq($online);
    >     close JC;
    > }

    --
    Just my 0.00000002 million dollars worth,
    Shawn

    The map is not the territory,
    the dossier is not the person,
    the model is not reality,
    and the universe is indifferent to your beliefs.
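
    A rough sketch of how the suggestions in this reply could fit together:
    glob() instead of `ls`, reading one line at a time, and printing each
    kept row as soon as it is found. This is not the OP's script. It
    assumes, purely for illustration, that each row is "EPOCH-IP" with a
    Unix timestamp, so a plain numeric comparison stands in for the
    Date::Manip calls; prune_ip_file and process_sites are made-up names.
    Because the input file is also the output file, the sketch writes to a
    temporary file and renames it over the original.

    ```perl
    use strict;
    use warnings;
    use File::Basename qw(dirname);
    use File::Temp qw(tempfile);

    # Keep the first row per IP that is at least as new as $cutoff.
    # Assumption: rows look like "EPOCH-IP" (not the format used in the thread).
    sub prune_ip_file {
        my ($ipfile, $cutoff) = @_;
        my %seen;
        my $online = 0;

        open my $in, '<', $ipfile or die "Cannot read $ipfile: $!";
        # Temp file in the same directory, so rename() stays on one filesystem.
        # Note: tempfile() creates it with mode 0600; chmod it if needed.
        my ($out, $tmp) = tempfile(DIR => dirname($ipfile));

        while (my $row = <$in>) {
            chomp $row;
            my ($stamp, $ip) = split /-/, $row, 2;
            next unless defined $ip && $stamp =~ /^\d+$/;   # skip malformed rows
            next if $stamp < $cutoff || $seen{$ip}++;       # too old, or a repeat
            print {$out} "$stamp-$ip\n";
            $online++;
        }
        close $in;
        close $out or die "Cannot write $tmp: $!";
        rename $tmp, $ipfile or die "Cannot replace $ipfile: $!";
        return $online;
    }

    # Prune every js-ip.* file in $dir and record its count in online-now.SITE.
    sub process_sites {
        my ($dir, $cutoff) = @_;
        for my $ipfile (glob "$dir/js-ip.*") {
            (my $site = $ipfile) =~ s/.*js-ip\.//;
            my $online = prune_ip_file($ipfile, $cutoff);
            open my $count, '>', "$dir/online-now.$site"
                or die "Cannot write count for $site: $!";
            print {$count} $online;
            close $count;
        }
        return;
    }

    # Example call (directory taken from the original script):
    # process_sites('/home/site/www/cgi-bin', time - 20 * 60);
    ```

    Only one row is held in memory at a time, and the %seen hash grows with
    the number of distinct recent IPs rather than with the file size.
    Whether this changes what top reports is not something the thread
    settles.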

Discussion Overview
group: beginners @
categories: perl
posted: Nov 22, '08 at 7:51a
active: Nov 24, '08 at 3:06p
posts: 8
users: 5
website: perl.org
