Hi Gurus,
I am new to Perl and need some help learning regexes in Perl. From the
line below I need to extract the following:

Part I
1) $ENVFROM = dapizza@testhost.com
2) $ENVTO1 = avis.mel@test.com
3) $ENVTO2 = test1@etc.com
4) $ENVTO3 = samtest@abc.com


line=EnvFrom: dapizza@testhost.com, HdrTo: <davis.mel@test.com>,
EnvTo: avis.mel@test.com, test1@etc.com, samtest@abc.com, Subject:
TEST for perl

Part II
We have huge log files with thousands of entries like the one above, so
this will probably need to run in a while or for loop in the final phase.
Also, EnvTo: can hold anywhere from one email address to many. The example
above has 3 EnvTo addresses, but there can be more than 3 or only one, and
the regex should be able to detect that and put each address in a scalar
variable.

I need a regex specific to Part I (this will help me learn regexes
better) and, if possible, a separate regex for Part II.

I am a newbie to Perl and still in the learning phase, so please be
descriptive in the answer if possible. I greatly appreciate your help.


  • Chas. Owens at Dec 14, 2008 at 5:56 pm

    On Sat, Dec 13, 2008 at 19:35, explor wrote:
    > Hi Gurus,
    > I am new to perl and need some help to learn regex in perl. From the
    > below line i need to extract following:
    >
    > Part I
    > 1) $ENVFROM = dapizza@testhost.com
    > 2) $ENVTO1 = avis.mel@test.com
    > 3) $ENVTO2 = test1@etc.com
    > 4) $ENVTO3 = samtest@abc.com
    >
    > line=EnvFrom: dapizza@testhost.com, HdrTo: <davis.mel@test.com>,
    > EnvTo: avis.mel@test.com, test1@etc.com, samtest@abc.com, Subject:
    > TEST for perl
    [snip]


    If you can guarantee that EnvFrom:, EnvTo:, and Subject: will always
    exist and that they will always appear in that order, you can say:

    #!/usr/bin/perl

    use warnings;
    use strict;

    my $line = 'line=EnvFrom: dapizza@testhost.com, HdrTo: '
             . '<davis.mel@test.com>, EnvTo: avis.mel@test.com, test1@etc.com, '
             . 'samtest@abc.com, Subject: TEST for perl';

    #assign captures from regex to $from and $to
    my ($from, $to) = $line =~ m{
        EnvFrom:    #match starting at EnvFrom:
        \s+         #followed by one or more whitespace characters
        (           #begin first capture
            [^\s,]+ #match one or more of any character but whitespace or ,
        )           #end first capture
        .*          #match the largest string up to the next anchor
        EnvTo:      #which is the string EnvTo:
        \s+         #followed by one or more whitespace characters
        (           #begin second capture
            .*?     #match the shortest string up to the next anchor
        )           #end second capture
        Subject:    #which is the string Subject:
    }xms;

    #all of the addresses for EnvTo: are still in one string, so we
    #split it on a comma followed by zero or more whitespace characters
    #and store the resulting list in an array for easy access
    my @to = split /,\s*/, $to;

    print "from: $from\nto: ", join(", ", @to), "\n";
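The regex comments above distinguish "the largest string" (.*) from "the shortest string" (.*?). A minimal sketch of that difference follows; the sample line with two Subject: occurrences is made up for illustration and does not come from the poster's logs.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# hypothetical sample: "Subject:" appears twice on purpose
my $line = 'EnvTo: a@x.com, b@y.com, Subject: Re: Subject: hello';

# greedy: .* grabs as much as it can, so it runs to the LAST "Subject:"
my ($greedy) = $line =~ m{EnvTo: \s+ (.*)  Subject:}xms;

# non-greedy: .*? grabs as little as it can, so it stops at the FIRST one
my ($lazy)   = $line =~ m{EnvTo: \s+ (.*?) Subject:}xms;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

This is presumably why the second capture in the answer uses .*?: a subject that itself contains the text "Subject:" would otherwise be swallowed into the address list.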

    > Part II
    > We have a huge log files with thousands of entries like above so might
    > need to be done in either while /for loop in the final phase. Also,
    > the EnvTo: can be from one email address to multiple. In the above
    > example there are 3 EnvTo but it can be more than 3 or can be one only
    > but the regex should be able to determine and put each in a scalar
    > variable.
    >
    > I do need the regex specific to Part I (This will help me learn the
    > regex better) and also separate regex for Part II if possible.

    If each record is on a line by itself, there is no need for a separate regex:

    #!/usr/bin/perl

    use warnings;
    use strict;

    #read lines from either STDIN or a set of files passed in on the commandline
    while (my $line = <>) {
        #move to the next line if the regex doesn't match
        next unless my ($from, $to) = $line =~ m{
            EnvFrom: \s+ ([^\s,]+)
            .*
            EnvTo: \s+ (.*?)
            Subject:
        }xms;

        my @to = split /,\s*/, $to;

        print "for record $. I found\n\tfrom: $from\n\tto: ",
            join(", ", @to), "\n";
    }

    If each record is split over many lines, you will need a different
    solution depending on how the data is split. If you post what your
    data looks like, we may be able to help.
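As one possible illustration of the multi-line case, here is a sketch that assumes each record is wrapped over several lines and that records are separated by a blank line. That layout is purely an assumption, since the real data was never posted, and the sample addresses are invented. Setting $/ to the empty string puts Perl in paragraph mode, so each read returns one whole record.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# hypothetical input: two wrapped records separated by a blank line
my $log = <<'END';
EnvFrom: a@one.com, HdrTo: <b@two.com>,
EnvTo: b@two.com, c@three.com, Subject:
first

EnvFrom: d@four.com, HdrTo: <e@five.com>,
EnvTo: e@five.com, Subject: second
END

my @records;
{
    # an empty $/ turns on paragraph mode: each read returns one
    # blank-line-delimited chunk instead of one line
    local $/ = "";
    open my $fh, '<', \$log or die "cannot open in-memory file: $!";
    while (my $record = <$fh>) {
        # same regex as in the line-by-line loop; /s lets . cross newlines
        next unless my ($from, $to) = $record =~ m{
            EnvFrom: \s+ ([^\s,]+)
            .*
            EnvTo: \s+ (.*?)
            Subject:
        }xms;
        push @records, { from => $from, to => [ split /,\s*/, $to ] };
    }
    close $fh;
}

for my $r (@records) {
    print "from: $r->{from}\n\tto: ", join(", ", @{ $r->{to} }), "\n";
}
```

The in-memory filehandle only keeps the sketch self-contained; with real files the same loop would read from <> as in the answer above. If the records are delimited some other way, this approach would not apply as written.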

    --
    Chas. Owens
    wonkden.net
    The most important skill a programmer can have is the ability to read.
  • John W. Krahn at Dec 14, 2008 at 7:09 pm

    explor wrote:
    > Hi Gurus,

    Hello,

    > I am new to perl and need some help to learn regex in perl. From the
    > below line i need to extract following:
    >
    > Part I
    > 1) $ENVFROM = dapizza@testhost.com
    > 2) $ENVTO1 = avis.mel@test.com
    > 3) $ENVTO2 = test1@etc.com
    > 4) $ENVTO3 = samtest@abc.com
    >
    > line=EnvFrom: dapizza@testhost.com, HdrTo: <davis.mel@test.com>,
    > EnvTo: avis.mel@test.com, test1@etc.com, samtest@abc.com, Subject:
    > TEST for perl

    $ perl -le'
    my $x = q[line=EnvFrom: dapizza@testhost.com, HdrTo:
    <davis.mel@test.com>, EnvTo: avis.mel@test.com, test1@etc.com,
    samtest@abc.com, Subject: TEST for perl];

    my @fields = split /(\w+):\s+/, $x;

    my %data;
    for ( my $index = 0; $index <= $#fields; ++$index ) {
        if ( $fields[ $index ] =~ /\Aenv(?:from|to)\z/i ) {
            push @{ $data{ uc $fields[ $index ] } }, split /\s*,\s*/,
                $fields[ ++$index ];
        }
    }

    use Data::Dumper;
    print Dumper \%data;
    '
    $VAR1 = {
              'ENVFROM' => [
                             'dapizza@testhost.com'
                           ],
              'ENVTO' => [
                           'avis.mel@test.com',
                           'test1@etc.com',
                           'samtest@abc.com'
                         ]
            };
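The one-liner above relies on a property of split that is easy to miss: when the pattern contains capturing parentheses, the captured text is returned along with the pieces between matches. A small sketch on a shortened, made-up record shows the list this produces (the %raw hash is only for illustration and is not part of John's code).

```perl
#!/usr/bin/perl

use warnings;
use strict;

# hypothetical shortened record
my $x = 'EnvFrom: a@one.com, EnvTo: b@two.com, c@three.com, Subject: hi';

# the parentheses make split return the field names as well as the
# text between them
my @fields = split /(\w+):\s+/, $x;

# $fields[0] is the (here empty) text before the first name; after that
# the list alternates name, value, name, value, ...
shift @fields;
my %raw = @fields;

print "$_ => [$raw{$_}]\n" for sort keys %raw;
```

Loading the pairs straight into a hash would silently drop data if a field name ever repeated on one line, which may be one reason the original loop pushes onto arrays instead.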



    John
    --
    Those people who think they know everything are a great
    annoyance to those of us who do. -- Isaac Asimov

Discussion Overview
group: beginners @
categories: perl
posted: Dec 14, '08 at 12:36a
active: Dec 14, '08 at 7:09p
posts: 3
users: 3
website: perl.org
