I am trying to get all the 6 letter names in the second field in DATA
below, e.g.

BARTON
DARWIN
DARWIN

But the script below gives me all 6 letter and more entries.

What I read says {6} means exactly 6.

What is the correct RE?

I have solved the problem by using if (length($data[1]) == 6 ) but
would love to know the correct syntax for the RE


TIA


Owen


=================================================================

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    my $line = $_;

    my @line = split /,/;
    $line[1] =~ s/"//g;

    print "$line[1]\n" if $line[1] =~ /\S{6}/;
}

__DATA__
"0200","AUSTRALIAN NATIONAL UNIVERSITY","ACT","PO Boxes"
"0221","BARTON","ACT","LVR Special Mailing"
"0800","DARWIN","NT",,"DARWIN DELIVERY CENTRE"
"0801","DARWIN","NT","GPO Boxes","DARWIN GPO DELIVERY ANNEXE"
"0804","PARAP","NT","PO Boxes","PARAP LPO"
"0810","ALAWA","NT",,"DARWIN DELIVERY CENTRE"
"0810","BRINKIN","NT",,"DARWIN DELIVERY CENTRE"
"0810","CASUARINA","NT",,"DARWIN DELIVERY CENTRE"
"0810","COCONUT GROVE","NT",,"DARWIN DELIVERY CENTRE"

===============================================================


  • Jim Gibson at May 16, 2011 at 11:00 pm

    On 5/16/11 Mon May 16, 2011 3:44 PM, "Owen" <rcook@pcug.org.au> scribbled:

    > I am trying to get all the 6 letter names in the second field in DATA
    > below, eg
    >
    > BARTON
    > DARWIN
    > DARWIN
    >
    > But the script below gives me all 6 letter and more entries.
    >
    > What I read says {6} means exactly 6.

    \S{6} will match any string containing 6 consecutive non-whitespace
    characters. It will also match any string containing more than 6 such
    characters, because any such string contains within it a substring of
    exactly six characters. Perl matches do not have to match the entire string.

    > What is the correct RE?

    If you want exactly six characters, then you need to specify that any
    characters before or after the wanted six are not also members of the
    desired class. In your case, the easiest way is to anchor the match at the
    beginning and the end:

    $line[1] =~ /^\S{6}$/
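
As a rough illustration of the difference, the sketch below runs both patterns over the names from the thread's DATA section. The helper names `six_unanchored` and `six_anchored` are made up for this example and are not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keeps every name that contains a run of six
# non-whitespace characters anywhere inside it.
sub six_unanchored { return grep { /\S{6}/ } @_ }

# Hypothetical helper: keeps only names that consist of exactly six
# non-whitespace characters from start to end.
sub six_anchored { return grep { /^\S{6}$/ } @_ }

my @names = (
    'AUSTRALIAN NATIONAL UNIVERSITY', 'BARTON', 'DARWIN', 'PARAP',
    'ALAWA', 'BRINKIN', 'CASUARINA', 'COCONUT GROVE',
);

print "unanchored: ", join(', ', six_unanchored(@names)), "\n";
print "anchored:   ", join(', ', six_anchored(@names)),   "\n";
```

The unanchored pattern keeps everything except PARAP and ALAWA, while the anchored one keeps only BARTON and DARWIN.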

    If you were looking for word characters, e.g. \w, you could use the word
    boundary assertion metasymbol \b:

    $line[1] =~ /\b\w{6}\b/

    That will not work if your names contain punctuation characters, e.g.
    O'Reilly, because the apostrophe is not a word character and \b treats
    it as a boundary. More complex matches can use the negative lookahead
    and lookbehind constructs.
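
A minimal sketch of that last point, assuming "exactly six" means six non-whitespace characters with no further non-whitespace on either side. The helper names and the sample names O'Reilly and O'Hara are illustrative only; the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: first run of six word characters delimited by \b,
# or undef when there is none.
sub six_word_chars {
    my ($text) = @_;
    return $text =~ /\b(\w{6})\b/ ? $1 : undef;
}

# Hypothetical helper: first run of six non-whitespace characters that is
# neither preceded nor followed by another non-whitespace character.
sub six_nonspace {
    my ($text) = @_;
    return $text =~ /(?<!\S)(\S{6})(?!\S)/ ? $1 : undef;
}

for my $name ("DARWIN", "O'Reilly", "O'Hara", "CASUARINA") {
    my $by_boundary   = six_word_chars($name) // '(no match)';
    my $by_lookaround = six_nonspace($name)   // '(no match)';
    printf "%-10s boundary: %-12s lookaround: %s\n",
        $name, $by_boundary, $by_lookaround;
}
```

With \b, O'Reilly wrongly yields "Reilly" and the six-character O'Hara is missed; the lookaround version rejects the first and accepts the second.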

  • Rob Dixon at May 17, 2011 at 10:52 am

    Hi Owen.

    Your test establishes only whether the pattern can be found within the
    object string. A test like

    "CASUARINA" =~ /\S{6}/;

    finds the six non-space characters "CASUAR" and then returns success as
    the criterion has been satisfied.

    To get it to match /only/ six-character non-space strings you can add
    anchors at the beginning and end of the regex:

    "CASUARINA" =~ /^\S{6}$/;

    will fail because the sequence "beginning of line, six non-space
    characters, end of line" doesn't appear in "CASUARINA".
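
One related detail, offered as a general property of Perl regexes rather than something raised in the thread: without the /m modifier, `$` also matches just before a trailing newline. The sketch below compares `^...$` with the stricter `\A...\z`; the helper names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: return 1 on a match, 0 otherwise.
sub loose_six  { my ($s) = @_; return $s =~ /^\S{6}$/   ? 1 : 0 }
sub strict_six { my ($s) = @_; return $s =~ /\A\S{6}\z/ ? 1 : 0 }

for my $s ("DARWIN", "DARWIN\n", "CASUARINA") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-12s loose=%d strict=%d\n", $shown, loose_six($s), strict_six($s);
}
```

This does not affect the second field in the posted data, since it never carries the line's newline, but it could matter if the last field of an un-chomped line were tested.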

    But the proper way to do this is to forget about regular expressions and
    treat the data as comma-separated fields. The module Text::CSV will do
    this for you, as per the program below.

    HTH,

    Rob


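    use strict;
    use warnings;

    use Text::CSV;

    my $csv = Text::CSV->new;

    # Read one parsed record at a time from the DATA section
    while (my $fields = $csv->getline(*DATA)) {
        my $suburb = $fields->[1];    # second field: the suburb name
        # Skip empty fields and names that are not exactly six characters
        next unless $suburb and length $suburb == 6;
        print $suburb, "\n";
    }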

    __DATA__
    "0200","AUSTRALIAN NATIONAL UNIVERSITY","ACT","PO Boxes"
    "0221","BARTON","ACT","LVR Special Mailing"
    "0800","DARWIN","NT",,"DARWIN DELIVERY CENTRE"
    "0801","DARWIN","NT","GPO Boxes","DARWIN GPO DELIVERY ANNEXE"
    "0804","PARAP","NT","PO Boxes","PARAP LPO"
    "0810","ALAWA","NT",,"DARWIN DELIVERY CENTRE"
    "0810","BRINKIN","NT",,"DARWIN DELIVERY CENTRE"
    "0810","CASUARINA","NT",,"DARWIN DELIVERY CENTRE"
    "0810","COCONUT GROVE","NT",,"DARWIN DELIVERY CENTRE"

    **OUTPUT**

    BARTON
    DARWIN
    DARWIN
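
To illustrate why a quote-aware parser is safer than `split /,/`, here is a small sketch using a made-up record (not from the thread's data) whose second field contains a comma. It uses the core module Text::ParseWords instead of Text::CSV only so that it runs without installing anything; that substitution is an assumption of this sketch, not part of Rob's program, and the helper names are hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Made-up record: the second field contains a comma inside the quotes.
my $record = '"0999","DARWIN, EAST","NT","PO Boxes"';

# Naive approach, as in the original script: split on every comma,
# then strip the double quotes from the chosen field.
sub naive_field {
    my ($line, $index) = @_;
    my @fields = split /,/, $line;
    (my $field = $fields[$index]) =~ s/"//g;
    return $field;
}

# Quote-aware approach: quotewords() ignores delimiters inside quotes and,
# with a false second argument, removes the quotes themselves.
sub quoted_field {
    my ($line, $index) = @_;
    my @fields = quotewords(',', 0, $line);
    return $fields[$index];
}

print "split:      ", naive_field($record, 1),  "\n";    # DARWIN
print "quotewords: ", quoted_field($record, 1), "\n";    # DARWIN, EAST
```

With the naive split, the truncated field "DARWIN" would pass a six-character test even though the real field is longer.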
