Hello,
I have been scratching my head on this problem and was wondering if someone
can help me out. Basically I need to take a raw list of data (a snippet of
it is below my email) and create another file with the information formatted
in the following format: "Date: Category: Winner." The example of the
finished file is as follows:

1934: Actor: Clark Gable
1934: Actress: Claudette Colbert
1934: Art Direction: The Merry Widow

As I am not a programmer by nature, I'm trying to figure out how to work out
the logic of this program. The "Date" does not repeat with each category
but only changes when the next year of results is displayed in the data
file. How do I set up my logic to support this? Any help that can be
provided would be much appreciated.

Just an FYI - here is my crack at finding the lines that match each
attribute:
Date: print $_ if $_ =~ /^(\d{4})*/
#look for four digits at the beginning of a string
Category: print $_ if $_ =~ /^[A-Z]+/
#look for one or more all caps characters at the beginning of a string
Winner: print $_ if $_ =~ /(\*)*(--)/
#look for a field that starts with an asterisk and contains "--"

I am open to comments on my regular expressions. Thanks!


==================== SNIPPET OF RAW DATA FILE ====================

1934 (7th),
ACTOR,
*,"Clark Gable -- It Happened One Night {""Peter Warne""}"
ACTRESS,
*,"Claudette Colbert -- It Happened One Night {""Ellie Andrews""}"
ART DIRECTION,
*,"The Merry Widow -- Cedric Gibbons, Fredric Hope"
,[NOTE: won by two votes]
ASSISTANT DIRECTOR,
*,Viva Villa! -- John Waters
CINEMATOGRAPHY,per
*,Cleopatra -- Vicxtor Milner
DIRECTING,
*,It Happened One Night -- Frank Capra
FILM EDITING,
*,Eskimo -- Conrad Nervig
MUSIC (Scoring),
*,"One Night of Love -- Columbia Studio Music Department, Louis Silvers,
head of department (Thematic Music by Victor Schertzinger and Gus Kahn)"
MUSIC (Song),
*,"""The Continental"" from The Gay Divorcee -- Music by Con Conrad; Lyrics
by Herb Magidson"
OUTSTANDING PRODUCTION,
*,It Happened One Night -- Columbia
SHORT SUBJECT (Cartoon),
*,"The Tortoise and the Hare -- Walt Disney, Producer"
SHORT SUBJECT (Comedy),
*,"La Cucaracha -- Kenneth Macgowan, Producer"
SHORT SUBJECT (Novelty),
*,"City of Wax -- Stacy Woodard and Horace Woodard, Producers"
SOUND RECORDING,
*,"One Night of Love -- Columbia Studio Sound Department, John Livadary,
Sound Director"
WRITING (Adaptation),
*,It Happened One Night -- Robert Riskin
WRITING (Original Story),
*,Manhattan Melodrama -- Arthur Caesar
SPECIAL AWARD,
*,"To Shirley Temple, in grateful recognition of her outstanding
contribution to screen entertainment during the year 1934."
SCIENTIFIC OR TECHNICAL AWARD (Class II),
*,"To ELECTRICAL RESEARCH PRODUCTS, INC. for their development of the
Vertical Cut Disc Method of recording sound for motion pictures (hill and
dale recording). [Sound]"
SCIENTIFIC OR TECHNICAL AWARD (Class III),
*,"To COLUMBIA PICTURES CORPORATION for their application of the Vertical
Cut Disc Method (hill and dale recording) to actual studio production, with
their recording of the sound on the picture One Night of Love. [Sound]"
*,To BELL AND HOWELL COMPANY for their development of the Bell and Howell
Fully Automatic Sound and Picture Printer. [Laboratory]
,
1935 (8th),
ACTOR,
*,"Victor McLaglen -- The Informer {""Gypo Nolan""}"
ACTRESS,
*,"Bette Davis -- Dangerous {""Joyce Heath""}"
ART DIRECTION,
*,The Dark Angel -- Richard Day
ASSISTANT DIRECTOR,
*,"The Lives of a Bengal Lancer -- Clem Beauchamp, Paul Wing"
CINEMATOGRAPHY,
*,A Midsummer Night's Dream -- Hal Mohr
,[NOTE: THIS IS NOT AN OFFICIAL NOMINATION. Write-in candidate.]
DANCE DIRECTION,
*,"Dave Gould -- ""I've Got a Feeling You're Fooling"" number from Broadway
Melody of 1936; and ""Straw Hat"" number from Folies Bergere"
DIRECTING,
*,The Informer -- John Ford
FILM EDITING,
*,A Midsummer Night's Dream -- Ralph Dawson
MUSIC (Scoring),
*,"The Informer -- RKO Radio Studio Music Department, Max Steiner, head of
department (Score by Max Steiner)"
MUSIC (Song),
*,"""Lullaby of Broadway"" from Gold Diggers of 1935 -- Music by Harry
Warren; Lyrics by Al Dubin"
OUTSTANDING PRODUCTION,
*,Mutiny on the Bounty -- Metro-Goldwyn-Mayer
SHORT SUBJECT (Cartoon),
*,"Three Orphan Kittens -- Walt Disney, Producer"
SHORT SUBJECT (Comedy),
*,"How to Sleep -- Jack Chertok, Producer"
SHORT SUBJECT (Novelty),
*,Wings over Mt. Everest -- Gaumont British and Skibo Productions
SOUND RECORDING,
*,"Naughty Marietta -- Metro-Goldwyn-Mayer Studio Sound Department, Douglas
Shearer, Sound Director"
WRITING (Original Story),
*,"The Scoundrel -- Ben Hecht, Charles MacArthur"
WRITING (Screenplay),
*,The Informer -- Dudley Nichols
,"[NOTE: Mr. Nichols initially refused the award, but Academy records
indicate that he was in possession of a statuette by 1949.]"
SPECIAL AWARD,
*,"To David Wark Griffith, for his distinguished creative achievements as
director and producer and his invaluable initiative and lasting
contributions to the progress of the motion picture arts."
SCIENTIFIC OR TECHNICAL AWARD (Class II),
*,To AGFA ANSCO CORPORATION for their development of the Agfa infra-red
film. [Film]
*,To EASTMAN KODAK COMPANY for their development of the Eastman Pola-Screen.
[Lenses and Filters]
SCIENTIFIC OR TECHNICAL AWARD (Class III),
*,"To METRO-GOLDWYN-MAYER STUDIO for the development of anti-directional
negative and positive development by means of jet turbulation, and the
application of the method to all negative and print processing of the entire
product of a major producing company. [Laboratory]"
*,"To WILLIAM A. MUELLER of Warner Bros.-First National Studio Sound
Department for his method of dubbing, in which the level of the dialogue
automatically controls the level of the accompanying music and sound
effects. [Sound]"
*,"To MOLE-RICHARDSON COMPANY for their development of the ""Solar-spot""
spot lamps. [Lighting]"
*,To DOUGLAS SHEARER and METRO-GOLDWYN-MAYER STUDIO SOUND DEPARTMENT for
their automatic control system for cameras and sound recording machines and
auxiliary stage equipment. [Stage Operations]
*,"To ELECTRICAL RESEARCH PRODUCTS, INC. for their study and development of
equipment to analyze and measure flutter resulting from the travel of the
film through the mechanisms used in the recording and reproduction of sound.
[Sound]"
*,"To PARAMOUNT PRODUCTIONS, INC. for the design and construction of the
Paramount transparency air turbine developing machine. [Laboratory]"
*,"To NATHAN LEVINSON, Director of Sound Recording for Warner Bros.-First
National Studio, for the method of intercutting variable density and
variable area sound tracks to secure an increase in the effective volume
range of sound recorded for motion pictures. [Sound]"
,
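
On the three patterns in the question: a quick check against a few lines from the snippet suggests that the date pattern matches every line, because `*` also allows zero repetitions, and that the winner pattern only really tests for "--". The sketch below is an illustration, not part of the original post; the helper name `matching_patterns` is made up.

```perl
use strict;
use warnings;

# The three patterns from the question, unchanged.
my %pattern = (
    date     => qr/^(\d{4})*/,
    category => qr/^[A-Z]+/,
    winner   => qr/(\*)*(--)/,
);

# Hypothetical helper: names of the patterns that match a line, sorted.
sub matching_patterns {
    my ($line) = @_;
    return join ',', grep { $line =~ $pattern{$_} } sort keys %pattern;
}

for my $line (
    '1934 (7th),',
    'ACTOR,',
    '*,Viva Villa! -- John Waters',
    ',[NOTE: won by two votes]',
) {
    printf "%-30s => %s\n", $line, matching_patterns($line);
}
```

Dropping the `*` quantifiers, for example `/^\d{4}/` for the year and `/^\*,.*--/` for a winner line, should narrow the matches to the intended lines.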


  • Timothy adigun at Aug 4, 2011 at 8:41 am
    Hi Ryan,
    Try the code below, it should help.

    ==========<CODE>=============

    #!/usr/bin/perl -w
    use strict;

    my $ln="";
    my ($yr,$cat,$win)=("","","");
    my $filename="New_output.txt";
    chomp(my $raw_file=<@ARGV>);

    open READFILE,"<","$raw_file" or die "can't open $!";
    open OUTPUTFILE,">","$filename" or die "cannot read $!";
    while(<READFILE>){chomp;
    $ln.="\n" if /^\W.?+$/;
    if(/^\d{4}/){$yr=$&;} # get the year
    if(/^[A-Z].+/){ $cat=$&; # get the Category
    $cat=join"",split /,/,$cat; # remove the comma in front
    $ln.=" $yr: ".$cat; # add both the year and Category
    }
    if(/\--.+/){$win=$`; # get the winner
    $win=join"",split /[\*,\"]/,$win;
    $ln.=": ".$win."\n"; #### If you use "$ln.=": ".$win. $&. "\n";"
    #### you get "-- It Happened One Night {""Peter Warne""}",etc added to
    #### what you have
    }
    }
    print OUTPUTFILE $ln;
    close OUTPUTFILE;
    close READFILE;
    ============================================================
    I used the special match variables ($`, $&, and $'), which mean:
    $` ==> the text before the match,
    $& ==> the matched text, and
    $' ==> the text after the match.
    If the code doesn't do what you want, you might have to play around with
    the regular expressions!
    Regards.
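
To make the three match variables concrete, here is a small sketch on one line of the sample data. The helper names `around_dashes` and `around_dashes_captured` are made up for illustration; the second one shows that capturing parentheses yield the same pieces without the special variables.

```perl
use strict;
use warnings;

# Hypothetical helper: the text before, inside and after the first " -- ".
sub around_dashes {
    my ($line) = @_;
    return unless $line =~ / -- /;
    return ($`, $&, $');    # prematch, match, postmatch
}

# Same split using capture groups instead of the special variables.
sub around_dashes_captured {
    my ($line) = @_;
    return $line =~ /^(.*?)( -- )(.*)$/;
}

my ($before, $match, $after) = around_dashes('*,Cleopatra -- Victor Milner');
print "before: [$before]\nmatch:  [$match]\nafter:  [$after]\n";
```

As far as I know, perls older than 5.20 paid a program-wide speed penalty once `$&`, `` $` `` or `$'` appeared anywhere in the code, which is one reason the capture-group form is usually preferred.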
  • Ryan at Aug 4, 2011 at 11:25 am
    Timothy,
    That worked like a charm. Thank you so much for the help.
    -Ryan


  • John W. Krahn at Aug 4, 2011 at 5:21 pm

    timothy adigun wrote:
    > Hi Ryan,
    > Try the code below, it should help.
    >
    > ==========<CODE>=============
    >
    > #!/usr/bin/perl -w
    > use strict;
    >
    > my $ln="";
    > my ($yr,$cat,$win)=("","","");
    > my $filename="New_output.txt";
    > chomp(my $raw_file=<@ARGV>);

    That is the same as saying:

    chomp( my $raw_file = glob "@ARGV" );

    Why are you copying the contents of @ARGV to a string and then globbing
    that string?

    If @ARGV contains more than one element then this will not work correctly.

    And why chomp() a string that will not contain newlines?

    What you want is something like:

    my $raw_file = $ARGV[ 0 ];

    Or:

    my $raw_file = shift;

    But you should probably verify that @ARGV is not empty first.
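
A minimal sketch of that check, assuming the script expects exactly one file name. The helper name `input_file_from` and the file name `oscars.csv` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the single input file name from an argument
# list, or die with a usage message if the count is wrong.
sub input_file_from {
    my @args = @_;
    die "usage: $0 RAW_FILE\n" unless @args == 1;
    return $args[0];
}

# Demonstrated on a literal list so the sketch runs without arguments;
# a real script would call input_file_from(@ARGV).
my $raw_file = input_file_from('oscars.csv');
print "would read from: $raw_file\n";
```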

    > open READFILE,"<","$raw_file" or die "can't open $!";

    Why are you copying $raw_file to a string?

    > open OUTPUTFILE,">","$filename" or die "cannot read $!";

    Why are you copying $filename to a string?

    > while(<READFILE>){chomp;
    > $ln.="\n" if /^\W.?+$/;
    > if(/^\d{4}/){$yr=$&;} # get the year
    > if(/^[A-Z].+/){ $cat=$&; # get the Category
    > $cat=join"",split /,/,$cat; # remove the comma in front
    > $ln.=" $yr: ".$cat; # add both the year and Category
    > }
    > if(/\--.+/){$win=$`; # get the winner

    The use of $&, $' and $` will slow down *ALL* regular expressions in the
    program. Better to just use capturing parentheses.

    if ( /^(\d{4})/ ) { $yr = $1 } # get the year
    if ( /^([A-Z].+)/ ) {
        $cat = $1; # get the Category
        $cat = join "", split /,/, $cat; # remove the comma in front
        $ln .= " $yr: " . $cat; # add both the year and Category
    }
    if ( /(.*?)\--.+/ ) { $win = $1; # get the winner

    And the line:

    > $cat = join "", split /,/, $cat; # remove the comma in front

    says "remove the comma in front" but it will remove ALL commas.

    A more efficient way to remove all commas is:

    $cat =~ tr/,//d; # remove all commas

    > $win=join"",split /[\*,\"]/,$win;

    Again, a more efficient way to remove all '*', ',' and '"' characters is:

    $win =~ tr/*,"//d;
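
The two approaches can be compared side by side in a short sketch. The helper names are made up for illustration, and `strip_trailing_comma` is an extra suggestion of mine for the case where only the comma at the end of a category line should go.

```perl
use strict;
use warnings;

# Hypothetical helpers comparing the two ways of deleting characters.
sub strip_with_split { my ($s) = @_; return join "", split /[\*,\"]/, $s; }
sub strip_with_tr    { my ($s) = @_; $s =~ tr/*,"//d; return $s; }

# Removes only a comma at the end of the string, leaving inner commas alone.
sub strip_trailing_comma { my ($s) = @_; $s =~ s/,\s*$//; return $s; }

my $winner = '*,"Clark Gable ';
print "split/join: [", strip_with_split($winner), "]\n";
print "tr///d:     [", strip_with_tr($winner),    "]\n";
print "trailing:   [", strip_trailing_comma('MUSIC (Scoring),'), "]\n";
```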

    > $ln.=": ".$win."\n"; #### If you use "$ln.=": ".$win. $&. "\n";"
    > #### you get "-- It Happened One Night {""Peter Warne""}",etc added to
    > #### what you have
    > }
    > }
    > print OUTPUTFILE $ln;
    > close OUTPUTFILE;
    > close READFILE;


    John
    --
    Any intelligent fool can make things bigger and
    more complex... It takes a touch of genius -
    and a lot of courage to move in the opposite
    direction. -- Albert Einstein
  • Timothy adigun at Aug 5, 2011 at 12:33 am
    Hi John,
    I believe you know that in Perl there is more than one way to do it, i.e.
    to solve a problem, and that one way is no better than another; it only
    depends on what the programmer prefers to use, as long as the syntax is
    correct.
    Secondly, most of your "why" questions would have been answered if only
    you had checked Ryan Lagola's request.

    > If @ARGV contains more than one element then this will not work correctly.

    *Not true!* Using $ARGV[0] selects only one file to generate your report
    from, but using @ARGV, one has all the files listed. Moreover, how do you
    know how many files he/she intended to use from the CLI at once?! So, for
    me, it is safer to use @ARGV. Please don't misunderstand this; there are
    several ways of doing things!

    > open READFILE,"<","$raw_file" or die "can't open $!";
    > Why are you copying $raw_file to a string?
    >
    > open OUTPUTFILE,">","$filename" or die "cannot read $!";
    > Why are you copying $filename to a string?

    I don't know what you mean by "copying both $raw_file and $filename into a
    string"! If you mean using double quotes around $raw_file and $filename,
    then I should explain that this is called interpolation in Perl. These
    two variables are scalars, so inside double quotes they are interpolated,
    i.e. the value of the scalar (in this context) is inserted.
    Lastly, code is written to be improved on. That is one of the reasons we
    have different books!
    Regards.

  • Uri Guttman at Aug 5, 2011 at 1:14 am
    "ta" == timothy adigun writes:
    ta> I believe you know that in Perl there are more than one way to do
    ta> it! i.e solve a problem. And that one way is no better than the
    ta> other, it only depend on what the programmer preferred to use, as
    ta> long as the syntax are correct. Secondly, most of your why would
    ta> have been answered, if only you check Ryan Lagola's request.
    > If @ARGV contains more than one element then this will not work correctly.
    ta> *Not true!* Using $ARGV[0] selects only one file to generate your
    ta> report from, but using @ARGV, one has all the files
    ta> listed. Moreover, how do you know how many files he/she intended
    ta> to use from the CLI at once?! So, for me it is safer to use
    ta> @ARGV. Please don't misunderstand this; there are several ways of
    ta> doing things!

    no, you had this code:

    chomp(my $raw_file=<@ARGV>);

    and that will not work if @ARGV has more than one filename. try it out
    and see.
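    A minimal sketch of the problem, assuming <@ARGV> behaves as glob "@ARGV" (as described elsewhere in this thread). The file names are hypothetical, and a lexical @args stands in for @ARGV so the example does not depend on the command line:

```perl
use strict;
use warnings;

# Stand-in for @ARGV with two (hypothetical) file names on the command line
my @args = ( 'first.csv', 'second.csv' );

# Scalar context: glob hands back only the first name
my $raw_file = glob "@args";

# List context: glob returns every name in the interpolated string
my @all_files = glob "@args";

# Plain indexing does the same job as the scalar glob, without the detour
my $simple = $args[0];

print "scalar glob: $raw_file\n";
print "list glob:   @all_files\n";
print "indexing:    $simple\n";
```

    Names without wildcards come back from glob unchanged, so the sketch runs even though the files do not exist.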
    open READFILE,"<","$raw_file" or die "can't open $!";
    Why are you copying $raw_file to a string?
    ta> open OUTPUTFILE,">","$filename" or die "cannot read $!";
    Why are you copying $filename to a string?
    ta> I don't know what you mean by "copying both $raw_file and $filename into a
    ta> string"! If you mean using double quotes around $raw_file and $filename,
    ta> then I should explain that that is called interpolation in Perl. These
    ta> two variables are scalars, so when double-quoted they are interpolated, i.e.
    ta> the value of the scalar (in this context) is inserted.

    you are telling someone who knows perl well about interpolation. but
    what you didn't get (and john didn't explain clearly enough it seems),
    is that quoting a scalar like that isn't needed and it makes an extra
    useless copy of the data. you can pass a scalar anywhere you want
    without quoting it. also in some cases like with objects, quoting it
    will actually be a bug.
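    The object case can be sketched with a plain hash reference. The data here is hypothetical; the point is only what the quotes do to the value:

```perl
use strict;
use warnings;

my $record = { year => 1934, category => 'ACTOR' };   # hypothetical data

my $passed = $record;      # unquoted: still a hash reference
my $quoted = "$record";    # quoted: a string that looks like HASH(0x...)

print ref($passed) ? "unquoted: still a reference\n" : "unquoted: plain string\n";
print ref($quoted) ? "quoted: still a reference\n"   : "quoted: plain string\n";
```

    Once the reference has been turned into a string there is no way to get back to the data, so $quoted->{year} would no longer work. For a plain file name the quotes are harmless but still make a needless copy.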

    ta> Lastly, code is written to be improved on. That is one of the reasons we
    ta> have different books!

    huh?? this has nothing to do with books. it is just poor coding and john
    was correcting you.

    also please learn to edit quoted posts as there is no reason to see all
    of the previous emails.

    uri

    --
    Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
    ------------ Perl Developer Recruiting and Placement Services -------------
    ----- Perl Code Review, Architecture, Development, Training, Support -------
  • Emeka at Aug 6, 2011 at 2:14 pm
    John,

    Thanks for making things pretty simple for mere mortals ..

    chomp( my $raw_file = glob "@ARGV" );
    I was under the impression that glob is used to walk a tree (that is, to get
    all the files in a folder and all its sub-folders). From the above, it seems
    it can be used for something else. Could someone help me out here?

    Why are you copying the contents of @ARGV to a string and then globbing
    that string?

    If @ARGV contains more than one element then this will not work correctly.

    And why chomp() a string that will not contain newlines?

    What you want is something like:

    my $raw_file = $ARGV[ 0 ];



    while(<READFILE>){chomp;
    $ln.="\n" if /^\W.?+$/;
    if(/^\d{4}/){$yr=$&;} # get the year
    if(/^[A-Z].+/){ $cat=$&; # get the Category
    $cat=join"",split /,/,$cat; # remove the comma in front
    $ln.=" $yr: ".$cat; # add both the year and Category
    }
    if(/\--.+/){$win=$`; # get the winner
    The use of $&, $' and $` will slow down *ALL* regular expressions in the
    program. Better to just use capturing parentheses.

    if (/^(\d{4})/ ) { $yr = $1 } # get the year
    if ( /^([A-Z].+)/ ) {
    $cat = $1; # get the Category

    $cat = join "", split /,/, $cat; # remove the comma in front
    $ln.= " $yr: " . $cat; # add both the year and Category
    }
    if ( /(.*?)\--.+/ ) { $win = $1; # get the winner

    Which is idiomatic Perl: $1, or $&, $` and $'? And what makes $&, $' and $`
    slow down *ALL* regular expressions in the program?


    --
    Satajanus Nig. Ltd
  • Rob Dixon at Aug 6, 2011 at 6:29 pm

    On 06/08/2011 15:14, Emeka wrote:
    John,

    Thanks for making things pretty simple for mere mortals ..
    Hi Emeka
    chomp( my $raw_file = glob "@ARGV" );
    I was under the impression that glob is used to walk a tree (that is, to get
    all the files in a folder and all its sub-folders). From the above, it seems
    it can be used for something else. Could someone help me out here?
    my @file_list = glob "@ARGV";

    is a simple way of getting a list of all files that match the list of
    filename patterns passed on the command line. But in scalar context (as
    Timothy originally posted) it will fetch only the first file in that
    list, and it is wrong to chomp it as it is not terminated by an
    additional newline.

    You are mostly correct, except that glob will not search a directory
    tree of files. You can use wildcards, such as

    glob '~/*/*'

    which will list all files in any directory immediately within the home
    directory, but to search throughout a directory tree of arbitrary depth
    you need File::Find or something similar.
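    The difference can be sketched as follows. The script builds a small throwaway directory tree with File::Temp (the file names are made up) and assumes the temporary directory path contains no whitespace, since glob splits its argument on spaces:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway tree: one file at the top, one inside a sub-directory
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/sub" or die "Cannot create $dir/sub: $!";
for my $path ( "$dir/top.txt", "$dir/sub/nested.txt" ) {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

# glob expands the pattern at one level only; keep just the plain files
my @one_level = grep { -f } glob "$dir/*";

# File::Find walks the whole tree, whatever its depth
my @all_files;
find( sub { push @all_files, $File::Find::name if -f }, $dir );

printf "glob found %d file(s), File::Find found %d file(s)\n",
    scalar @one_level, scalar @all_files;
```

    Here glob sees only top.txt, while File::Find also reaches sub/nested.txt.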

    That is all glob does. It is of no use in any other way. What can be
    confusing is that <*.pl> calls glob '*.pl' whereas <filehandle> calls
    readline filehandle.

    Take a look at

    perldoc -f glob

    and

    perldoc File::Glob

    (which is the module that implements the glob operator).
    The use of $&, $' and $` will slow down *ALL* regular expressions in the
    program. Better to just use capturing parentheses.

    if (/^(\d{4})/ ) { $yr = $1 } # get the year
    if ( /^([A-Z].+)/ ) {
    $cat = $1; # get the Category

    $cat = join "", split /,/, $cat; # remove the comma in front
    $ln.= " $yr: " . $cat; # add both the year and Category
    }
    if ( /(.*?)\--.+/ ) { $win = $1; # get the winner

    Which is idiomatic Perl: $1, or $&, $` and $'? And what makes $&, $' and $`
    slow down *ALL* regular expressions in the program?
    perldoc perlre says this:
    WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in
    the program, it has to provide them for every pattern match. This may
    substantially slow your program. Perl uses the same mechanism to produce
    $1, $2, etc, so you also pay a price for each pattern that contains
    capturing parentheses.
    so no recent well-written code will use $& etc., although they are still
    available for backward compatibility.

    Reading the same source again, it also says this
    So avoid $&, $', and $` if you can, but if you can't (and some
    algorithms really appreciate them), once you've used them once, use
    them at will, because you've already paid the price.
    so there is a niche for their continued use, but I have never come
    across anything that isn't better written using captures.
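    As a rough illustration of the capture-only style, here is a sketch that builds the "Year: Category: Winner" lines from a few records of the sample data. It is not the code from earlier in the thread: the patterns are my own guesses at the file format, and the categories stay in upper case as they appear in the raw file:

```perl
use strict;
use warnings;

# A few lines taken from the raw data snippet in the original question
my @lines = (
    '1934 (7th),',
    'ACTOR,',
    '*,"Clark Gable -- It Happened One Night {""Peter Warne""}"',
    'ACTRESS,',
    '*,"Claudette Colbert -- It Happened One Night {""Ellie Andrews""}"',
);

my ( $yr, $cat, @report );
for (@lines) {
    if    ( /^(\d{4})/ )          { $yr  = $1 }    # year line: remember it
    elsif ( /^([A-Z][^,]*)/ )     { $cat = $1 }    # category, up to the comma
    elsif ( /^\*,"?(.*?)\s+--/ )  {                # winner: text before " --"
        push @report, "$yr: $cat: $1";
    }
}

print "$_\n" for @report;
```

    Because $yr is only reassigned when a new year line appears, every category that follows picks up the most recent year, which is the logic the original question asked about. No $&, $` or $' is needed anywhere.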

    HTH,

    Rob

Discussion Overview
group: beginners @
categories: perl
posted: Aug 4, '11 at 1:38a
active: Aug 6, '11 at 6:29p
posts: 8
users: 6
website: perl.org