Dear all,

The script below counts word occurrences in an input file. It uses a simple
hash to store the unique words and their frequencies.
--------------------
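use strict;
use warnings;

my %words;
while (<>) {
    chomp;    # strip the newline; chop would cut the last character of a final line without one
    foreach my $wd (split) {    # split with no arguments splits $_ on whitespace
        $words{$wd}++;
    }
}

foreach my $w (keys %words) {
    print "$w|$words{$w}\n";
}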
--------------------
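
To exercise the counting logic without real input files, here is a minimal sketch that wraps the same hash-counting loop in subroutines. The names count_words and print_counts are hypothetical, and the sample text is made up. Output is sorted by descending count (ties alphabetical) only because the order returned by plain keys is unpredictable.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated words read from a filehandle.
sub count_words {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        # split ' ' discards leading whitespace and the trailing newline,
        # so no chop/chomp is needed here.
        $words{$_}++ for split ' ', $line;
    }
    return \%words;
}

# Hypothetical helper: print "word|count" lines, most frequent first,
# with ties broken alphabetically so the order is deterministic.
sub print_counts {
    my ($counts) = @_;
    for my $w (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
        print "$w|$counts->{$w}\n";
    }
}

# Demo on an in-memory string instead of files named on the command line.
my $text = "the cat saw the dog\nthe dog ran\n";
open my $in, '<', \$text or die "Cannot open in-memory string: $!";
print_counts(count_words($in));
close $in;
```

With the sample string this prints the|3, dog|2, cat|1, ran|1 and saw|1, one per line.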

To process large amounts of data (10,000,000 lines) without running out of
memory, I use the DB_File module to store the hash %words in a local file
and then read the data back from it.

--------------------
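use strict;
use warnings;
use DB_File;

# Keep the hash in the file words.db instead of in memory.
tie my %words, 'DB_File', 'words.db'
    or die "Cannot tie words.db: $!";
while (<>) {
    chomp;
    foreach my $wd (split) {
        $words{$wd}++;
    }
}

foreach my $w (keys %words) {
    print "$w|$words{$w}\n";
}
untie(%words);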
--------------------
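
The two-argument tie in the script relies on DB_File's defaults. Below is a sketch with those defaults written out (O_RDWR|O_CREAT, mode 0666, $DB_HASH) and an explicit check that the tie succeeded. count_into_db is a hypothetical helper name, and the demo writes to a temporary directory rather than to words.db.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use DB_File;                     # tie interface; exports $DB_HASH
use File::Temp qw(tempdir);

# Hypothetical helper: add word counts read from $fh to the on-disk hash in $file.
sub count_into_db {
    my ($file, $fh) = @_;
    # Same as: tie my %words, 'DB_File', $file;  (defaults written out)
    tie my %words, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_HASH
        or die "Cannot tie $file: $!";
    while (my $line = <$fh>) {
        $words{$_}++ for split ' ', $line;   # each ++ is a FETCH plus a STORE on the file
    }
    untie %words;                # flush and close the database file
    return;
}

# Demo: count a small in-memory text into a throwaway database file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/words.db";
my $text = "to be or not to be\n";
open my $in, '<', \$text or die "Cannot open in-memory string: $!";
count_into_db($file, $in);
close $in;

# Reopen read-only to look up a single count without loading the whole hash.
tie my %counts, 'DB_File', $file, O_RDONLY, 0666, $DB_HASH
    or die "Cannot reopen $file: $!";
print "to|$counts{'to'}\n";
untie %counts;
```

Reopening with O_RDONLY, as in the last step, reads the stored counts without risking changes to the file.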


Is this a smart solution in terms of good programming practice?

Thanks in advance for any opinions,
Andrej


  • Peter Scott at Sep 25, 2006 at 2:53 pm

    On Sat, 23 Sep 2006 11:51:54 +0200, Andrej Kastrin wrote:
    > the script below count word occurences in input file. It uses simple
    > hash structure to store unique words and its frequencies. [...]
    > foreach my $w (keys %words) {
    > print "$w|$words{$w}\n";
    > } [...]
    > Is that brainy solution in the sense of good programming practice...?
    Good start, but you just shot yourself in the foot. Read 'perldoc -f tie'
    and pay especial attention starting at the second paragraph.
  • Andrej Kastrin at Sep 25, 2006 at 4:57 pm

    Peter Scott wrote:
    > Good start, but you just shot yourself in the foot. Read 'perldoc -f tie'
    > and pay especial attention starting at the second paragraph.
    Peter, thanks for your response. I have already switched to the 'each'
    function, which iterates over the hash without building the entire key
    list in memory.

    Cheers, Andrej
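
The second paragraph of perldoc -f tie warns that keys and values may return huge lists when used on large objects such as DBM files, and suggests each instead. A minimal sketch of such an each loop, assuming a DB_File hash database like words.db; dump_counts is a hypothetical name and the seeded counts are made-up sample data.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use DB_File;                     # exports $DB_HASH
use File::Temp qw(tempdir);

# Hypothetical helper: print "word|count" for every entry in the database,
# fetching one pair per call to each instead of building the list from keys.
sub dump_counts {
    my ($file, $out) = @_;
    tie my %words, 'DB_File', $file, O_RDONLY, 0666, $DB_HASH
        or die "Cannot tie $file: $!";
    # List assignment in boolean context counts the returned elements,
    # so a word such as "0" does not end the loop early.
    while (my ($w, $n) = each %words) {
        print {$out} "$w|$n\n";
    }
    untie %words;
    return;
}

# Demo: seed a throwaway database with made-up counts, then dump it.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/words.db";
tie my %seed, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot create $file: $!";
$seed{'the'} = 3;
$seed{'dog'} = 2;
untie %seed;

dump_counts($file, \*STDOUT);    # order follows the hash file, not sorted
```

Only one key/value pair is held in Perl at a time, so memory use stays flat regardless of how many distinct words the file contains.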

Discussion Overview
group: beginners @ perl.org
categories: perl
posted: Sep 23, '06 at 9:52a
active: Sep 25, '06 at 4:57p
posts: 3
users: 2

2 users in discussion

Andrej Kastrin: 2 posts
Peter Scott: 1 post
