Why doesn't my Perl word frequency counter work?
I am very new to Perl and I am trying to write a word frequency counter as a learning exercise.
However, I cannot figure out the error in my code below after working on it. This is my code:
$wa = "A word frequency counter.";
@wordArray = split("",$wa);
$num = length($wa);
$word = "";
$flag = 1; # 0 if previous character was an alphabet and 1 if it was a blank.
%wordCount = ("null" => 0);
if ($num == -1) {
    print "There are no words.\n";
} else {
    print "$length";
    for $i (0 .. $num) {
        if(($wordArray[$i]!=' ') && ($flag==1)) { # start of a new word.
            print "here";
            $word = $wordArray[$i];
            $flag = 0;
        } elsif ($wordArray[$i]!=' ' && $flag==0) { # continuation of a word.
            $word = $word . $wordArray[$i];
        } elsif ($wordArray[$i]==' '&& $flag==0) { # end of a word.
            $word = $word . $wordArray[$i];
            $flag = 1;
            $wordCount{$word}++;
            print "\nword: $word";
        } elsif ($wordArray[$i]==" " && $flag==1) { # series of blanks.
            # do nothing.
        }
    }
    for $i (keys %wordCount) {
        print " \nword: $i - count: $wordCount{$i} ";
    }
}
It prints neither "here" nor the words. I don't care about optimization at the moment, although any input in that direction would be appreciated as well.
Instead of
$wordArray[$i]!=' '
it should be
$wordArray[$i] ne ' '
according to the Perl documentation on comparing strings and characters. In general, use the numeric operators (==, >=, …) for numbers and the string operators (eq, ne, lt, …) for strings.
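To see why the numeric test never fires, here is a minimal sketch. The helper names differs_numeric and differs_string are made up for illustration and are not part of the original code. Under ==/!= both 'A' and ' ' are converted to the number 0, so they compare as equal, which is why "here" is never printed.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
sub differs_numeric {
    my ($x, $y) = @_;
    no warnings 'numeric';    # silence "isn't numeric" warnings for this demo
    return $x != $y ? 1 : 0;  # numeric comparison: non-numeric strings become 0
}

sub differs_string {
    my ($x, $y) = @_;
    return $x ne $y ? 1 : 0;  # string comparison: compares the characters
}

print differs_numeric('A', ' '), "\n";  # prints 0: both sides numify to 0
print differs_string('A', ' '), "\n";   # prints 1: the strings differ
```

Running the original script with "use warnings" enabled would also have flagged the problem with an "isn't numeric" warning.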
Alternatively, you could do
@wordArray = split(" ",$wa);
instead of
@wordArray = split("",$wa);
and then you wouldn't need the awkward character-by-character check and would never have run into this problem. @wordArray would already be split into words, and you would only need to count the occurrences.
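A minimal sketch of that approach, assuming whitespace-separated tokens are good enough as "words". The helper name count_split_words is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated tokens in a string.
sub count_split_words {
    my ($text) = @_;
    my %count;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    $count{$_}++ for split ' ', $text;
    return \%count;
}

my $wa    = "A word frequency counter.";
my $count = count_split_words($wa);
for my $word (sort keys %$count) {
    print "word: $word - count: $count->{$word}\n";
}
```

Note that punctuation stays attached to the token, so "counter." is counted with its trailing period. Whether that is acceptable depends on how you define a word.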
You seem to be writing C in Perl. The difference is not only one of style: by exploding a string into an array of individual characters, you also blow up your script's memory footprint.
Also, you need to think about what constitutes a word. Below, I am not suggesting that either \w+ or \S+ is the right definition of a word, only illustrating the difference between \S+ and \w+.
#!/usr/bin/env perl
use strict; use warnings;
use YAML;

my $src = '$wa = "A word frequency counter.";';
print Dump count_words(\$src, 'w');
print Dump count_words(\$src, 'S');

sub count_words {
    my $src = shift;
    my $class = sprintf '\%s+', shift;
    my %counts;
    while ($$src =~ /(?<sequence> $class)/gx) {
        $counts{ $+{sequence} } += 1;
    }
    return \%counts;
}
Output:
---
A: 1
counter: 1
frequency: 1
wa: 1
word: 1
---
'"A': 1
$wa: 1
=: 1
counter.";: 1
frequency: 1
word: 1