So I am trying to write a Perl script which will take in 3 arguments.
- First argument is the input file or directory.
- If it is a file, it will count number of occurrences of all words
- If it is a directory, it will recursively go through each directory and get all the number of occurrences for all words in the files within those directories
- Second argument is a number that will be how many of the words to display with the highest number of occurrences.
- This will print to the console only the number for each word
- Print them to an output file which is the third argument in the command line.
It seems to be working as far as recursively searching through directories and finding all occurrences of the words in a file and prints them to the console.
How can I print these to an output file and also, how would I take the second argument, which is the number, say 5, and have it print to the console the number of words with the most occurrences while printing the words to the output file?
The following is what I have so far:
#!/usr/bin/perl -w
use strict;
search(shift);
my $input = $ARGV[0];
my $output = $ARGV[1];
my %count;
my $file = shift or die "ERROR: $0 FILE\n";
open my $filename, '<', $file or die "ERROR: Could not open file!";
if ( -f $filename ) {
print("This is a file!\n");
while ( my $line = <$filename> ) {
chomp $line;
foreach my $str ( $line =~ /\w+/g ) {
$count{$str}++;
}
}
foreach my $str ( sort keys %count ) {
printf "%-20s %s\n", $str, $count{$str};
}
}
close($filename);
if ( -d $input ) {
sub search {
my $path = shift;
my @dirs = glob("$path/*");
foreach my $filename (@dirs) {
if ( -f $filename ) {
open( FILE, $filename ) or die "ERROR: Can't open file";
while ( my $line = <FILE> ) {
chomp $line;
foreach my $str ( $line =~ /\w+/g ) {
$count{$str}++;
}
}
foreach my $str ( sort keys %count ) {
printf "%-20s %s\n", $str, $count{$str};
}
}
# Recursive search
elsif ( -d $filename ) {
search($filename);
}
}
}
}
I have figured it out. The following is my solution. I'm not sure if it's the best way to do it, but it works.