I am looking for an advanced version of this.
Basically, if I have a file with text:
abc
ghi
fed
jkl
abc
ghi
fed
I want the output to be (for n=3):
Duplicated Lines
abc
ghi
fed
Times = 2
So, something like this (in perl):
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
my @order;
while ( my $line = <DATA> ) {
    chomp($line);
    push @order, $line unless $seen{$line}++;
}
foreach my $element (@order) {
    print "$element, $seen{$element}\n" if $seen{$element} > 1;
}
__DATA__
abc
ghi
fed
jkl
abc
ghi
fed
This can be condensed into a one-liner:
perl -e 'while ( <> ) { push ( @order, $_ ) unless $seen{$_}++; } for (@order) {print if $seen{$_} > 1}' myfile
One way is to split your text into blocks based on your n, then count how many times each block occurs. For this counting you can use a hash-table-based data structure, like a dictionary in Python, which is very efficient for such tasks. The idea is to create a dictionary, which keeps its keys unique, then loop over the list of blocks and increase the count for an item every time you see it again.
At the end you'll have a dictionary containing the unique blocks as keys and their counts as values.
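The dictionary approach above can be sketched like this (n and the sample lines are assumptions matching the question; the window slides one line at a time over the file):

```python
n = 3  # block size from the question; an assumption for this sketch
lines = ["abc", "ghi", "fed", "jkl", "abc", "ghi", "fed"]

counts = {}
# Slide a window of n consecutive lines and count each block.
# Tuples are hashable, so they can be used as dictionary keys.
for i in range(len(lines) - n + 1):
    block = tuple(lines[i:i + n])
    counts[block] = counts.get(block, 0) + 1

for block, count in counts.items():
    if count > 1:
        print("Duplicated Lines")
        print("\n".join(block))
        print(f"Times = {count}")
```

For the sample input this reports the block `abc / ghi / fed` with `Times = 2`, matching the desired output.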
Some languages, like Python, provide good tools for this: Counter for counting the elements of an iterable, and islice for slicing an iterable; islice returns a generator, which is very efficient for long iterables. Or you can write it yourself, as above.
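A minimal sketch using Counter and islice as described (again, n and the sample lines are assumptions for illustration):

```python
from collections import Counter
from itertools import islice

n = 3  # block size; an assumption matching the question
lines = ["abc", "ghi", "fed", "jkl", "abc", "ghi", "fed"]

# islice lazily extracts each n-line window; the generator expression
# avoids building a list of all windows up front.
blocks = (tuple(islice(lines, i, i + n)) for i in range(len(lines) - n + 1))
counts = Counter(blocks)

for block, count in counts.items():
    if count > 1:
        print("Duplicated Lines")
        print("\n".join(block))
        print(f"Times = {count}")
```

Counter does the same bookkeeping as the manual dictionary, just more concisely; `counts.most_common()` is also handy if you want the most frequent blocks first.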