How to apply nltk.pos_tag() to n-grams

I need to combine nltk.pos_tag() with bigrams. Here's my code:

from nltk.util import ngrams
from collections import Counter

# all_file_data: the list of word tokens read from the input file(s)
bigrams = list(ngrams(all_file_data, 2))
print(bigrams[:50])
print(Counter(bigrams).most_common(30))

The output is:

[('SUBDELAGATION', 'ON'), ('ON', 'AGENDA'), ('AGENDA', 'ITEM'), ('ITEM', '3'), ...]

How can I get the POS tags along with the bigram frequencies, like in the attached picture?

[Image: the result I need]


Answer by alvas:

Tag the tokens first, then build the n-grams over the resulting (word, tag) tuples:

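from nltk import pos_tag, word_tokenize
from nltk.util import ngrams
from collections import Counter

# word_tokenize() and pos_tag() need NLTK's tokenizer and tagger models;
# fetch them once with nltk.download() if they are missing.
text = "hello world is a common sentence. A common sentence is foo bar. A foo bar is a common ice cream."

# Tag first, so every bigram is a pair of (word, tag) tuples
tagged_texts = pos_tag(word_tokenize(text))

counter = Counter(ngrams(tagged_texts, 2))

# No argument: list every bigram, most frequent first (matches the output below)
counter.most_common()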

[out]:

[((('is', 'VBZ'), ('a', 'DT')), 2),
 ((('a', 'DT'), ('common', 'JJ')), 2),
 ((('common', 'JJ'), ('sentence', 'NN')), 2),
 ((('.', '.'), ('A', 'DT')), 2),
 ((('foo', 'JJ'), ('bar', 'NN')), 2),
 ((('hello', 'JJ'), ('world', 'NN')), 1),
 ((('world', 'NN'), ('is', 'VBZ')), 1),
 ((('sentence', 'NN'), ('.', '.')), 1),
 ((('A', 'DT'), ('common', 'JJ')), 1),
 ((('sentence', 'NN'), ('is', 'VBZ')), 1),
 ((('is', 'VBZ'), ('foo', 'JJ')), 1),
 ((('bar', 'NN'), ('.', '.')), 1),
 ((('A', 'DT'), ('foo', 'JJ')), 1),
 ((('bar', 'NN'), ('is', 'VBZ')), 1),
 ((('common', 'JJ'), ('ice', 'NN')), 1),
 ((('ice', 'NN'), ('cream', 'NN')), 1),
 ((('cream', 'NN'), ('.', '.')), 1)]
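If the goal is a table of bigram, tag pattern and frequency (the attached picture isn't available, so this layout is only a guess), the counter can be flattened into rows. A minimal sketch, assuming the tagged tokens are already available: it hard-codes the tags from the output above so it runs without NLTK models, and `bigram_rows` is a hypothetical helper name.

```python
from collections import Counter

# Tagged tokens as pos_tag() produced them in the output above; hard-coded
# here (an assumption for the sketch) so no NLTK models are needed.
tagged = [
    ("hello", "JJ"), ("world", "NN"), ("is", "VBZ"), ("a", "DT"),
    ("common", "JJ"), ("sentence", "NN"), (".", "."),
    ("A", "DT"), ("common", "JJ"), ("sentence", "NN"), ("is", "VBZ"),
    ("foo", "JJ"), ("bar", "NN"), (".", "."),
    ("A", "DT"), ("foo", "JJ"), ("bar", "NN"), ("is", "VBZ"), ("a", "DT"),
    ("common", "JJ"), ("ice", "NN"), ("cream", "NN"), (".", "."),
]


def bigram_rows(tagged_tokens, n=None):
    """Hypothetical helper: return (words, tag pattern, count) rows,
    most frequent first; ties keep first-occurrence order."""
    # zip(seq, seq[1:]) yields the same pairs as nltk.util.ngrams(seq, 2)
    counts = Counter(zip(tagged_tokens, tagged_tokens[1:]))
    rows = []
    for ((w1, t1), (w2, t2)), freq in counts.most_common(n):
        rows.append((f"{w1} {w2}", f"{t1} {t2}", freq))
    return rows


# Print the five most frequent bigrams as aligned columns
for words, tags, freq in bigram_rows(tagged, 5):
    print(f"{words:<20}{tags:<10}{freq}")
```

On your own data, since all_file_data already looks like a list of tokens, you could pass pos_tag(all_file_data) in place of the hard-coded list. zip() is used instead of ngrams() only to keep the sketch dependency-free; for a list both give the same pairs.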