Bangla text word cloud

I wanted to generate a word cloud from Bengali text, but when it is rendered, the consonants of each word are printed separately instead of being joined.

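import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

data = pd.read_csv('/content/gdrive/MyDrive/data.csv', encoding='UTF-8')
# " ".join(data) joins the column names, not the rows; join the text column instead
# ("text" is a hypothetical column name; use the one in your CSV)
refined_sentence = " ".join(data["text"].astype(str))
regex = r"[\u0980-\u09FF]+"  # runs of characters from the Bengali Unicode block
stopwords = set()  # placeholder: the original stopword list is not shown
wc = WordCloud(width=800, height=400, mode="RGBA", background_color=None, colormap="hsv",
               stopwords=stopwords, font_path="kalpurush.ttf", regexp=regex).generate(refined_sentence)
plt.figure(figsize=(7, 7))
plt.imshow(wc, interpolation='none')
plt.axis("off")
plt.show()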
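
As a side check, the tokenization in the question is probably not what splits the letters apart. The sketch below (standard library only; the sample text is my own choice) compares Python's `\w+`, which does not treat Bengali vowel signs such as `া` as word characters, with the Bengali-block regex from the question, which keeps each word whole. If the regex returns whole words but the image still shows detached letters, the problem is in how the text is rendered.

```python
import re

# Hypothetical sample text: "বাংলা ভাষা" ("Bengali language")
sample = "বাংলা ভাষা"

# \w does not match combining vowel signs (Unicode category Mc),
# so each word is cut apart at its vowel signs
word_tokens = re.findall(r"\w+", sample)

# The regex from the question: any run of characters in the Bengali block U+0980-U+09FF
bengali_tokens = re.findall(r"[\u0980-\u09FF]+", sample)

print(word_tokens)
print(bengali_tokens)  # ['বাংলা', 'ভাষা']
```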

Answer

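I followed this comment and eventually solved the problem on Ubuntu. The cause is that WordCloud draws text with Pillow, and Pillow only shapes complex scripts such as Bengali correctly when libraqm is available; without it, vowel signs and conjuncts are drawn as separate glyphs. The steps below build and install libraqm.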

Step 1: !sudo apt-get install libfreetype6-dev libharfbuzz-dev libfribidi-dev gtk-doc-tools

Step 2: !wget -O raqm-0.7.0.tar.gz https://raw.githubusercontent.com/python-pillow/pillow-depends/master/raqm-0.7.0.tar.gz

The raqm-0.7.0.tar.gz file should now be in the current working directory.

Step 3: !tar -xzvf raqm-0.7.0.tar.gz

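Step 4: %cd raqm-0.7.0 (in a notebook, !cd runs in a temporary subshell and does not change the directory for later commands, so use the %cd magic; in a regular terminal, use cd raqm-0.7.0)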

Step 5: !./configure --prefix=/usr && make -j4 && sudo make -j4 install
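
Before reinstalling Pillow, it may be worth confirming that Step 5 left a libraqm shared library the system can locate. This is a minimal sketch using the standard `ctypes.util.find_library` helper; it only reports what the lookup finds and does not prove that Pillow will use the library.

```python
from ctypes.util import find_library

# Ask the platform's library lookup for libraqm (on Linux it consults the ldconfig cache first).
# Returns a name such as "libraqm.so.0" if found, otherwise None.
raqm_lib = find_library("raqm")

if raqm_lib is None:
    # Assumption: a stale loader cache is a common cause; `sudo ldconfig` refreshes it
    print("libraqm not found; re-check Step 5 (running `sudo ldconfig` may help)")
else:
    print(f"found {raqm_lib}")
```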

Step 6: Reinstall the Pillow library so that it picks up the newly installed libraqm. Activate the correct environment, then run the following commands:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow

That's it! After restarting the Python session so the reinstalled library is loaded, Pillow can render Bengali and other Indic scripts correctly in the image.
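
To confirm the fix took effect, Pillow's own feature report can be queried. A minimal sketch, assuming a reasonably recent Pillow where `PIL.features.check` accepts the `"raqm"` feature name; the wrapper function name is my own.

```python
def pillow_has_raqm():
    """Return True/False for Pillow's raqm support, or None if Pillow is not installed."""
    try:
        from PIL import features
    except ImportError:
        return None
    # Reports whether Pillow's FreeType module can use libraqm for text layout
    # (may also return None if the FreeType module itself is unavailable)
    return features.check("raqm")

print(pillow_has_raqm())
```

If this prints False after Step 6, Pillow is still using its basic layout engine, and the word cloud will keep showing detached letters.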