How to clean up RSS feed summary field?

I am trying to load an RSS feed into a pandas DataFrame. The other fields work nicely, but the summary field still contains HTML markup. My code is:

import feedparser
import pandas as pd
rss_feed = 'https://maavoimat.fi/ajankohtaista/ampuma-ja-melutiedotteet/-/announcements/rss'
feed = feedparser.parse(rss_feed)
posts = []
for post in feed.entries:
    posts.append((post.title, post.summary, post.published))
df = pd.DataFrame(posts, columns=['title', 'summary', 'published'])
df

How can I get the summary to show up as plain text, without the HTML markup?

Answer by khushi singhania

Try this:

import feedparser
from bs4 import BeautifulSoup

# Parse the RSS feed (rss_feed is the URL from the question)
feed = feedparser.parse(rss_feed)

# Iterate over the entries
for entry in feed.entries:
    # Parse the HTML summary and keep only its text content
    soup = BeautifulSoup(entry.summary, 'html.parser')
    entry.summary = soup.get_text()

# Continue with your code, e.g. build the DataFrame as before
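
For completeness, here is a minimal sketch of how the cleaning step could be folded directly into the DataFrame construction from the question. The helper name strip_html and the sample_entries list are hypothetical: the list is made-up data standing in for feed.entries so the sketch runs without network access. Passing separator=' ' and strip=True to get_text() is an optional choice, not something the answer above requires; it keeps words from adjacent tags apart and trims surrounding whitespace.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(html):
    """Return the text of an HTML fragment with all tags removed."""
    soup = BeautifulSoup(html, 'html.parser')
    # separator=' ' joins text from neighbouring tags with a space;
    # strip=True trims whitespace around each piece of text.
    return soup.get_text(separator=' ', strip=True)


# Hypothetical stand-in for feed.entries (made-up data, not from the real feed).
sample_entries = [
    {'title': 'Notice A',
     'summary': '<p>Firing exercise <b>on Monday</b></p>',
     'published': 'Mon, 01 Jan 2024 08:00:00 GMT'},
    {'title': 'Notice B',
     'summary': '<div>Noise warning<br/>08:00 to 16:00</div>',
     'published': 'Tue, 02 Jan 2024 08:00:00 GMT'},
]

# Clean each summary while collecting the rows.
posts = [(e['title'], strip_html(e['summary']), e['published'])
         for e in sample_entries]
df = pd.DataFrame(posts, columns=['title', 'summary', 'published'])
```

With the real feed, sample_entries would be replaced by feed.entries; feedparser entries support dictionary-style access, so e['summary'] should work the same way as post.summary in the question.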