I'm parsing an RSS feed with this code:
```python
import requests
from bs4 import BeautifulSoup

resp = requests.get(url)  # url is the feed address
soup = BeautifulSoup(resp.content, features="xml")  # features="xml" requires lxml
items = soup.find_all('item')

news_items = []
for item in items:
    news_item = {}
    news_item['title'] = item.title.text
    news_item['description'] = item.description.text
    news_item['link'] = item.link.text
    news_item['pubDate'] = item.pubDate.text
    news_items.append(news_item)
```
Inside the description tag there is a div that contains the image (the img src):
```xml
<description>
<![CDATA[ <div><img src="https://library.sportingnews.com/styles/twitter_card_120x120/s3/2023-11/nba-plain--358f0d81-148e-4590-ba34-3164ea0c87eb.png?itok=fG5f5Dwa" style="width: 100%;" /><div>Now back from his foot injury and ready to continue his Golden Boot charge, Erling Haaland looks to return in full as Man City visit Brentford in a Monday Premier League matinee.</div></div> ]]>
</description>
```
Is there any way I can retrieve everything in the description tag except the image? Thanks.
The text of the description tag is the CDATA content, which is itself an HTML string, so you can parse that string a second time and remove the `img` tag before storing it. Here's how you can do it using BeautifulSoup:
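A minimal sketch, assuming `item.description.text` returns the CDATA payload as an HTML string (as in the sample above). The helper names `strip_images` and `description_text` are hypothetical. The inner fragment is parsed with BeautifulSoup's built-in `"html.parser"`, so this part does not need lxml.

```python
from bs4 import BeautifulSoup


def strip_images(description_html: str) -> str:
    """Hypothetical helper: return the description markup with every <img> removed.

    description_html is the string from item.description.text, i.e. the
    CDATA content of the RSS <description>, which is itself HTML.
    """
    # Second parse: treat the CDATA payload as an HTML fragment.
    fragment = BeautifulSoup(description_html, "html.parser")
    # decompose() removes each <img> from the tree and destroys it.
    for img in fragment.find_all("img"):
        img.decompose()
    # Serialize what is left; strip() drops the spaces padding the CDATA.
    return str(fragment).strip()


def description_text(description_html: str) -> str:
    """Hypothetical helper: return only the readable text, with no tags at all."""
    fragment = BeautifulSoup(description_html, "html.parser")
    # <img> has no text content, so get_text() already leaves it out.
    return fragment.get_text(" ", strip=True)


if __name__ == "__main__":
    sample = (
        ' <div><img src="https://library.sportingnews.com/styles/twitter_card_120x120/'
        's3/2023-11/nba-plain--358f0d81-148e-4590-ba34-3164ea0c87eb.png?itok=fG5f5Dwa"'
        ' style="width: 100%;" /><div>Now back from his foot injury and ready to continue'
        ' his Golden Boot charge.</div></div> '
    )
    print(strip_images(sample))
    print(description_text(sample))
```

In your loop you would then store `news_item['description'] = strip_images(item.description.text)`, or use `description_text(...)` if you only want the plain sentence. Note that `strip_images` keeps the wrapping `<div>` elements, because in this feed the `img` sits directly inside the outer div rather than in a div of its own.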