Correctly sort RSS items by time

I'm collecting items from several RSS channels and would like to sort them by publication time, newest first, taking each item's time zone into account. So far, I have the following code:

import feedparser
import dateutil.parser
import dateutil.tz

rss_channels = [
    "https://www.novinky.cz/rss",
    "https://news.ycombinator.com/rss",
    "https://unix.stackexchange.com/feeds",
    "https://www.lupa.cz/rss/clanky/",
    "https://www.lupa.cz/rss/n/digizone/",
    "https://www.zive.cz/rss/sc-47/",
    "https://bitcoin.stackexchange.com/feeds",
    "https://vi.stackexchange.com/feeds",
    "https://askubuntu.com/feeds",
]

latest_items = []

for url in rss_channels:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        pub_date_str = entry.published

        try:
            pub_date = dateutil.parser.parse(pub_date_str, ignoretz=True, fuzzy=True)
            if pub_date.tzinfo is None:
                pub_date = pub_date.replace(tzinfo=dateutil.tz.tzutc())
            latest_items.append((entry.title, pub_date, entry.link))
        except Exception as e:
            print(str(e))

latest_items.sort(key=lambda x: x[1], reverse=True)

for title, pub_date, url in latest_items:
    print(f"{pub_date.strftime('%Y-%m-%d %H:%M:%S %z')} - {title} - {url}")

I'm not sure whether this code is correct. Could you confirm it, or point out what is wrong? The code is also very slow, so any way to make it faster would be welcome.
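The parsing step can be checked in isolation. The sketch below uses the standard library's `email.utils.parsedate_to_datetime` instead of dateutil (an assumption made so it runs without third-party packages) and a made-up RSS-style date that is one hour ahead of UTC. Discarding the offset and then labelling the result as UTC, which is what `ignoretz=True` followed by `replace(tzinfo=dateutil.tz.tzutc())` amounts to, shifts the item by the size of its offset:

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Made-up RSS-style pubDate, one hour ahead of UTC
raw = "Mon, 01 Jan 2024 12:00:00 +0100"

# Offset kept: an aware datetime for the real instant (11:00 UTC)
kept = parsedate_to_datetime(raw)

# Offset discarded, then relabelled as UTC; this mimics ignoretz=True
# followed by replace(tzinfo=tzutc()) in the code above
relabelled = kept.replace(tzinfo=None).replace(tzinfo=timezone.utc)

print(kept.astimezone(timezone.utc))  # 2024-01-01 11:00:00+00:00
print(relabelled)                     # 2024-01-01 12:00:00+00:00
print(relabelled - kept)              # 1:00:00
```

Because `ignoretz=True` always returns a naive datetime, the `if pub_date.tzinfo is None` branch is taken for every entry, so items from feeds that publish non-UTC offsets are shifted and can end up out of order relative to other feeds.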

Answer by xralf

In the end, I used the following snippet for the parsing step:

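import dateutil.parser
import pytz

try:
    # Parse the date string, then drop whatever time zone it carried
    pub_date = dateutil.parser.parse(entry.published).replace(tzinfo=None)
    # Interpret the naive wall-clock time as Prague local time (DST-aware);
    # note that this ignores any offset the feed itself specified
    pub_date = pytz.timezone('Europe/Prague').localize(pub_date)
    # ...
except Exception as e:
    print(str(e))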
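For comparison, here is a sketch that addresses both concerns from the question, under a few assumptions. It relies on feedparser's `published_parsed`/`updated_parsed` fields, which feedparser fills with `time.struct_time` values already normalised to UTC, instead of re-parsing the date strings. It also assumes the slowness comes mostly from downloading the feeds one after another, so it fetches them in parallel with a thread pool. The function names `entry_to_item` and `fetch_all` are hypothetical, and the parser is passed in as an argument so the code can be exercised without network access.

```python
import calendar
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def entry_to_item(entry):
    """Return (title, aware UTC datetime, link), or None if the entry has no date."""
    # feedparser stores *_parsed dates as struct_time values in UTC
    parsed = entry.get("published_parsed") or entry.get("updated_parsed")
    if parsed is None:
        return None
    # timegm reads the tuple as UTC (time.mktime would assume local time)
    pub_date = datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)
    return entry.get("title", ""), pub_date, entry.get("link", "")


def fetch_all(urls, parse, workers=8):
    """Fetch all feeds concurrently with `parse` and return items newest first."""
    # Downloads are I/O-bound, so threads overlap the network waits
    with ThreadPoolExecutor(max_workers=workers) as pool:
        feeds = list(pool.map(parse, urls))

    items = []
    for feed in feeds:
        for entry in feed.get("entries", []):
            item = entry_to_item(entry)
            if item is not None:
                items.append(item)

    # Aware datetimes compare by absolute instant, so this orders across feeds
    items.sort(key=lambda item: item[1], reverse=True)
    return items
```

With the question's list, `fetch_all(rss_channels, feedparser.parse)` would return `(title, pub_date, link)` tuples newest first, so the original print loop can stay as it is. Entries without any parsed date are skipped rather than printed as errors.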