I want it to print every site that isn't blacklisted (this is how the code looks so far), but it doesn't work. If you change the statement in the last if block from pass to print(site), it prints everything in the blacklist, yet it won't print everything that isn't blacklisted, which is my goal.
import requests
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re
import fnmatch

url = ("http://stackoverflow.com")
blacklist = ['*stackoverflow.com*', '*stackexchange.com*']
r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')
for link in soup.select('a[href*="http"]'):
    site = (link.get('href'))
    site = str(site)
    for filtering in blacklist:
        if fnmatch.fnmatch(site, filtering):
            pass
        else:
            print(site)
You want something like:
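Here is a minimal sketch of the idea. It runs on a hard-coded list of links instead of a live request so it works offline; the sample_links list, the allowed list, and the is_blacklisted helper are made up for illustration and are not part of the question's script.

```python
import fnmatch

blacklist = ['*stackoverflow.com*', '*stackexchange.com*']

# Hypothetical stand-ins for the href values scraped from the page.
sample_links = [
    'https://stackoverflow.com/questions',
    'https://chat.stackexchange.com',
    'https://www.python.org/',
]


def is_blacklisted(site, patterns):
    # True as soon as at least one pattern matches the site.
    return any(fnmatch.fnmatch(site, pattern) for pattern in patterns)


allowed = []
for site in sample_links:
    if is_blacklisted(site, blacklist):
        continue  # blacklisted: skip it and print nothing
    allowed.append(site)
    print(site)
```

In the question's script, the same check would go inside the for link in soup.select(...) loop, replacing the inner for loop over the blacklist.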
The issue happens here (old code):
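Roughly, this is the inner loop from the question, reproduced on a single link so it runs offline. The example link and the printed list are assumptions added only to make the behaviour visible.

```python
import fnmatch

blacklist = ['*stackoverflow.com*', '*stackexchange.com*']
site = 'https://stackoverflow.com/questions'  # hypothetical blacklisted link

printed = []  # records what the loop prints, for illustration only
for filtering in blacklist:
    if fnmatch.fnmatch(site, filtering):
        pass  # matches '*stackoverflow.com*', so nothing happens here
    else:
        # does not match '*stackexchange.com*', so the site is printed anyway
        printed.append(site)
        print(site)
```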
While you're iterating over the blacklist, a blacklisted site matches one pattern but not the other, so the else branch still runs for the pattern it doesn't match and the site always gets printed. A site that matches neither pattern gets printed once per pattern. There are multiple solutions; mine was to use any() to check whether at least one pattern matches, and if it does, continue the loop and don't print :D