blacklist href in python to remove junk sites

197 Views

I want it to print every site that isn't blacklisted (that's what the code below is meant to do), but it doesn't work. If you change `pass` in the last if statement to `print(site)`, it prints everything in the blacklist, yet it won't print everything that isn't blacklisted, which is my goal.

import fnmatch

import requests
from bs4 import BeautifulSoup

url = "http://stackoverflow.com"
blacklist = ['*stackoverflow.com*', '*stackexchange.com*']

r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    site = str(link.get('href'))
    for filtering in blacklist:
        if fnmatch.fnmatch(site, filtering):
            pass
        else:
            print(site)

1 Answer


You want something like:

import fnmatch

import requests
from bs4 import BeautifulSoup

url = "http://stackoverflow.com"
blacklist = ['*stackoverflow.com*', '*stackexchange.com*']

r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    site = str(link.get('href'))
    # Skip the link if it matches any blacklist pattern
    if any(fnmatch.fnmatch(site, filtering) for filtering in blacklist):
        continue
    print(site)

The issue happens here (old code):

for filtering in blacklist:
    if fnmatch.fnmatch(site, filtering):
        pass
    else:
        print(site)

While you're iterating, a blacklisted site matches one pattern in the blacklist but not the other, so the else branch still runs for the pattern that didn't match and the site gets printed anyway. There are multiple solutions; mine uses any() to check whether the match result is True at least once, and if it is, continue the loop and skip the print :D
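
To see why any() fixes it, here's a minimal, network-free sketch of the same filtering logic (the sample URLs are made up for illustration):

```python
import fnmatch

blacklist = ['*stackoverflow.com*', '*stackexchange.com*']
links = [
    "https://stackoverflow.com/questions",
    "https://example.com/page",
    "https://meta.stackexchange.com/",
]

# A link is kept only if NO blacklist pattern matches it.
# any() short-circuits: as soon as one pattern matches, the link is skipped.
kept = [site for site in links
        if not any(fnmatch.fnmatch(site, pattern) for pattern in blacklist)]
print(kept)  # prints ['https://example.com/page']
```

With the original pass/else version, "https://stackoverflow.com/questions" would still be printed once, because it fails to match the `*stackexchange.com*` pattern and that iteration's else branch fires.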