Strange beautifulsoup nonetype error

Question

651 Views Asked by Ben At 29 November 2013 at 21:41

I made a well-functioning scraper to get all the classes from my university (to filter them later), but it sometimes suddenly fails with strange errors like `AttributeError: 'NoneType' object has no attribute 'findAll'`. If I move on to another long page, it gives me a similar error.

My code:

from bs4 import BeautifulSoup
import urllib2
import datetime
import httplib
from math import floor
from random import randrange
import cPickle as pickle
[...irrelevant code...]
urls = ["http://locus.vub.ac.be/reporting/spreadsheet?identifier=DA&submit=toon%20de%20gegevens%20-%20show%20the%20teaching%20activities&idtype=name&template=Mod%2bSS&objectclass=module%2bgroup", "http://locus.vub.ac.be/reporting/spreadsheet?identifier=AL+tot+AP&submit=toon+de+gegevens+-+show+the+teaching+activities&idtype=name&template=Mod%2BSS&objectclass=module%2Bgroup"]
for url in urls:
    url = urllib2.urlopen(url).read()
    soup = BeautifulSoup(url)
    begins = soup.findAll("span", {"class" : "label-1-0-0"})
    for begin in begins:
        table = begin.findNext("table", {"class" : "spreadsheet"})
        #if table is not None:
        gegevens = table.findAll("tr")
        for i in range (1, len(gegevens)):
            naam = gegevens[i].td
            dag = naam.find_next_sibling("td")
            beginuur = dag.find_next_sibling("td")
            einduur = beginuur.find_next_sibling("td")
            duur = einduur.find_next_sibling("td")
            weken = duur.find_next_sibling("td")
            titularis = weken.find_next_sibling("td")
            lokaal = titularis.find_next_sibling("td")
            print naam.text + " " + dag.text + " " + beginuur.text + " " + einduur.text + " " + weken.text + " " + titularis.text + " " + lokaal.text

My output for link 1:

[...]
Discrete wiskunde (HOC) ma 18:00 21:00 4, 8, 11, 13 CARA PHILIPPE F.4.111
Discrete wiskunde (WPO2) ma 13:00 15:00 3-6, 8, 10-12, 14 Deneckere Tom E.0.12
Discrete wiskunde (HOC) wo 9:00 11:00 2-3, 6, 8-9, 11-14 CARA PHILIPPE E.0.07
Traceback (most recent call last):
  File "Untitled 7.py", line 24, in <module>
    titularis = weken.find_next_sibling("td")
AttributeError: 'NoneType' object has no attribute 'find_next_sibling'

My output for link 2:

[...]
Algemeen boekhouden - WPO - TEW - groep 5 (E-M) ma 9:00 11:00 5-6 VANDENHAUTE Marie-Laure D.3.04
Algemeen boekhouden - WPO - HI - groep 1 (A-D) di 14:00 16:00 3-14 VANDENHAUTE Marie-Laure D.2.09
Algemeen boekhouden - WPO - HI - groep 3 (Q-Z) ma 9:00 11:00 3-8, 10-14 CEUSTERMANS Stefanie D.2.10
Algemeen boekhouden - WPO - HI - groep 2 (E-P) di 9:00 11:00 3-8, 10-11, 13-14 VANDENHAUTE Marie-Laure D.3.05
Approaches to language teaching & learning for multilingual education HOC- wo 10:00 12:00 2-9, 11-14 VAN DE CRAEN PIERRE E.3.05
Traceback (most recent call last):
  File "Untitled 7.py", line 16, in <module>
    gegevens = table.findAll("tr")
AttributeError: 'NoneType' object has no attribute 'findAll'

EDIT: replacing `soup = BeautifulSoup(url)` with `soup = BeautifulSoup(url, "xml")` (and importing the lxml library) resolved the issue. I have no idea why though...
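
Both tracebacks have the same shape: `findNext` and `find_next_sibling` return `None` when no matching element exists, and the next call in the chain then fails. Below is a minimal sketch of guarding against that, written for Python 3 with the built-in `html.parser` rather than the Python 2 setup in the question. The sample markup, the `parse_rows` helper and the `EXPECTED_CELLS` value are hypothetical and only illustrate the pattern; they are not the real timetable page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the last row has fewer cells than the others.
SAMPLE_HTML = """
<table class="spreadsheet">
  <tr><td>naam</td><td>dag</td><td>beginuur</td></tr>
  <tr><td>Discrete wiskunde (HOC)</td><td>ma</td><td>18:00</td></tr>
  <tr><td>Incomplete row</td></tr>
</table>
"""

# Assumption for this sample only; the real timetable has more columns.
EXPECTED_CELLS = 3


def parse_rows(html, expected_cells=EXPECTED_CELLS):
    """Return the cell texts of every complete data row, skipping the header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "spreadsheet"})
    if table is None:
        # find() returns None when nothing matches, so check before using it.
        return []

    rows = []
    for tr in table.find_all("tr")[1:]:  # [1:] skips the header row
        # Collect all cells at once instead of chaining find_next_sibling().
        cells = [td.get_text(strip=True) for td in tr.find_all("td", recursive=False)]
        if len(cells) < expected_cells:
            # Short row: skip it instead of failing on a missing cell.
            continue
        rows.append(cells)
    return rows


print(parse_rows(SAMPLE_HTML))
```

Collecting the cells of a row in one `find_all` call means a short row produces a short list that can be checked, instead of a `None` halfway through a chain of sibling lookups.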

Original Q&A

**Jawwad Zakaria** · Answer 1 · 2013-11-29T21:51:24.027000

This seems like an error from `urllib2.urlopen`. Make sure the page you are requesting can actually be fetched from your server, or handle exceptions properly.
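
A rough sketch of what handling exceptions around the download could look like. It assumes Python 3, where `urllib2.urlopen` lives in `urllib.request`; the `fetch_page` helper and its `opener` parameter are made up for this example (the parameter only exists so the function can be tried without network access).

```python
import socket
from urllib.error import URLError
from urllib.request import urlopen


def fetch_page(url, opener=urlopen, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    `opener` is a hypothetical hook so the function can be exercised
    without a network connection; by default it is urllib's urlopen.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (URLError, socket.timeout) as exc:
        # URLError also covers HTTPError (4xx/5xx responses).
        print("Could not fetch %s: %s" % (url, exc))
        return None
```

In the loop from the question, this would mean calling `fetch_page(url)` and skipping to the next URL when it returns `None`, so that BeautifulSoup is only given pages that actually downloaded.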
