Extracting Numbers From a Table on a Website

Question

Asked by Joe_OU_Wx on 06 June 2025 at 21:50

I am trying to extract data from a website for personal use. I only want the precipitation at the top of the hour. I am nearly done, but I cannot sum the data. I think it's because it's returning null values, and/or because the data are not all integers? Maybe using a for loop is incorrect?

Here is the code:

```python
import urllib2
from bs4 import BeautifulSoup
import re

url = 'http://www.saiawos2.com/K61/15MinuteReport.php'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

table = soup.findAll('table')[0]
rows = table.findAll('tr')

second_columns = []
thirteen_columns = []

for row in rows[1:]:
    second_columns.append(row.findAll('td')[1])  # Column with times
    thirteen_columns.append(row.findAll('td')[12])  # Precipitation column

for second, thirteen in zip(second_columns, thirteen_columns):
    times = ['12:00','11:00','10:00','09:00','08:00','07:00','06:00',
             '05:00','04:00','03:00','02:00','01:00','00:00','23:00',
             '22:00','21:00','20:00','19:00','18:00','17:00','16:00',
             '15:00','14:00','13:00',]
    time = '|'.join(times)
    if re.search(time, second.text):
        pcpn = re.sub('[^0-9]', '', thirteen.text)  # Get rid of text
        print sum(pcpn[1:])  # Print sum and get rid of leading zero
```

Perhaps there is an easier way to do this, but this is what I have so far. When I call sum(pcpn), the line with the print statement raises the following error:

```
TypeError: unsupported operand type(s) for +: 'int' and 'unicode'
```

**nu11p01n73R** · Accepted Answer

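The problem is that sum() adds up numbers, starting from the integer 0. Your pcpn is a unicode string (the result of re.sub()), so sum() iterates over its characters and tries to add 0 to the first character, a unicode string. That addition is what raises the TypeError.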
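A minimal Python 3 sketch that reproduces the failure on a made-up cell value may help. The text u'0.12' is hypothetical, standing in for thirteen.text. Python 3 reports 'str' where Python 2 reports 'unicode', but the cause is the same. The sketch also shows that stripping every non-digit throws away the decimal point, so the original approach would give wrong numbers even if the characters were converted.

```python
import re

# Hypothetical precipitation cell text, standing in for thirteen.text.
cell_text = u'0.12'

# The question's approach: delete every non-digit character.
pcpn = re.sub('[^0-9]', '', cell_text)  # the decimal point is lost as well
print(repr(pcpn))  # '012'

error_message = None
try:
    # sum() starts from the integer 0 and adds each item in turn.
    # Iterating over a string yields one-character strings, so 0 + '1' fails.
    sum(pcpn[1:])
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```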

All you need to do is extract the numbers as a list, convert each element to float (so that decimal values such as 0.12 are handled), and pass the result to sum():

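```python
if re.search(time, second.text):
    # findall() returns a list of numeric strings such as u'0.12'
    pcpn = re.findall(r'[0-9.]+', thirteen.text)
    # Convert each string to float before adding
    print sum(float(x) for x in pcpn)
```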
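For readers on Python 3 (where print is a function and the text type is str), here is a minimal sketch of the same per-cell fix wrapped in a helper. The function name cell_total and the sample cell texts, including the " in" unit suffix, are hypothetical; the real text would come from thirteen.text.

```python
import re


def cell_total(text):
    """Sum every number found in one table cell's text (hypothetical helper).

    Same approach as the answer: re.findall() returns a list of numeric
    strings, and each one is converted to float before sum() adds them.
    """
    matches = re.findall(r'[0-9.]+', text)  # e.g. '0.12 in' -> ['0.12']
    return sum(float(x) for x in matches)


# Hypothetical cell texts; the real ones come from the scraped table.
print(cell_total(u'0.12'))     # a bare value
print(cell_total(u'0.12 in'))  # surrounding text is ignored
print(cell_total(u''))         # no numbers: sum() of nothing is 0
```

An empty cell simply contributes 0, so blank ("null") cells do not break the sum.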

What it does:

- `re.findall(r'[0-9.]+', thirteen.text)`: rather than re.sub(), which returns one modified string, we use re.findall(), which returns a list of all matching substrings. Here each match is a run of digits and decimal points, such as u'0.12'.
- `sum(float(x) for x in pcpn)`: converts each element to a float and adds them up.
  - `(float(x) for x in pcpn)` is a generator expression, which produces each value on the fly instead of building an intermediate list.
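Note that the answer prints one sum per matching row. If the goal is a single total over all top-of-hour readings, one option is to accumulate across rows, as in the minimal Python 3 sketch below. It runs on hypothetical (time, precipitation) text pairs instead of the live page, and it assumes the time cell holds a plain 24-hour HH:MM value. Two substitutions are made here and are not part of the answer: an anchored regex replaces the question's 24-item alternation, and a stricter number pattern (digits with an optional decimal part) is swapped in so that a stray '.' can never reach float().

```python
import re

# Hypothetical (time, precipitation) cell texts standing in for
# second.text and thirteen.text from the scraped rows.
rows = [
    (u'21:00', u'0.05'),
    (u'20:45', u'0.02'),  # not top of the hour: skipped
    (u'20:00', u'0.10'),
    (u'19:00', u''),      # empty cell: no numbers, adds 0
]

# Assumes an HH:MM time cell; matches 00:00 through 23:00 only.
TOP_OF_HOUR = re.compile(r'(?:[01]\d|2[0-3]):00$')
# Digits with an optional decimal part, e.g. '12' or '0.05', never a lone '.'.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

total = 0.0
for time_text, pcpn_text in rows:
    if TOP_OF_HOUR.match(time_text.strip()):
        total += sum(float(x) for x in NUMBER.findall(pcpn_text))

print(round(total, 2))  # 0.15
```

Rounding at the end hides ordinary floating-point noise such as 0.15000000000000002.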
