Handling exceptions from table extraction

I am extracting data from the rows of an HTML table for multiple products. Since some products have certain rows and others do not, I am catching potential index exceptions for each row within the table like this:

try:
    category = response.css("table.specification td::text").extract()[0]
except IndexError:
    category = 'None'

try:
    function = response.css("table.specification td::text").extract()[1]
except IndexError:
    function = 'None'

try:
    weight = response.css("table.specification td::text").extract()[2]
except IndexError:
    weight = 'None'

Although this works perfectly for small tables, it results in a lot of repetitive code when extracting large tables, as I'm writing a separate try statement for each row.

The results are then output together for that product (to csv), before moving on to the next one.

yield {
    'Category': category,
    'Function': function,
    'Weight': weight,
}

Is my approach to extracting table data wrong, or is there a better way to handle potential exceptions that I am missing?

Thanks, Andy

There is 1 best solution below

It would help if you shared the URL of the page you are scraping, because the right approach depends on what exactly is present on the page and when. If category, function and weight are always either all present or all absent, you can do it like this (switching to XPath, as it's more powerful):

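# Run the selector once and reuse the resulting list of cell texts.
specification = response.xpath('//table[@class="specification"]//td/text()').extract()
if specification:
    category, function, weight = specification[:3]
else:
    category = function = weight = None  # table missing: keep the names defined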
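
When a product's table can hold fewer cells than expected, one way to keep the single-selector approach is to pad the extracted list before unpacking it. This is a minimal sketch, assuming the cells always appear in the same order; `pad_cells` is a hypothetical helper (not part of Scrapy), and the sample list stands in for the result of `.extract()`:

```python
from itertools import chain, islice, repeat


def pad_cells(cells, size, default=None):
    """Return exactly `size` values: the given cells, then `default` as filler."""
    return list(islice(chain(cells, repeat(default)), size))


# Stand-in for response.xpath(...).extract() on a product with only two cells.
specification = ["Power tools", "Drilling"]

# Unpacking is now safe no matter how many cells were extracted.
category, function, weight = pad_cells(specification, 3)
```

Note that padding only helps when the missing rows are at the end of the table; if a row can be absent from the middle, the remaining values shift into the wrong variables.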

If instead some of the values may be missing, you can extract each one individually; extract_first() returns None when nothing matches:

# The parentheses make [n] index the whole list of text nodes, not the text nodes inside each td.
category = response.xpath('(//table[@class="specification"]//td/text())[1]').extract_first()
function = response.xpath('(//table[@class="specification"]//td/text())[2]').extract_first()
weight = response.xpath('(//table[@class="specification"]//td/text())[3]').extract_first()
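
Positional lookups assign values to the wrong field when a row is missing from the middle of the table. If each row also carries a label cell (an assumption, since the question does not show the table markup), the values could be looked up by label instead. Below is a rough sketch with a hypothetical `build_item` helper; the `rows` pairs stand in for the label and value texts extracted from each row:

```python
FIELDS = ("Category", "Function", "Weight")


def build_item(rows, fields=FIELDS, default=None):
    """Map (label, value) pairs to the wanted fields, filling gaps with `default`."""
    # Normalise labels so "Weight:" and " weight " both match "Weight".
    found = {label.strip().rstrip(":").lower(): value.strip() for label, value in rows}
    return {field: found.get(field.lower(), default) for field in fields}


# Stand-in for pairs scraped from one product whose "Function" row is absent.
rows = [("Category:", "Power tools"), ("Weight:", "1.8 kg")]
item = build_item(rows)
```

In a spider, `yield build_item(rows)` would then take the place of the hand-written dict, and adding a new column only means extending `FIELDS`.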