Handling exceptions from table extraction

53 Views Asked by Andy At 11 February 2018 at 15:38

I am extracting data from rows within an HTML table for multiple products. Because some products have certain rows and others do not, I am catching potential index exceptions for each row within the table like this:

try:
    category = response.css("table.specification td::text").extract()[0]
except IndexError:
    category = 'None'

try:
    function = response.css("table.specification td::text").extract()[1]
except IndexError:
    function = 'None'

try:
    weight = response.css("table.specification td::text").extract()[2]
except IndexError:
    weight = 'None'

Although this works perfectly for small tables, it results in a lot of repetitive code when extracting large tables, as I'm writing separate try statements for each row.

The results are then output together for that product (to csv), before moving on to the next one.

yield {
    'Category': category,
    'Function': function,
    'Weight': weight,
}

Is my approach to extracting table data wrong? Or is there a better way to handle potential exceptions that I am missing?

Thanks, Andy

There is 1 best solution below

Tomáš Linhart On 11 February 2018 at 15:56

It would be best if you shared the URL of the page you are scraping so we have a better idea of its structure. In particular, the right approach depends on what exactly is present on the page and when. If category, function and weight are either all present or all absent, you can do it like this (switching to XPath, as it's more powerful):

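specification = response.xpath('//table[@class="specification"]//td/text()').extract()
if specification:
    category, function, weight = specification[:3]
else:
    # No specification cells found: keep the names defined for the later yield.
    category = function = weight = None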
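
If the table can sometimes hold fewer than three cells, the unpacking above would raise a ValueError. One possible workaround, sketched here in plain Python independent of Scrapy, is to pad the extracted list before unpacking. The helper name `pad` and the sample values are made up for illustration:

```python
def pad(values, length, default=None):
    """Return the first `length` items of `values`, padded with `default`."""
    values = list(values)[:length]
    return values + [default] * (length - len(values))

# Hypothetical result of .extract() for a product whose table has only two cells.
specification = ['Tools', 'Cutting']
category, function, weight = pad(specification, 3)
```

This only helps when the missing cells are at the end of the table; if a row in the middle is absent, the remaining values shift position and end up under the wrong name.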

Otherwise, you can extract each value individually; extract_first() returns None when nothing matches:

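# The parentheses make [n] index the whole node set in document order;
# without them, text()[n] would pick the n-th text node inside each td.
category = response.xpath('(//table[@class="specification"]//td/text())[1]').extract_first()
function = response.xpath('(//table[@class="specification"]//td/text())[2]').extract_first()
weight = response.xpath('(//table[@class="specification"]//td/text())[3]').extract_first()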
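
For larger tables, the repetition can be avoided by driving the lookup from a list of field names instead of writing one statement per row. The following is only a sketch under assumptions: `FIELDS`, `build_item` and the sample cells are hypothetical, and it assumes the cells always appear in the same order, with missing ones only at the end.

```python
from itertools import zip_longest

# Hypothetical field names, in the order the cells appear in the table.
FIELDS = ['Category', 'Function', 'Weight']

def build_item(cells, fields=FIELDS, default='None'):
    """Map field names to cell texts by position, filling gaps with `default`."""
    # Drop surplus cells so zip_longest can only pad the values, never the keys.
    cells = list(cells)[:len(fields)]
    return dict(zip_longest(fields, cells, fillvalue=default))

# Hypothetical extracted cells for a product without a weight row.
item = build_item(['Tools', 'Cutting'])
```

Inside the spider, something like `yield build_item(response.css("table.specification td::text").extract())` would then replace the separate try blocks, and supporting a new row means adding one name to `FIELDS`.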
