I am extracting data from rows within an HTML table for multiple products. Since some products may have certain rows and others will not, I am catching potential index exceptions for each row within the table like this:
try:
    category = response.css("table.specification td::text").extract()[0]
except IndexError:
    category = 'None'
try:
    function = response.css("table.specification td::text").extract()[1]
except IndexError:
    function = 'None'
try:
    weight = response.css("table.specification td::text").extract()[2]
except IndexError:
    weight = 'None'
Although this works perfectly for small tables, it results in a lot of repetitive code when extracting large tables, as I'm writing separate try statements for each row.
The results are then output together for that product (to CSV) before moving on to the next one:
yield {
    'Category': category,
    'Function': function,
    'Weight': weight,
}
Is my approach to extracting table data wrong? Or is there a better way to handle potential exceptions that I am missing?
Thanks, Andy
It would be best if you shared the URL of the website/page you are scraping so we have a better notion. In particular, it depends on what exactly is present on the page and when. If category, function and weight are either all present or all absent, you can do it (switching to XPath, as it's more powerful) like this:
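Since I can't see the real page, this is only a sketch that reuses your table.specification selector; the idea is to grab all the cells once and unpack them:

# Sketch: assumes the same "specification" table and that the three cells
# are either all present or all missing; adjust the XPath to the real markup.
cells = response.xpath('//table[@class="specification"]//td/text()').extract()
if cells:
    category, function, weight = cells[:3]
else:
    category = function = weight = 'None'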
Otherwise, you can try to get the individual values, defaulting to None if they are not present:
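Again only a sketch, and the positional XPath assumes the cell order is fixed on your page (I've kept your string 'None' as the default):

# extract_first() returns its default when the selector matches nothing,
# so no try/except is needed.
td = '(//table[@class="specification"]//td)[%d]/text()'
category = response.xpath(td % 1).extract_first(default='None')
function = response.xpath(td % 2).extract_first(default='None')
weight = response.xpath(td % 3).extract_first(default='None')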