XPath to extract values from a specific table?


Some time ago I successfully built a database of the number of cars sold in specific countries. The website I extracted the data from has changed, and data for multiple countries is now mixed on the same page. For example:

The XPath //tr[td='2020']/td[2] picks 2 values (Europe and China) on https://www.goodcarbadcar.net/skoda-octavia-sales-figure/ and https://www.goodcarbadcar.net/skoda-superb-sales-figure/

and 5 values (including US, Canada, Europe and China) on https://www.goodcarbadcar.net/bmw-3-series-sales-figures/
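To see why one expression returns several values, here is a minimal sketch that runs the same path with Python's standard-library xml.etree.ElementTree on a small, made-up fragment. The markup (headings, the dataTables_wrapper div class) and the numbers are hypothetical stand-ins for the real pages, which are HTML rather than XML and would need an HTML parser such as lxml. The point is only that the path matches the 2020 row in every table on the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page with two sales tables, one per region.
# The numbers are placeholders, not real sales figures.
PAGE = """<html><body>
<h3>Europe Annual Sales</h3>
<div class="dataTables_wrapper no-footer"><table>
<tr><td>2019</td><td>100</td></tr>
<tr><td>2020</td><td>110</td></tr>
</table></div>
<h3>China Annual Sales</h3>
<div class="dataTables_wrapper no-footer"><table>
<tr><td>2019</td><td>200</td></tr>
<tr><td>2020</td><td>220</td></tr>
</table></div>
</body></html>"""

root = ET.fromstring(PAGE)

# ElementTree's XPath subset equivalent of //tr[td='2020']/td[2]:
# every row (in any table) with a cell reading 2020, then that row's 2nd cell.
values = [td.text for td in root.findall(".//tr[td='2020']/td[2]")]
print(values)  # ['110', '220']: one value per table, in page order
```

The matches come back in document order, so their position is the only thing linking each value to its table.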

Is there any way to extract the sales figures separately for each country? The expected output must have one row per model and one column per year/country (US 1990-2023 sales, Canada 1990-2023 sales, Europe 1990-2023 sales, China 1990-2023 sales). Note that the titles above the tables are not consistent: for BMW 3-Series and Skoda Octavia the title is "Europe Annual Sales", while for Skoda Superb it is "Yearly".
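Once the values have been read table by table, getting one row per model is a reshaping step. The sketch below assumes you already have, for each model, a list of (region label, {year: sales}) pairs; the function names, region labels and numbers are hypothetical. Because the page titles are inconsistent, the labels would have to be assigned by you (for example from each table's position on a given page). Regions a model does not have simply become empty cells.

```python
import csv
import io

def to_wide_row(model, tables):
    """Flatten [(region, {year: sales}), ...] into one dict for one model."""
    row = {"model": model}
    for region, by_year in tables:
        for year, sales in sorted(by_year.items()):
            row[f"{region} {year}"] = sales
    return row

def write_wide_csv(rows, out):
    """Write rows using the union of all columns; absent cells stay empty."""
    columns = ["model"] + sorted({key for row in rows for key in row} - {"model"})
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)

# Placeholder input: labels and numbers are made up for illustration.
rows = [
    to_wide_row("Skoda Octavia", [("Europe", {"2020": "110"}), ("China", {"2020": "220"})]),
    to_wide_row("BMW 3-Series", [("US", {"2020": "300"}), ("Europe", {"2020": "400"})]),
]
buffer = io.StringIO()
write_wide_csv(rows, buffer)
print(buffer.getvalue())
```

Sorting the column names groups them by region and then by year, which keeps all of one country's years next to each other.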

Answer by Jack Fleeting

If I understand you correctly, you can pick a specific value if you know how many tables there are and at which position the one you want sits. For example, to pick the second value (on either of the pages you linked) you can use:

(//div[contains(@class,"dataTables_wrapper")])[2]//td[.="2020"]/following-sibling::td

which will give you the 2020 China Annual Sales.
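The same positional idea, sketched with Python's standard-library ElementTree on a made-up fragment. ElementTree's XPath subset supports neither contains() nor following-sibling::, so the sketch filters the wrapper divs by class in Python and takes the row's second cell with td[2]. That matches following-sibling::td only when each row has exactly two cells, which is an assumption about the real tables. The function name sales_in_table and the markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the numbers are placeholders.
PAGE = """<html><body>
<h3>Europe Annual Sales</h3>
<div class="dataTables_wrapper no-footer"><table>
<tr><td>2019</td><td>100</td></tr>
<tr><td>2020</td><td>110</td></tr>
</table></div>
<h3>China Annual Sales</h3>
<div class="dataTables_wrapper no-footer"><table>
<tr><td>2019</td><td>200</td></tr>
<tr><td>2020</td><td>220</td></tr>
</table></div>
</body></html>"""

def sales_in_table(root, table_number, year):
    """Return the sales cell for `year` in the N-th table (1-based, like XPath)."""
    # Stand-in for //div[contains(@class, "dataTables_wrapper")], in page order.
    # Note: this is a class-token match, while XPath contains() is a substring test.
    wrappers = [div for div in root.iter("div")
                if "dataTables_wrapper" in div.get("class", "").split()]
    if not 1 <= table_number <= len(wrappers):
        return None
    cells = wrappers[table_number - 1].findall(f".//tr[td='{year}']/td[2]")
    return cells[0].text if cells else None

root = ET.fromstring(PAGE)
print(sales_in_table(root, 2, "2020"))  # 220 (the second table in this sample)
```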

You can find out how many tables (and therefore how many values) there are with:

count(//div[contains(@class,"dataTables_wrapper")])
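Combining the two ideas, a sketch can count the wrapper divs (the equivalent of the count(...) expression) and then read every table into its own {year: sales} dict, so each country's figures stay separate. As before, this uses Python's standard-library ElementTree on a hypothetical fragment; read_all_tables is an illustrative name, and which position corresponds to which country still has to be decided per page, since the headings are not consistent.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the numbers are placeholders.
PAGE = """<html><body>
<h3>Europe Annual Sales</h3>
<div class="dataTables_wrapper no-footer"><table>
<tr><td>2019</td><td>100</td></tr>
<tr><td>2020</td><td>110</td></tr>
</table></div>
<h3>China Annual Sales</h3>
<div class="dataTables_wrapper no-footer"><table>
<tr><td>2019</td><td>200</td></tr>
<tr><td>2020</td><td>220</td></tr>
</table></div>
</body></html>"""

def read_all_tables(root):
    """Return one {year: sales} dict per sales table, in page order."""
    wrappers = [div for div in root.iter("div")
                if "dataTables_wrapper" in div.get("class", "").split()]
    tables = []
    for wrapper in wrappers:
        by_year = {}
        for tr in wrapper.iter("tr"):
            cells = ["".join(td.itertext()).strip() for td in tr.findall("td")]
            if len(cells) >= 2:  # skip header rows (th) or rows without a value
                by_year[cells[0]] = cells[1]
        tables.append(by_year)
    return tables

root = ET.fromstring(PAGE)
tables = read_all_tables(root)
print(len(tables))  # 2, the same number count(...) reports for this sample
print(tables[1])    # {'2019': '200', '2020': '220'}
```

Keeping one dict per table, rather than one flat list of values, is what stops the countries from being mixed together.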