How to retrieve multiple tables from a webpage using R

Question


Asked by user432797 on 27 June 2025 at 22:32 · 234 views

I want to use R to extract all of the vaccine tables, including the description on the left of each table and the descriptions inside it.

This is the link to the webpage.

This is how the first table looks on the webpage: [screenshot]

I tried using the XML package but wasn't successful. I used:

```r
vup <- readHTMLTable("https://milken-institute-covid-19-tracker.webflow.io/#vaccines_intro", which = 5)
```

I get an error:

```
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: ''
```

How can I do this?


**Dave2e** · Accepted Answer

This webpage does not use HTML tables, which is the reason for your error. Because of the multiple subsections and hidden text, the page's structure is quite complicated, and each node of interest has to be located individually.
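
To see why a table parser comes back empty-handed, it can help to compare a real `<table>` with a div-based layout. The sketch below is a minimal illustration built on two tiny hypothetical HTML snippets (not the real page), using rvest's `html_table()`, which, like `XML::readHTMLTable()`, only looks at `<table>` elements. Separately, the empty `''` in the warning suggests nothing was downloaded at all; the XML package's parser is commonly reported not to fetch https URLs itself, whereas `rvest::read_html()` downloads through curl.

```r
library(rvest)

# Two hypothetical snippets: one real HTML table, one div-based "table" like the tracker page
table_page <- read_html("<table><tr><th>Product</th></tr><tr><td>Vaccine A</td></tr></table>")
div_page <- read_html(
  "<div class='chart_row for_vaccines'><div class='is_h5-2 is_stage'>Phase 1</div></div>"
)

# html_table() only parses <table> elements
length(html_table(table_page))  # 1
length(html_table(div_page))    # 0: nothing for a table parser to find
```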

I prefer the rvest and xml2 packages for their easier, more straightforward syntax.
This is not a complete solution, but it should get you moving in the right direction.

```r
library(rvest)  # read_html(), html_node(), html_nodes(), html_text()
library(xml2)   # xml_parent(), xml_length()
library(dplyr)  # %>%

# download and parse the page
page <- read_html("https://milken-institute-covid-19-tracker.webflow.io/")

# find the top of the vaccine section
parentvaccine <- page %>% html_node(xpath = "//div[@id='vaccines_intro']") %>% xml_parent()

# find the vaccine rows
vaccines <- parentvaccine %>% html_nodes(xpath = ".//div[@class='chart_row for_vaccines']")

# find info on each one; html_node() gives a missing node (NA text) for a row that
# lacks the field, so these vectors stay the same length as `vaccines`
company <- vaccines %>% html_node(xpath = ".//div[@class='is_h5-2 is_developer w-richtext']") %>% html_text()
product <- vaccines %>% html_node(xpath = ".//div[@class='is_h5-2 is_vaccines w-richtext']") %>% html_text()
phase <- vaccines %>% html_node(xpath = ".//div[@class='is_h5-2 is_stage']") %>% html_text()
misc <- vaccines %>% html_node(xpath = ".//div[@class='chart_row-expanded for_vaccines']") %>% html_text()

# get the vaccine type (the heading of each category)
vaccinetypes <- parentvaccine %>% html_nodes(xpath = './/div[@class="chart-section for_vaccines"]') %>%
  html_node('div.is_h3') %>% html_text()
# determine the number of vaccines in each category
# (assumes one role="list" container per category, in the same order as the headings)
lengthvector <- parentvaccine %>% html_nodes(xpath = './/div[@role="list"]') %>% xml_length()
# repeat each type once per vaccine in its category
VaccineType <- rep(vaccinetypes, times = lengthvector)

answer <- data.frame(VaccineType, company, product, phase)
head(answer)
```
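
Matching the category labels to the rows by counting list children depends on the page's nesting. An alternative sketch, under the assumption that each `chart-section for_vaccines` div contains its own heading and its own rows, is to build one small data frame per category and bind them, so every row carries its heading without any counting. The HTML below is hypothetical markup that reuses the class names above; the category names and developers are placeholders, and the live page may be structured differently. The third row deliberately lacks a stage to show that `html_element()` (singular) yields `NA` rather than shifting the columns.

```r
library(rvest)

# Hypothetical markup reusing the class names from the answer; the live page may differ
html <- '
<div class="chart-section for_vaccines">
  <div class="is_h3">DNA-based</div>
  <div role="list">
    <div class="chart_row for_vaccines">
      <div class="is_h5-2 is_developer w-richtext">Developer A</div>
      <div class="is_h5-2 is_stage">Phase 1</div>
    </div>
    <div class="chart_row for_vaccines">
      <div class="is_h5-2 is_developer w-richtext">Developer B</div>
      <div class="is_h5-2 is_stage">Phase 2</div>
    </div>
  </div>
</div>
<div class="chart-section for_vaccines">
  <div class="is_h3">RNA-based</div>
  <div role="list">
    <div class="chart_row for_vaccines">
      <div class="is_h5-2 is_developer w-richtext">Developer C</div>
    </div>
  </div>
</div>'
page <- read_html(html)

sections <- html_elements(page, xpath = "//div[@class='chart-section for_vaccines']")

# Build one data frame per category so every row carries its own heading
per_section <- lapply(sections, function(section) {
  rows <- html_elements(section, xpath = ".//div[@class='chart_row for_vaccines']")
  if (length(rows) == 0) return(NULL)  # skip categories with no rows
  data.frame(
    VaccineType = html_text2(html_element(section, "div.is_h3")),
    # one result per row; NA where a row has no matching child
    company = html_text2(html_element(rows, xpath = ".//div[@class='is_h5-2 is_developer w-richtext']")),
    phase = html_text2(html_element(rows, xpath = ".//div[@class='is_h5-2 is_stage']")),
    stringsAsFactors = FALSE
  )
})
answer <- do.call(rbind, per_section)
answer
```

The single-valued heading is recycled to the number of rows in its section, which is what keeps labels and rows aligned.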

Writing this code involved reading the page's HTML source and identifying the correct nodes and the unique attributes that hold the desired information.
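
One way to do that inspection from R is to list the distinct `class` attributes and then test a selector against them. The sketch below uses a hypothetical two-row snippet; the extra `is_open` class is invented to show a pitfall: an XPath test like `@class='chart_row for_vaccines'` compares the whole attribute string, so it misses an element that carries an additional class, while a CSS class selector matches individual class tokens.

```r
library(rvest)

# Hypothetical snippet; on the real page you would start from read_html(url)
page <- read_html('
<div class="chart_row for_vaccines"><div class="is_h5-2 is_stage">Phase 1</div></div>
<div class="chart_row for_vaccines is_open"><div class="is_h5-2 is_stage">Phase 2</div></div>')

# 1. List the distinct class attributes to see which ones identify the data
divs <- html_elements(page, "div")
sort(unique(html_attr(divs, "class")))

# 2. An exact @class comparison misses the row that has an extra class ...
exact <- html_elements(page, xpath = "//div[@class='chart_row for_vaccines']")
# ... while a CSS class selector matches on individual class tokens
by_token <- html_elements(page, "div.chart_row.for_vaccines")
length(exact)     # 1
length(by_token)  # 2
```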
