Webscraping Data Table from Sports Website using rvest

I am trying to scrape the competition table from the following page:

https://www.nrl.com/ladder/?competition=111&round=27&season=2023

I have used the following code, but it returns a NULL result:

url <- paste0("https://www.nrl.com/ladder/?competition=111&round=27&season=2023")

page <- read_html(url)
contentnodes <- page %>% html_nodes("div.vue-ladder") %>%
      html_attr("q-data") %>% jsonlite::fromJSON()

Could someone show me what I am missing, please? Thanks in advance.

There are 3 best solutions below

2
Dave2e On

It looks like it is just a matter of selecting the correct div node.

Try this:

library(rvest)
url <- "https://www.nrl.com/ladder/?competition=111&round=27&season=2023"
#read page
page <- read_html(url)

#obtain the body and convert from json
table_data <- page %>% html_elements("div.ladder-container") %>%
   html_attr("q-data") %>% 
   jsonlite::fromJSON()

#the ladder rows are stored in the "positions" element
table_data$positions
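
To see why the choice of selector matters without hitting the live site, here is a minimal offline sketch. The HTML snippet, the id "demo-ladder", and the field names "team" and "points" are invented for illustration and are not taken from the NRL page; whether the real div is identified by an id or a class is something to confirm in the browser's inspector.

```r
library(rvest)

# Hypothetical stand-in for the page: one div whose q-data attribute holds JSON.
# The id "demo-ladder" and the fields "team"/"points" are assumptions.
html <- minimal_html(r"(
  <div id="demo-ladder" q-data='{"positions":[{"team":"A","points":42},{"team":"B","points":38}]}'></div>
)")

# A class selector matches nothing here, because the div carries an id, not a class
n_class_matches <- length(html_elements(html, "div.demo-ladder"))

# The id selector finds the node; its q-data attribute is a JSON string
raw_json <- html %>% html_element("div#demo-ladder") %>% html_attr("q-data")

# fromJSON() simplifies the array of objects into a data frame
ladder <- jsonlite::fromJSON(raw_json)
ladder$positions
```

If the selector matches no node, there is no attribute to parse, so checking the number of matched nodes first is a quick way to tell a selector problem from a JSON problem.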

2
Hoel On

With httr2, you can request the JSON endpoint behind the page directly:

library(tidyverse)
library(httr2)

"https://www.nrl.com/ladder//data?competition=111&round=27&season=2023" %>%  
  request() %>%  
  req_perform() %>%  
  resp_body_json(simplifyVector = TRUE) %>%  
  pluck("positions")  %>%  
  as_tibble() %>% 
  unnest(everything())

0
Martin Morgan On

Using the CRAN rjsoncons package (with httr2 for the request), I did:

library(httr2)
library(rjsoncons)

"https://www.nrl.com/ladder//data?competition=111&round=27&season=2023" |>  
  request() |>
  req_perform() |>  
  resp_body_string() |>
  j_pivot("positions", as = "tibble")

The string returned by resp_body_string() is a JSON description; I explored it with listviewer::jsonedit(), which opens a browser window with a widget for expanding / collapsing nodes. The "positions" argument to j_pivot() is actually a JMESPath expression, and these allow quite flexible querying and transformation of JSON.
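
As a rough illustration of what JMESPath expressions can do beyond naming a top-level key, here is a small sketch using rjsoncons::jmespath() on a made-up JSON string. The field names "team" and "points" are assumptions for the example, not the real schema of the NRL response.

```r
library(rjsoncons)

# Hypothetical JSON shaped like a ladder response; field names are invented
json <- '{"positions": [
  {"team": "A", "points": 42},
  {"team": "B", "points": 38},
  {"team": "C", "points": 30}
]}'

# Projection: pull one field out of every element of the array
teams <- jmespath(json, "positions[].team", as = "R")

# Filter, then project: teams with more than 35 points
# (backticks delimit a JSON literal inside a JMESPath expression)
strong <- jmespath(json, "positions[?points > `35`].team", as = "R")

unlist(teams)
unlist(strong)
```

The same expressions should work against the string returned by resp_body_string() once the real field names are substituted.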