User agent and data scraping with R

I am working on a data visualization project and I need to scrape some data from a website.

When I tried, I received the following error:

'Error in read_html.response(link): Forbidden (HTTP 403).'

From this, I understood that the website probably blocks scraping, so I tried setting a user agent with the following code:

library(httr)
library(rvest)

user.agent <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

link <- GET("https://www.whosampled.com/Daft-Punk/Harder,-Better,-Faster,-Stronger/sampled/", user_agent(user.agent))
page <- read_html(link)

But I still get the same error. Would anyone have any advice?
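
To narrow down a 403 like this, it can help to check the response status before handing it to read_html(), so the failure shows up at the request step rather than at the parsing step. The following is only a sketch: the helper name fetch_page and the extra Accept and Accept-Language headers are assumptions for illustration, and there is no guarantee that any particular set of headers will satisfy this site.

```r
library(httr)
library(rvest)

# Hypothetical set of browser-like headers; whether the site accepts them
# is an assumption, not something confirmed by the thread.
browser_headers <- c(
  `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  `Accept` = "text/html,application/xhtml+xml",
  `Accept-Language` = "en-US,en;q=0.9"
)

# Hypothetical helper: send the request, stop with the HTTP status if the
# server answered with an error, otherwise parse the body as HTML.
fetch_page <- function(url) {
  resp <- GET(url, add_headers(.headers = browser_headers))
  if (http_error(resp)) {
    stop("Request failed with HTTP ", status_code(resp), call. = FALSE)
  }
  read_html(resp)
}
```

Calling fetch_page() with the URL above would then either return a parsed document or stop with the status code, which makes it easier to tell a blocked request from a parsing problem.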

Answered by HoelR

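Here's a start. It reads one results page to find the number of pages from the pagination links, then scrapes the table on every page: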

library(tidyverse)
library(rvest)

# Read one results page so the pagination links can be inspected
page <- "https://www.whosampled.com/Daft-Punk/Harder,-Better,-Faster,-Stronger/sampled/?cp=2" %>%
  read_html()

# Take the text of the last pagination link as the page count
pages <- page %>%
  html_elements(".page a") %>%
  html_text2() %>%
  last() %>%
  as.integer()

# Read every results page and build one tibble row per table row
str_c("https://www.whosampled.com/Daft-Punk/Harder,-Better,-Faster,-Stronger/sampled/?cp=", 1:pages) %>%
  map(read_html) %>%
  map_dfr(~ html_elements(.x, ".table.tdata tbody tr") %>%
            map_dfr(~ tibble(
              title = html_element(.x, ".trackName.playIcon") %>%
                html_text2(),
              artist = html_element(.x, ".tdata__td3") %>%
                html_text2(),
              # The year sits in the fourth cell of each row
              year = html_element(.x, ".tdata__td3:nth-child(4)") %>%
                html_text2(),
              genre = html_element(.x, ".tdata__badge") %>%
                html_text2()
            )))

# A tibble: 77 × 4
   title                                                  artist             year  genre            
   <chr>                                                  <chr>              <chr> <chr>            
 1 Stronger                                               Kanye West         2007  Vocals / Lyrics  
 2 Boom Boom Pow                                          Black Eyed Peas    2009  Vocals / Lyrics  
 3 Overdose                                               EXO                2014  Vocals / Lyrics  
 4 Harder, Better, Faster, Stronger                       Bashy              2007  Multiple Elements
 5 Daft Punk Is Playing at My House (Soulwax Shibuya Mix) LCD Soundsystem    2004  Sound FX / Other 
 6 Face to Face / Short Circuit                           Daft Punk          2007  Multiple Elements
 7 Harder Better Faster Stronger (Deadmau5 Edit)          deadmau5           2007  Multiple Elements
 8 Work Is Never Over                                     Diplo              2007  Vocals / Lyrics  
 9 Let Me See You                                         Girl Talk          2008  Vocals / Lyrics  
10 Make It Faster                                         Cruz and the White 2004  Vocals / Lyrics  
# ℹ 67 more rows
# ℹ Use `print(n = ...)` to see more rows
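
The row-wise extraction above can be tried offline on a small inline document, which is handy for checking selectors without hitting the site. This is a sketch under assumptions: the HTML below is made up to imitate the structure those selectors imply (the real page markup may differ), and it swaps the inner map_dfr() for html_element() applied to the whole set of rows, which returns one result per row and NA where a cell is missing.

```r
library(rvest)
library(tibble)

# Made-up markup shaped to match the selectors used in the answer;
# the second row deliberately has no genre badge.
page <- minimal_html('
  <table class="table tdata">
    <tbody>
      <tr>
        <td></td>
        <td><a class="trackName playIcon">Song A</a></td>
        <td class="tdata__td3">Artist A</td>
        <td class="tdata__td3">2007</td>
        <td><span class="tdata__badge">Vocals / Lyrics</span></td>
      </tr>
      <tr>
        <td></td>
        <td><a class="trackName playIcon">Song B</a></td>
        <td class="tdata__td3">Artist B</td>
        <td class="tdata__td3">2009</td>
        <td></td>
      </tr>
    </tbody>
  </table>
')

# One node per table row
rows <- html_elements(page, ".table.tdata tbody tr")

# html_element() on a set of rows yields exactly one match (or a missing
# node) per row, so the columns stay aligned even when a cell is absent.
tracks <- tibble(
  title  = html_text2(html_element(rows, ".trackName.playIcon")),
  artist = html_text2(html_element(rows, ".tdata__td3")),
  year   = as.integer(html_text2(html_element(rows, ".tdata__td3:nth-child(4)"))),
  genre  = html_text2(html_element(rows, ".tdata__badge"))
)

print(tracks)
```

Converting year with as.integer() is an addition here; the answer keeps it as a character column.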