R: Scrape image urls from Google Image Search results page

Question

58 Views Asked by nba2020 At 30 July 2023 at 20:10

I'm coding in R and building a web-scraping script that programmatically searches Google for product images and downloads them into a folder. Inside a for-loop, one step extracts the image URLs from the Google Images results page:

library(rvest)

# Define the desired Google image search page
page <- read_html("https://www.google.com/search?q=Djeco%20DD04490%20image&tbm=isch&tbs=isz:lt,islt:0.5")
# Fetch the image URLs programmatically
image_urls <- page %>% html_nodes(".rg_i") %>% html_attr("data-src")
# Continue with the rest of the flow and download the JPG files from the image URL list
...

However, image_urls is always empty, so the script can't proceed any further. How can I resolve this and fetch the image URLs from the example page?

**Allan Cameron** · Accepted Answer · 2023-07-30T21:07:46.473000

You can find all the links in the href attributes of the a tags within td tags. Keep the ones that contain "/url?" and then use string parsing to get the URLs:

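library(rvest)
library(tidyverse)

image_urls <- "https://www.google.com/search?" %>%
  paste0("q=Djeco%20DD04490%20image&tbm=isch&tbs=isz:lt,islt:0.5") %>%
  read_html() %>%
  # Select every <a> element that is a child of a <td>
  html_nodes(xpath = "//td/a") %>%
  html_attr("href") %>%
  # Keep only the "/url?" redirect links
  `[`(str_detect(., "/url\\?")) %>%
  # Split on "=" and "&"; the second piece is the target URL
  strsplit("=|\\&") %>%
  sapply(`[`, 2)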

Resulting in:

image_urls
#>  [1] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"                                       
#>  [2] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"                                       
#>  [3] "https://www.amazon.com.be/-/en/DD04490-DJECO-Cabin-Tent-Multicoloured/dp/B01DKANWME"                    
#>  [4] "https://www.amazon.com.be/-/en/DD04490-DJECO-Cabin-Tent-Multicoloured/dp/B01DKANWME"                    
#>  [5] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"                                      
#>  [6] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"                                      
#>  [7] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"                                       
#>  [8] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"                                       
#>  [9] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"                                      
#> [10] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"                                      
#> [11] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"                                      
#> [12] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"                                      
#> [13] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"       
#> [14] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"       
#> [15] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"       
#> [16] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"       
#> [17] "https://angeloawards.com/item/G1974446"                                                                 
#> [18] "https://angeloawards.com/item/G1974446"                                                                 
#> [19] "https://www.mumzworld.com/ar/djeco-indoor-play-tent"                                                    
#> [20] "https://www.mumzworld.com/ar/djeco-indoor-play-tent"                                                    
#> [21] "https://www.amazon.co.jp/-/en/Oriental-DD04493-Indoor-Scandinavia-Stylish/dp/B085QKFB8L"                
#> [22] "https://www.amazon.co.jp/-/en/Oriental-DD04493-Indoor-Scandinavia-Stylish/dp/B085QKFB8L"                
#> [23] "https://undha.ac.id/ydzsqccfsw/vm-1975865.html"                                                         
#> [24] "https://undha.ac.id/ydzsqccfsw/vm-1975865.html"                                                         
#> [25] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"       
#> [26] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"       
#> [27] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"                           
#> [28] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"                           
#> [29] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou-and-toy-box"                          
#> [30] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou-and-toy-box"                          
#> [31] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"                           
#> [32] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"                           
#> [33] "https://ssciindia.com/d/X1409951.html"                                                                  
#> [34] "https://ssciindia.com/d/X1409951.html"                                                                  
#> [35] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"                           
#> [36] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"                           
#> [37] "https://www.doudous-et-peluches.com/achat/djeco-cabane-multicolore_332216"                              
#> [38] "https://www.doudous-et-peluches.com/achat/djeco-cabane-multicolore_332216"                              
#> [39] "https://toybox.lt/zaidimu-nameliai-palapines/1036177-djeco-spalvota-palapine-dd04490-3070900044906.html"
#> [40] "https://toybox.lt/zaidimu-nameliai-palapines/1036177-djeco-spalvota-palapine-dd04490-3070900044906.html"

Created on 2023-07-30 with reprex v2.0.2
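
The string-parsing step can also be tried offline. The sketch below uses made-up href values in the "/url?q=...&..." shape that the code above splits on, and a hypothetical helper called extract_target() (not part of rvest or the tidyverse). It assumes the target is always the first query parameter, q, and that it may be percent-encoded. It also drops the repeated entries seen in the output above.

```r
# Hypothetical href values in the "/url?q=<target>&..." shape;
# these are illustrative examples, not captured Google output.
hrefs <- c(
  "/url?q=https://example.com/products/tent-dd04490&sa=U&ved=abc123",
  "/url?q=https://example.com/item%3Fid%3D42%26ref%3Dimg&sa=U&ved=def456",
  "/search?q=something+else",
  "/url?q=https://example.com/products/tent-dd04490&sa=U&ved=xyz789"
)

# extract_target() is a hypothetical helper name used for illustration only.
extract_target <- function(href) {
  # Keep only the redirect links that start with "/url?"
  href <- href[grepl("^/url\\?", href)]
  # Capture everything between "q=" and the next "&"
  target <- sub("^/url\\?q=([^&]*).*$", "\\1", href)
  # Decode percent-escapes such as %3F ("?") and %3D ("=")
  decoded <- vapply(target, utils::URLdecode, character(1), USE.NAMES = FALSE)
  # Drop repeated links while keeping first-seen order
  unique(decoded)
}

extract_target(hrefs)
```

Because the helper needs only base R, it could replace the last three lines of the pipeline, for example `html_attr("href") %>% extract_target()`, if the real hrefs turn out to follow the assumed shape.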
