I am trying to develop a web scraping code. I need to automate it and run it in the Google Cloud daily. The daily web scrapped data have to be saved in a Google sheet. Following is the relevant part of the code that I have developed to save data in a CSV file and then upload it to an existing Google Sheet.
# Here is a sample data set
apt_link <- c('https://www.immobilienscout24.at/expose/6220b265d188d1cf74252fbb',
'https://www.immobilienscout24.at/expose/622f314859ff6df2ed86c2ee',
'https://www.immobilienscout24.at/expose/619ca702f1a2b400224637d4',
'https://www.immobilienscout24.at/expose/61cc1cf099a6ef002161f721',
'https://www.immobilienscout24.at/expose/606761cd2c34720022d4117f')
rooms <- c(4,5,2,4,3)
Surface <-c(87.09,104.00,44.90,138.00,146.00)
cost <- c(389000,497000,279000,1890000,1600000)
address <-c('1140 Wien','1210 Wien','1210 Wien','1180 Wien','1060 Wien')
# Creating a dataframe with web scrapped data
df_one <- cbind.data.frame(apt_link,rooms,surface,cost, address, Sys.time())
# Saving data as a CSV file in the computer
con <- file('Real_Estate_Wien_Data.csv',encoding="UTF-8")
data <- write.csv('Real_Estate_Wien_Data.csv', file=con, row.names = T)
# Write Google sheets
library(googlesheets4)
library(googledrive)
drive_auth()
# Link to the folder in my google drive
td <- drive_get("https://drive.google.com/drive/u/0/folders/1ZK6vUGXhRfzCPJ9I-gIrj3Xbzu72R1e3")
# Update
drive_put('Real_Estate_Wien_Data.csv', name = "Real_Estate_Wien_Data", type="spreadsheet", path=as_id(td)) # keeps id because of other links
The issue here is that now this code creates a CSV file on my computer. So that when I am going to automate it on the Google Cloud Platform, I think it's not possible to save the CSV file. There has to be another way to directly write the data to a Google Sheet.
Thank you in advance, and your suggestions are much appreciated.
I would recommend using Google Apps Script, as it is specifically built to interact with Sheets and other Google files. It seems to me that you would like to accomplish 3 different tasks, I've summarized them below:
getFolderById()
orgetFoldersByName()
, as well as fetching individual files with the same dynamic.Range.setValues()
function to write some data in the spreadsheet.