RVest too slow, trying to optimize for pulling stock indicators

In my first post, I would like to share my pet project. I am in the process of making a machine learning algorithm that can assign buy/sell/hold positions to securities. The first step of this project is to build the dataframe that contains the securities' basic information as well as relevant predictive indicators. I am using rvest to webscrape data from two different websites that give stock information. Below is my code:

library(rvest)

# load all variables of interest
for (i in seq_len(nrow(stockdata))) {
  ticker <- stockdata[, 1][i]

  # price
  url <- paste0('https://www.nasdaq.com/symbol/', tolower(ticker))
  html <- read_html(url)
  # select the text I want
  Price <- html_nodes(html, '#qwidget_lastsale')
  stockdata$Price[i] <- html_text(Price)

  # price change percentage
  url <- paste0('https://finviz.com/quote.ashx?t=', ticker)
  html <- read_html(url)
  # select the text I want
  change <- html_nodes(html, '.table-dark-row:nth-child(12) .snapshot-td2:nth-child(12) b')
  stockdata$PriceChange[i] <- html_text(change)
}

I have truncated the code, but the above works in pulling data. Unfortunately, the process is horrifically slow. I have many more variables to pull, and each one slows it down further. I have a decent grasp of vectorization as a way to speed things up, but I am not sure how to apply it here. Any tips on making this run faster, or on faster iteration in general, would be greatly appreciated.
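For context, here is a rough sketch of one direction that might help. It assumes the slowdown comes mostly from repeated `read_html()` calls (one download per variable per ticker) rather than from the loop itself; I have not measured this. The idea is to download each page once per ticker and pull every field from that single parsed document. The helper names `extract_fields`, `scrape_ticker` and `scrape_all`, and the `fetch` argument, are hypothetical and made up for this illustration; only `read_html()`, `html_node()` and `html_text()` come from rvest.

```r
library(rvest)

# Hypothetical helper: pull several fields out of one already-parsed page.
# `selectors` is a named character vector of CSS selectors; the names
# become the column names. A selector that matches nothing yields NA.
extract_fields <- function(page, selectors) {
  vapply(
    selectors,
    function(css) html_text(html_node(page, css), trim = TRUE),
    character(1)
  )
}

# Hypothetical helper: download the page for one ticker exactly once,
# then extract all fields from it. `fetch` defaults to read_html and is
# only a parameter so the download step can be swapped out.
scrape_ticker <- function(ticker, base_url, selectors, fetch = read_html) {
  page <- fetch(paste0(base_url, ticker))
  extract_fields(page, selectors)
}

# Hypothetical helper: one row per ticker, one column per selector.
scrape_all <- function(tickers, base_url, selectors, fetch = read_html) {
  tickers <- as.character(tickers)
  rows <- lapply(tickers, scrape_ticker,
                 base_url = base_url, selectors = selectors, fetch = fetch)
  fields <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  data.frame(Ticker = tickers, fields, stringsAsFactors = FALSE)
}
```

With the finviz selector from my loop, usage would look something like `scrape_all(stockdata[, 1], 'https://finviz.com/quote.ashx?t=', c(PriceChange = '.table-dark-row:nth-child(12) .snapshot-td2:nth-child(12) b'))`, adding one named selector per extra variable. This still makes one request per ticker per site, so it only removes the cost of re-downloading the same page for every additional variable.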
