I have a .csv file containing transaction IDs of nearly 1 million transactions associated with a bitcoin wallet (both sent and received transactions), which I read into R as a data table. Now I am trying to add another column to the table that lists the fee for each transaction. This can be done using an API call.
For example, to get the fee for the txid 73336c8b2f8bbf9c4165de515765463d6e835a9f3f87bf822d8bcb23c074ae7f, I have to open: https://blockchain.info/q/txfee/73336c8b2f8bbf9c4165de515765463d6e835a9f3f87bf822d8bcb23c074ae7f and read the data there directly.
What I have done: first I edited the .csv file in Excel to add a new column containing the URL for each row. Then I wrote the following code in R:
for (i in 1:nrow(transactions)) {
  # one HTTP request per row: read the fee from the URL
  transactions$fee[i] <- scan(transactions$url[i])
}
But this way it updates just 2-3 rows per second. Since I am a novice, there must be much more efficient ways of doing the same thing.
We can do a lot better (~15x) than `scan()` by using `curl::curl_fetch_memory()`, e.g. with your URL:
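A minimal sketch of that approach, assuming the endpoint returns the fee (in satoshis) as a bare number in the response body; the `get_fee` helper name and the data.table `:=` assignment are illustrative choices, not fixed API:

```r
library(curl)
library(data.table)

# Fetch one fee: curl_fetch_memory() keeps the response body in memory,
# and this endpoint returns just the fee as plain text.
get_fee <- function(url) {
  res <- curl_fetch_memory(url)
  as.integer(rawToChar(res$content))
}

# Add the fee column in place (transactions is assumed to be a data.table).
transactions[, fee := vapply(url, get_fee, integer(1))]
```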
NB: I used `integer` since your particular URL fits, but `as.numeric` may be more appropriate.

That said, I still think hitting the web is the biggest bottleneck, and you may find some payoff in trying to fetch a payload with more than one transaction at a time. If not, your biggest performance improvement will come from parallelizing.
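For example, one way to parallelize is curl's multi interface, which issues requests concurrently from a single R session. This is a sketch: the pool size of 8 and the NA-on-failure handling are assumptions on my part, and the API may rate-limit aggressive clients:

```r
library(curl)

pool <- new_pool(host_con = 8)  # assumed: up to 8 concurrent connections per host
fees <- rep(NA_real_, nrow(transactions))

for (i in seq_len(nrow(transactions))) {
  local({
    j <- i  # capture the current index for the callbacks
    multi_add(new_handle(url = transactions$url[j]),
              done = function(res) fees[j] <<- as.numeric(rawToChar(res$content)),
              fail = function(msg) fees[j] <<- NA_real_,  # leave NA on failure
              pool = pool)
  })
}

multi_run(pool = pool)  # blocks until every queued request finishes
transactions$fee <- fees
```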