The script downloads historic stock prices from finance.yahoo.com. It loops through an array of tickers, creates a link for each ticker, and downloads the data associated with that ticker. However, some of the ticker symbols are no longer up to date, and as a result Yahoo delivers a 404 page instead of a CSV containing price information. The error page is then stored in a .csv file and saved to my computer. To avoid downloading these files, I am looking for the string 'Sorry, the page you requested was not found.', which is contained in each of Yahoo's error pages, as an indicator of a 404 page.
Behaviour of the code (output, see below code):
The code runs through all tickers and downloads all stock price .csv files. This works fine for valid tickers, but some ticker symbols are no longer used by Yahoo. For such a symbol, the program downloads a .csv containing Yahoo's 404 page. All files (including the good ones containing actual data) are downloaded to the directory c:\Users\W7ADM\stock-price-leecher\data2.
Problem:
I would like the code not to download the 404 page into a CSV file, but to do nothing in this case and move on to the next ticker symbol in the loop. I am trying to achieve this with the if-condition that looks for the string "Sorry, the page you requested was not found." that is displayed on Yahoo's 404 pages. In the end I hope to download the CSVs for all tickers that actually exist and save them to my HDD.
var url_begin = 'http://real-chart.finance.yahoo.com/table.csv?s=';
var url_end = '&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv';
var tickers = [];
var link_created = '';
var casper = require('casper').create({
    pageSettings: {
        webSecurityEnabled: false
    }
});
casper.start('http://www.google.de', function() {
    tickers = ['ADS.DE', '0AM.DE']; //ADS.DE is retrievable, 0AM.DE is not
    //loop through all ticker symbols
    for (var i in tickers){
        //create a link with the current ticker
        link_created=url_begin + tickers[i] + url_end;
        //check to see if the created link returns a 404 page
        this.open(link_created);
        var content = this.getHTML();
        //If it is a 404 page, jump to the next iteration of the for loop
        if (content.indexOf('Sorry, the page you requested was not found.')>-1){
            console.log('No Page found.');
            continue; //At this point I want to jump to the next iteration of the loop.
        }
        //Otherwise download file to local hdd
        else {
            console.log(link_created);
            this.download(link_created, 'stock-price-leecher\\data2\\'+tickers[i]+'.csv');
        }
    }
});
casper.run(function() {
    this.echo('Ende...').exit();
});
The Output:
C:\Users\Win7ADM>casperjs spl_old.js
ADS.DE,0AM.DE
http://real-chart.finance.yahoo.com/table.csv?s=ADS.DE&a=00&b=1&c=1950&d=11&e=31
&f=2050&g=d&ignore=.csv
http://real-chart.finance.yahoo.com/table.csv?s=0AM.DE&a=00&b=1&c=1950&d=11&e=31
&f=2050&g=d&ignore=.csv
Ende...
C:\Users\Win7ADM>
casper.open is asynchronous (non-blocking), but you use it in a blocking fashion. You should use casper.thenOpen, which takes a callback that is called when the page is loaded, so you can work with the page there.
Instead of using the thenOpen callback, you can also register to the page.resource.received event and download the file specifically by checking the status. But then you wouldn't have access to ticker, so you either have to store it in a global variable or parse it from resource.url.
I don't think you should do this with open or thenOpen. It may work on PhantomJS, but probably not on SlimerJS.
I actually tried it, and your page is strange in that the download doesn't succeed. You can load some dummy page like example.com, download the CSV files yourself using __utils__.sendAJAX (it is only accessible from the page context), and write them using the fs module. You should write a file only if the response does not contain the specific 404 error page text that you identified:
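A minimal sketch of this approach, under a few assumptions: the error text from the question still appears on Yahoo's 404 page, example.com is acceptable as the dummy page, and the target directory already exists. The helper names buildUrl and shouldSave are made up for illustration and are not part of CasperJS. The check for the global phantom object is only there so that the helpers can be loaded outside CasperJS without starting a crawl.

```javascript
// Sketch only: buildUrl and shouldSave are hypothetical helper names, not CasperJS APIs.
var URL_BEGIN = 'http://real-chart.finance.yahoo.com/table.csv?s=';
var URL_END = '&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv';
var NOT_FOUND_TEXT = 'Sorry, the page you requested was not found.';

// Build the CSV link for one ticker, same concatenation as in the question.
function buildUrl(ticker) {
    return URL_BEGIN + ticker + URL_END;
}

// Save only non-empty string responses that do not contain the 404 text.
function shouldSave(content) {
    return typeof content === 'string' &&
        content.length > 0 &&
        content.indexOf(NOT_FOUND_TEXT) === -1;
}

// Run the crawl only inside CasperJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var casper = require('casper').create({
        pageSettings: { webSecurityEnabled: false }
    });
    var tickers = ['ADS.DE', '0AM.DE']; // ADS.DE is retrievable, 0AM.DE is not

    // Load a dummy page first so that a page context (and __utils__) is available.
    casper.start('http://example.com/', function () {
        tickers.forEach(function (ticker) {
            var url = buildUrl(ticker);
            // Synchronous request from the page context; returns the response text.
            var content = this.evaluate(function (u) {
                return __utils__.sendAJAX(u, 'GET', null, false);
            }, url);
            if (shouldSave(content)) {
                this.echo(url);
                fs.write('stock-price-leecher\\data2\\' + ticker + '.csv', content, 'w');
            } else {
                this.echo('No page found for ' + ticker);
            }
        }, this);
    });

    casper.run(function () {
        this.echo('Ende...').exit();
    });
}
```

Because the request is made synchronously inside evaluate, a plain forEach is enough here and no continue is needed: a ticker whose response contains the 404 text is simply not written to disk.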