I'm using a Rake task that runs multiple scraping scripts and exports category data for 35 different cities of a site to 35 different CSV files.
The problem I'm having is that when I run the master Rake task from the root directory of the project, it creates a new file, "resultsForCity.csv", in the parent directory instead of finding the existing CSV file within the given city subfolder and adding the data to it. To get around this, I thought I should make my master Rake task (in the parent directory) run child Rake tasks that then run the scraping scripts, but that didn't work either.
However, if I cd into one of the city folders and run the scraper or Rake task from there, it adds the data to the corresponding CSV file located within that subfolder. Am I not clearly defining dependencies, or is it something else?
Things I've tried:
- I've tried requiring each individual rakefile within my master rake task.
- Tried iterating over all files and loading the rake tasks and received a stack too deep error.
- Tried searching on Stack Overflow for 7 days now.
Here's my Rake task code:
require "rake"

task default: %w[getData]

task :getData do
  Rake::FileList.new("**/*.rb*").each do |file|
    ruby file
  end
end
And here's my scraper code:
require "nokogiri"
require "open-uri"
require "csv"

url = "http://example.com/atlanta"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs.
doc = Nokogiri::HTML(URI.open(url))

CSV.open("resultsForAtlanta.csv", "wb") do |csv|
  doc.css(".tile-title").each do |item|
    csv << [item.text.tr("[()]+0-9", ""), item.text.tr("^0-9$", "")]
  end
  doc.css(".tile-subcategory").each do |tile|
    csv << [tile.text.tr("[()]+0-9", ""), tile.text.tr("^0-9$", "")]
  end
end
Any help would be more than greatly appreciated.
What if you let your scraper script take an output filename, and use the directory structure to help you build the output filenames?
Assuming you have a directory tree something like
where scraper.rb is your scraping script, you should be able to write the task somewhat like this:
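The original listing for this task is not reproduced here, so the following is only a minimal sketch of what such a Rakefile might look like. It assumes a hypothetical layout with one folder per city, each holding a scraper.rb, and it introduces a made-up helper, output_path_for, that derives the CSV name from the folder name. Treat both as assumptions, not as the layout or code the answer originally showed.

```ruby
require "rake"

# Hypothetical layout assumed for this sketch:
#   Rakefile
#   atlanta/scraper.rb   (writes atlanta/resultsForAtlanta.csv)
#   boston/scraper.rb    (writes boston/resultsForBoston.csv)

# Hypothetical helper: build the CSV path that sits next to a scraper script,
# using the name of the script's folder as the city name.
def output_path_for(script)
  city_dir = File.dirname(script)
  city = File.basename(city_dir).capitalize
  File.join(city_dir, "resultsFor#{city}.csv")
end

task default: %w[getData]

task :getData do
  Rake::FileList.new("*/scraper.rb").each do |script|
    # Pass the output path to the scraper as its first command-line argument.
    ruby script, output_path_for(script)
  end
end
```

Because the path is built from the script's own folder, it no longer matters which directory rake is started from. Note that capitalize only upcases the first letter, so a folder such as new_york would need its own naming rule.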
and then your Ruby script could just grab the filename off the command line like this:
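The script side is also missing from this copy, so here is a rough sketch of one way it could work. The names output_path and write_rows are hypothetical helpers invented for illustration. The fallback to the script's own folder (via __dir__) is an extra assumption on my part, added so that the script still writes beside itself when it is run without arguments.

```ruby
require "csv"

# Hypothetical helper: use the first command-line argument as the output path.
# If none was passed, fall back to a CSV next to this script rather than one
# in whatever directory the process happened to be started from.
def output_path(argv, script_dir = __dir__)
  argv.first || File.join(script_dir, "resultsForAtlanta.csv")
end

# Hypothetical helper: write an array of rows to the given CSV path.
def write_rows(path, rows)
  CSV.open(path, "wb") do |csv|
    rows.each { |row| csv << row }
  end
end

# In the real scraper, rows would be collected from the Nokogiri loops
# and then written with:
#   write_rows(output_path(ARGV), rows)
```

The underlying cause of the original problem is that a relative name such as "resultsForAtlanta.csv" is resolved against the current working directory of the process, which is the project root when rake is run from there.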