How do I run multiple rake tasks at once?


I'm using a Rake task that runs multiple scraping scripts and exports category data for 35 different cities of a site to 35 different CSV files.

The problem is that when I run the master Rake task from the root directory, it creates a new file, "resultsForCity.csv", in the parent directory instead of finding the existing CSV file in that city's subfolder and adding the data to it. To get around this, I tried making the master Rake task (in the parent directory) run slave Rake tasks that in turn run the scraping scripts, but that didn't work either.

However, if I cd into one of the city folders and run the scraper or Rake task from there, it adds the data to the corresponding CSV file in that subfolder. Am I not defining dependencies clearly, or is it something else?

Things I've tried:

  • Requiring each individual Rakefile within my master Rake task.
  • Iterating over all files and loading the Rake tasks, which raised a "stack too deep" error.
  • Searching Stack Overflow for 7 days now.

Here's my Rake task code:

require "rake"

task default: %w[getData]

task :getData do
  Rake::FileList.new("**/*.rb*").each do |file|
    ruby file
  end
end

And here's my scraper code:

require "nokogiri"
require "open-uri"
require "csv"

url = "http://example.com/atlanta"
doc = Nokogiri::HTML(open(url))

CSV.open("resultsForAtlanta.csv", "wb") do |csv|
  doc.css(".tile-title").each do |item|
    csv << [item.text.tr("[()]+0-9", ""), item.text.tr("^0-9$", "")]
  end

  doc.css(".tile-subcategory").each do |tile|
    csv << [tile.text.tr("[()]+0-9", ""), tile.text.tr("^0-9$", "")]
  end
end

Any help would be greatly appreciated.

Best answer:

What if you let your scraper script take an output filename, and use the directory structure to build the output filenames?

Assuming you have a directory tree something like

Atlanta/scraper.rb
LosAngeles/scraper.rb
...

where scraper.rb is your scraping script, you should be able to write the task somewhat like this:

task :getData do
  Rake::FileList.new("**/scraper.rb").each do |scraper_script|
    dir = File.dirname(scraper_script)   # e.g. "Atlanta"
    city = File.basename(dir)            # city name taken from the folder name
    csv_file = File.join(dir, "resultsFor#{city}.csv")
    # Pass the script and the CSV path as separate arguments
    ruby scraper_script, csv_file
  end
end
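
To check the path-building logic on its own, outside of Rake, it can be pulled into a small method. This is only a sketch: `csv_path_for` is a hypothetical helper name, and it assumes the one-folder-per-city layout shown above.

```ruby
# Hypothetical helper (not part of Rake): derive the CSV path from the
# location of a city's scraper script.
def csv_path_for(scraper_script)
  dir  = File.dirname(scraper_script)  # folder that holds the scraper
  city = File.basename(dir)            # folder name doubles as the city name
  File.join(dir, "resultsFor#{city}.csv")
end

puts csv_path_for("Atlanta/scraper.rb")     # prints Atlanta/resultsForAtlanta.csv
puts csv_path_for("LosAngeles/scraper.rb")  # prints LosAngeles/resultsForLosAngeles.csv
```

Because the path includes the city folder, the CSV ends up in the same place no matter which directory the Rake task is started from.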

and then your Ruby script could just grab the filename off the command line like this:

# ARGV holds only the arguments, so the first one is ARGV[0]
CSV.open(ARGV[0], "wb") do |csv|
   ...
end
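
If you also want the scraper to keep working when it is run by hand without an argument, one option is to fall back to a file next to the script rather than one in the current working directory. The sketch below is an assumption-laden illustration: `output_path` is a hypothetical helper, and the demo writes a placeholder row into a temporary directory instead of doing any real scraping.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: prefer the path given as the first command-line
# argument; otherwise fall back to default_name inside script_dir.
def output_path(argv, script_dir, default_name)
  argv.first || File.join(script_dir, default_name)
end

# Demo with no arguments and a temporary directory standing in for the
# scraper's own folder.
Dir.mktmpdir do |tmp|
  path = output_path([], tmp, "resultsForAtlanta.csv")
  CSV.open(path, "wb") { |csv| csv << ["Parks", "12"] }  # placeholder row
  puts File.read(path)
end
```

In the real scraper the call would look something like `output_path(ARGV, __dir__, "resultsForAtlanta.csv")`, where `__dir__` is the directory of the script file itself.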