How to test a web scraper in Ruby


I have an assignment that requires me to write two tests. I want to unit-test the code below using test-unit, but I have no idea how to do it.

Is there a way I can check for the data being returned in a test environment?

def get_aspley_data
  url = "https://www.domain.com.au/rent/aspley-qld-4034/?price=0-900"

  unparsed_page = HTTParty.get(url)
  parsed_page    = Nokogiri::HTML(unparsed_page)
  house_listings_data = []
  house_listings = parsed_page.css('.listing-result__details')
  house_listings.each do |hl|
    prop_type      = hl.css('.listing-result__property-type')[0]
    price          = hl.css('.listing-result__price')[0]
    suburb_address = hl.css('span[itemprop=streetAddress]')[0]

    house_array = [house_listings]
    house_array.push("#{prop_type} #{price}")
    house_listings_data << [prop_type, price, suburb_address]
    puts [prop_type, price, suburb_address].to_csv(col_sep: "|")
  end
  File.open($aspley_file, "ab") do |f|
    data = house_listings_data.map{ |d| d.to_csv(col_sep: "|") }.join
    f.write(data)
  end
end

Answer by Schwern

Your function touches the network in only one place: unparsed_page = HTTParty.get(url). If we separate parsing and saving the page from fetching it, everything becomes testable by conventional means.

require "csv"
require "httparty"
require "nokogiri"

def parse_aspley_page(unparsed_page)
  parsed_page    = Nokogiri::HTML(unparsed_page)
  house_listings_data = []
  house_listings = parsed_page.css('.listing-result__details')
  house_listings.each do |hl|
    prop_type      = hl.css('.listing-result__property-type')[0]
    price          = hl.css('.listing-result__price')[0]
    suburb_address = hl.css('span[itemprop=streetAddress]')[0]

    # Each field is a Nokogiri node, or nil when the selector finds nothing.
    house_listings_data << [prop_type, price, suburb_address]
    puts [prop_type, price, suburb_address].to_csv(col_sep: "|")
  end

  return house_listings_data
end

def save_house_listings(house_listings, file:)
  File.open(file, "ab") do |f|
    data = house_listings.map{ |d| d.to_csv(col_sep: "|") }.join
    f.write(data)
  end
end

def get_aspley_data(url, file: $aspley_file)
  # Pass the file through; save_house_listings requires the file: keyword.
  save_house_listings(
    parse_aspley_page(HTTParty.get(url)),
    file: file
  )
end

Now parse_aspley_page and save_house_listings can be unit tested normally.

Note that I've changed save_house_listings to take the file to write to as a parameter. This makes it easier to test, because you can tell it to write to a temp file, and more flexible in general.
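
A minimal sketch of what the temp-file test might look like with test-unit. The test class name and the sample rows are made up for illustration, and plain strings stand in for the parsed fields. save_house_listings is repeated so the file runs on its own.

```ruby
require "csv"
require "tempfile"
require "test/unit"

# Copy of save_house_listings so this example is self-contained.
def save_house_listings(house_listings, file:)
  File.open(file, "ab") do |f|
    data = house_listings.map { |d| d.to_csv(col_sep: "|") }.join
    f.write(data)
  end
end

class SaveHouseListingsTest < Test::Unit::TestCase
  def test_appends_pipe_separated_rows
    # Tempfile.create removes the file when the block finishes.
    Tempfile.create("listings") do |tmp|
      save_house_listings([["House", "$450", "1 Example St"]], file: tmp.path)
      save_house_listings([["Unit", "$380", "2 Example St"]], file: tmp.path)

      # The file is opened in append mode, so the second call adds a row.
      expected = "House|$450|1 Example St\nUnit|$380|2 Example St\n"
      assert_equal expected, File.read(tmp.path)
    end
  end
end
```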

get_aspley_data now takes a URL and is a thin wrapper around two unit-tested functions, so it only needs integration testing. It also takes the file to write to as an argument, defaulting to $aspley_file. To test it, mock HTTParty.get to return a page, and tell it to write to a temp file.

Alternatively, you can set up a small HTTP server for testing. But I find that refactoring code so it can be unit tested results in simpler tests and a more flexible design.

Since this is an assignment, you should check with your teacher how they want you to solve it.