I'm trying to use Scrubyt to get the details from this page http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events. I've managed to get the titles and detail URLs from the list, but I can't use next_page to get the scraper to go to the next page. I assume that's cause I'm not using the correct pattern for the next page link. I tried the string "Next Page", and I've also tried the XPath. Any other ideas?
The code is below:
require 'rubygems'
require 'scrubyt'
nuffield_data = Scrubyt::Extractor.define do
fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events'
event do
title 'The Coast of Mayo'
#url "href", :type => :attribute
link_url
end
next_page "Next Page", :limit => 2
end
nuffield_data.to_xml.write($stdout,1)
Try this with a slightly different URL:
scrubyt seems to be having issues with "?section=events" query on the end of the URL.
When it looks for the next page it is trying to return this URL:
http://www.nuffieldtheatre.co.uk/cn/events/?pageNum_rsSearch=1&totalRows_rsSearch=39§ion=events
instead of:
http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?pageNum_rsSearch=1&totalRows_rsSearch=39§ion=events
Removing the query string on the end of the URL seems to fix this - you might want to file this as a bug.