I'm using Anemone to spider a domain and it works fine.
The code to initiate the crawl looks like this:
require 'anemone'

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end
This very nicely prints out all the page URLs for the domain, like so:
http://www.example.com/
http://www.example.com/about
http://www.example.com/articles
http://www.example.com/articles/article_01
http://www.example.com/contact
What I would like to do is create an array of key-value pairs, using the last part of the URL as the key and the URL minus the domain as the value.
E.g.
[
  ['', '/'],
  ['about', '/about'],
  ['articles', '/articles'],
  ['article_01', '/articles/article_01']
]
Apologies if this is rudimentary stuff, but I'm a Ruby novice.
The simplest, and possibly least robust, way to do this would be to use something like page.url.to_s.split('/').last to obtain your 'key'. You would need to test various edge cases to ensure it worked reliably.
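For instance, applied outside the crawl to a few sample URLs (the list here is just illustrative data standing in for what page.url would yield), the naive split looks like this:

```ruby
# Naive approach: split the full URL string on '/' and take the last segment.
# Inside the Anemone crawl block, the input would be page.url.to_s instead.
keys = [
  'http://www.example.com/',
  'http://www.example.com/about',
  'http://www.example.com/articles/article_01'
].map { |url| url.split('/').last }
# → ["www.example.com", "about", "article_01"]
```

Note how the root URL yields the domain itself rather than an empty key, which is the edge case mentioned below.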
Edit: as written, this will return 'www.example.com' as the key for 'http://www.example.com/', which is not the result you require.