Web crawler in Rails: how to crawl all pages of a site


I need to collect all URLs from every page of a given domain.
I think it makes sense to use background jobs and spread them across multiple queues.
I tried cobweb, but the gem seems very confusing,
and Anemone, but Anemone runs for a long time when the site has a lot of pages.

require 'anemone'

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts page.links
  end
end

What do you think would suit me best?


There is 1 solution below

ajknzhol

You can use Apache Nutch, a highly extensible and scalable open-source web crawler. Note that Nutch is written in Java, so it would run as a separate process alongside your Rails app rather than inside it.
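
If you would rather stay in Ruby, the core of "get all URLs of a domain" is a breadth-first walk with a set of already-seen URLs. Below is a minimal sketch of that idea using only the standard library (Ruby 2.7+ for `filter_map`). It is not how Anemone or Nutch work internally. The names `extract_links`, `crawl`, `HTTP_FETCH`, and the `max_pages` and `fetch` parameters are made up for this illustration, and the regex-based link extraction is a simplifying assumption; a real crawler would likely use an HTML parser such as Nokogiri, follow redirects, and respect robots.txt.

```ruby
require 'net/http'
require 'set'
require 'uri'

# Return absolute http(s) URLs found in `html` that share the host of
# `base_url`. Fragments (#...) are dropped by the regex. The regex is an
# assumption made to keep the sketch dependency-free.
def extract_links(html, base_url)
  base_host = URI.parse(base_url).host
  hrefs = html.scan(/<a\s[^>]*href\s*=\s*["']([^"'#]+)/i).flatten
  links = hrefs.filter_map do |href|
    begin
      uri = URI.join(base_url, href.strip)
    rescue URI::Error
      next # skip hrefs that are not valid URIs
    end
    # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes
    # and filters out mailto:, javascript:, and similar links.
    uri.to_s if uri.is_a?(URI::HTTP) && uri.host == base_host
  end
  links.uniq
end

# Default fetcher: returns the body for 2xx responses, nil otherwise.
# Redirects are not followed in this sketch.
HTTP_FETCH = lambda do |url|
  response = Net::HTTP.get_response(URI.parse(url))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
rescue StandardError
  nil
end

# Breadth-first crawl starting at `start_url`. Fetches at most `max_pages`
# pages and returns every same-host URL discovered along the way.
def crawl(start_url, max_pages: 100, fetch: HTTP_FETCH)
  seen    = Set.new([start_url])
  queue   = [start_url]
  fetched = 0

  while fetched < max_pages && (url = queue.shift)
    fetched += 1
    html = fetch.call(url)
    next if html.nil?

    extract_links(html, url).each do |link|
      # Set#add? returns nil when the link was already seen.
      queue << link if seen.add?(link)
    end
  end

  seen.to_a
end
```

Calling `crawl("http://www.example.com/")` returns the list of discovered URLs. Because the fetcher is passed in as a lambda, the same loop can be driven from a background job: each job would fetch one URL, extract its links, and enqueue new jobs for links that have not been seen yet, with the seen set kept in shared storage (for example Redis) instead of in memory.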