Ruby open_uri always returns 404 (git version that allows HTTPS redirects)


I'm using a version of the open-uri module that allows HTTPS redirects.

What I'm trying to do is open every page of a domain. I do this by first crawling the domain with Anemone:

require 'anemone'
require 'json'
require "./open_uri"

class Query
  def initialize
    # Load the list of hosts to crawl from file.json
    data = JSON.parse(File.read("file.json"))
    data["items"].each do |item|
      Anemone.crawl("http://" + item["displayLink"] + "/") do |anemone|
        anemone.on_every_page do |page|
          #p page.url
          begin
            # Re-fetch each crawled page and print it line by line
            OpenURI.open_uri(page.url) do |f|
              f.each_line do |line|
                p line
              end
            end
          rescue
            # A bare rescue catches any StandardError, not only HTTP 404 errors
            p "404"
            next
          end
        end
      end
      p "---------------------------------------------------------"
    end
  end
end

qs = Query.new

I'm trying to open each page and print every line to the console, but the only thing printed is "404". Looking at my code, this would mean that open_uri fails to open any of the links, even though they are valid as far as I'm aware.

What am I missing here?

Also, changing the rescue clause to

rescue Exception => e
  p e
end

prints the following to the console:

#<OpenURI::HTTPError: 404 Not Found>
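
To see more than the exception's message, the response details attached to the error can be printed too. The following is only a minimal sketch, assuming the standard library's open-uri, where OpenURI::HTTPError exposes the failed response through its io attribute. The helper name describe_http_error is hypothetical, and the patched ./open_uri used above may behave differently.

```ruby
require 'open-uri'

# Hypothetical helper: summarise an OpenURI::HTTPError for logging.
# Assumes the stdlib behaviour where error.io carries the response,
# with #status (e.g. ["404", "Not Found"]) and #base_uri.
def describe_http_error(error)
  code, reason = error.io.status
  "#{code} #{reason} for #{error.io.base_uri}"
end

# Usage inside the crawl loop (sketch):
#   begin
#     OpenURI.open_uri(page.url) { |f| f.each_line { |line| p line } }
#   rescue OpenURI::HTTPError => e
#     puts describe_http_error(e)
#   end
```

Printing the URL next to the status makes it easier to confirm that the address being requested is exactly the one that was tested by hand.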

UPDATE

As advised in the comments, I tried to curl the links that produce the 404 error, and curl does not return a 404 page for them. I tried about 40 of the reported links, and none of them returned 404 when fetched with curl. Any ideas?
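
Since curl succeeds where open_uri fails, one thing that might be worth checking (this is an assumption, nothing in the question confirms it) is whether the server answers differently depending on request headers such as User-Agent. Below is a minimal sketch using Net::HTTP from the standard library. The helper names build_get and status_for and the user-agent strings are hypothetical, chosen only for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request with an explicit User-Agent,
# so the same URL can be tried with different client identities.
def build_get(url, user_agent)
  uri = URI(url.to_s)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent
  [uri, request]
end

# Hypothetical helper: send the request and return the status code as a
# string (e.g. "200" or "404"). Redirects are not followed here.
def status_for(url, user_agent)
  uri, request = build_get(url, user_agent)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request).code
  end
end

# Usage (performs network requests, so shown as a comment):
#   puts status_for(page.url, 'curl/7.68.0')  # assumed curl-like identity
#   puts status_for(page.url, 'Ruby')         # assumed Ruby-like identity
```

If the two calls report different status codes for the same URL, the 404 would likely come from how the server treats the client rather than from the link itself.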
