Ruby webscrape script for GoDaddy


I'm new to Ruby, and for my first scripting assignment I've been asked to write a web scraping script to grab elements of our DNS listings from GoDaddy.

I'm having trouble scraping the links, which I then need to follow. I need to get the link from the "GoToSecondaryDNS" JavaScript onclick handler below. I'm using Mechanize and Nokogiri:

<td class="listCellBorder" align="left" style="width:170px;">
          <div style="padding-left:4px;">
            <div id="gvZones21divDynamicDNS"></div>
            <div id="gvZones21divMasterSlave" cicode="41022" onclick="GoToSecondaryDNS('iwanttoscrapethislink.com',0)" class="listFeatureButton secondaryDNSNoPremium" onmouseover="ShowSecondaryDNSAd(this, event);" onmouseout="HideAdInList(event);"></div>
            <div id="gvZones21divDNSSec" cicode="41023" class="listFeatureButton DNSSECButtonNoPremium" onmouseover="ShowDNSSecAd(this, event);" onmouseout="HideAdInList(event);" onclick="UpgradeLinkActionByID('gvZones21divDNSSec'); return false;" useClick="true" clickObj="aDNSSecUpgradeClicker"></div>
            <div id="gvZones21divVanityNS" onclick="GoToVanityNS('iwanttoscrapethislink.com',0)" class="listFeatureButton vanityNameserversNoPremium" onmouseover="ShowVanityNSAd(this, event);" onmouseout="HideAdInList(event);"></div>
            <div style="clear:both;"></div>
          </div>
        </td>

How can I scrape the link 'iwanttoscrapethislink.com' and then interact with the onclick to follow the link and scrape content on the following page with Ruby?

So far, I have a simple start to the code:

require 'rubygems'
require 'mechanize'
require 'open-uri'

def get_godaddy_data(url)
  uri = URI.parse(url)
  web_agent = Mechanize.new

  ### log in to GoDaddy admin
  page = web_agent.get('https://dns.godaddy.com/Default.aspx?sa=')

  ## there is only one form and it is the first form on the page
  form = page.forms.first
  form.username = 'blank'
  form.password = 'blank'

  ## submit the login form and keep the page it returns
  page = web_agent.submit(form, form.buttons.first)

  ## attempt to pull the site name out of the onclick attribute (not working yet)
  site_name = page.css('div.gvZones21divMasterSlave onclick td')

  ### export dns zone data
  page = web_agent.get('https://dns.godaddy.com/ZoneFile.aspx?zone=' + site_name + '&zoneType=0&refer=dcc')
  form = page.forms[3]
  web_agent.submit(form, form.buttons.first).save(uri.host + 'scrape.txt')

  ### read export file
  ## return File.open(uri.host + 'scrape.txt', 'rb') { |file| file.read }
end


def scrape_dns(page)
  ## same attempt as above at reading the site name from the onclick attribute
  site_name = page.css('div.gvZones21divMasterSlave onclick td')
  list_url = 'https://dns.godaddy.com/ZoneFile.aspx?zone=' + site_name + '&zoneType=0&refer=dcc'
  page = Nokogiri::HTML(URI.open(list_url))

  # not sure how to scrape onclick urls and then how to click through to continue scraping on the second page for each individual DNS
end

There is 1 best solution below


You can't interact with "onclick" because Nokogiri isn't a JavaScript engine.

You can extract the contents of the onclick attribute and then use them to build the URL for a subsequent web request. Assuming doc contains the parsed HTML:

doc.at('div[onclick^="GoToSecondaryDNS"]')['onclick']

will give you the value of the onclick attribute. ^= means "attribute value starts with", which rules out the other <div> tags whose onclick attributes call different functions. It returns:

"GoToSecondaryDNS('iwanttoscrapethislink.com',0)"

Using a simple regex [/'(.+)'/,1] will get you the hostname:

doc.at('div[onclick^="GoToSecondaryDNS"]')['onclick'][/'(.+)'/,1]
=> "iwanttoscrapethislink.com"

The rest, such as how to get access to Mechanize's internal Nokogiri document, and how to create the new URL, are left for you to figure out.
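
As a starting point for those remaining steps, here is a rough sketch. Mechanize exposes the parsed page through Mechanize::Page#parser, which returns the underlying Nokogiri document, so the selector above can be run against page.parser. The URL pattern below is copied from the question and is an assumption about how GoDaddy's site behaves; zone_file_url is a made-up helper name.

```ruby
require 'uri'

# Hypothetical helper: builds the zone-file URL for a hostname, using the
# URL pattern from the question (an assumption, not a documented API).
def zone_file_url(hostname)
  # encode_www_form escapes any characters that are unsafe in a query string.
  query = URI.encode_www_form(zone: hostname, zoneType: 0, refer: 'dcc')
  "https://dns.godaddy.com/ZoneFile.aspx?#{query}"
end

# The onclick value as it appears in the question's HTML.
onclick = "GoToSecondaryDNS('iwanttoscrapethislink.com',0)"
hostname = onclick[/'(.+?)'/, 1]
url = zone_file_url(hostname)
puts url
```

With a logged-in Mechanize agent, the resulting string could then be passed to web_agent.get(url) to fetch the next page.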