Finding linked files with Hpricot


I've been playing around with Hpricot, but after a fair amount of searching I haven't been able to work this out.

I'm trying to parse an HTML page and find all a tags whose href points to an mp3 file. So far I've got

<ul>
    <% @page.search('//a[@href*=mp3]').each do |link| %>    
        <li>
            <%= link.inner_text %>
        </li>
    <% end %>
</ul>

which works fine and lists the link text. I also have a regex, /href\s*=\s*\"([^\"]+)(.mp3)/, which works on its own. I'm just not sure how to combine the two.

Is there a good example or documentation that someone could point me to, so I can work out what I can do with the .search function?

Thanks


There are 2 best solutions below

0
On BEST ANSWER

Found the answer. The method is attributes (not attr), and the brackets need to be square: link.attributes['href']

1
On

You can access the attribute href with

link.attr('href')

As a CSS3 selector, you might want to consider @href$=.mp3 (instead of *=), as it matches only attributes that end in .mp3.
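To illustrate the difference between the two operators, here is a minimal sketch in plain Ruby. It does not use Hpricot at all: String#include? and String#end_with? only stand in for the semantics of *= (substring match) and $= (suffix match), and the sample hrefs are made up for the example.

```ruby
# Plain-Ruby stand-ins for the two CSS3 attribute operators (no Hpricot needed).
# The hrefs below are invented sample values, not taken from a real page.
hrefs = [
  "/audio/song.mp3",
  "/mp3-archive/index.html",
  "/audio/talk.mp3?download=1",
  "/audio/notes.txt"
]

# [href*=mp3]  : the value contains the substring "mp3" anywhere
contains_mp3 = hrefs.select { |href| href.include?("mp3") }

# [href$=.mp3] : the value ends with ".mp3"
ends_with_mp3 = hrefs.select { |href| href.end_with?(".mp3") }

puts "contains:  #{contains_mp3.inspect}"
puts "ends with: #{ends_with_mp3.inspect}"
```

Note that the suffix match also skips links such as /audio/talk.mp3?download=1, so if the page puts query strings on its mp3 links, the looser *= form may still be the better fit.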

Edit: You're right, sorry. I found out that attr is only an alias for set on Hpricot::Elements. The right way is indeed:

link.attributes['href']

Nevertheless, I would recommend Nokogiri as a faster substitute for Hpricot.