Get all tags following a certain tag with Mechanize? (Ruby)

441 Views Asked by Matrix At 12 July 2017 at 10:51

How can I get all the elements that follow a given one, as in:

<div id="exemple">
  <h2 class="target">foo</h2>
  <p>bla bla</p>
  <ul>
    <li>bar1</li>
    <li>bar2</li>
    <li>bar3</li>
  </ul>
  <h4>baz</h4> 
  <ul>
     <li>lot</li>
  </ul>
  <div>of</div>
  <p>possible</p>
  <p>tags</p>
  <a href="#">after</a>
</div>

I need to detect <h2 class="target"> and get all tags up to the next <h4>, ignoring the <h4> and all tags that follow it. (If there is no <h4>, I have to get all tags up to the end of the parent; here, the end of the <div>.)

The content is dynamic and unpredictable. The only rule is: we know there is a target and there is an <h4> (or the end of the parent element). I need to get all tags between the two and exclude all others.

With this example, I need to get the following HTML:

<h2 class="target">foo</h2>
<p>bla bla</p>
<ul>
  <li>bar1</li>
  <li>bar2</li>
  <li>bar3</li>
</ul>

I can get the target with:

target = page.at('#exemple .target')

I know the next_sibling method, but how can I test the tag type of the current node?

I am thinking of something like this to traverse the node tree:

html = ''
while not target.is_a? 'h4'   # pseudocode: this is the test I don't know how to write
  html << target.inner_html
  target = target.next_sibling
end

How can I do this?

There are 2 best solutions below

pguardiario On 12 July 2017 at 23:59 BEST ANSWER

You can subtract the ones you don't want from your nodeset:

h2 = page.at('h2')
(h2.search('~ *') - h2.search('~ h4','~ h4 ~ *')).each do |el|
    # el is not a h4 and does not follow a h4
end

Maybe it makes more sense to use XPath, but I can do this without googling.

Your idea of iterating next sibling can work too:

el = page.at('h2 ~ *')
while el && el.name != 'h4'
    # do something with el
    el = el.at('+ *')
end

Mark Thomas On 12 July 2017 at 11:44

Looks like you want to return the h2 element and its following siblings. I'm not clear on whether you want to keep or discard the h4; if you want to keep it the XPath would be:

//h2[@class="target"] | //h2[@class="target"]/following-sibling::*

If you need to exclude the h4:

//h2[@class="target"] | //h2[@class="target"]/following-sibling::*[not(self::h4)]

Edit: If you need to exclude an h4 and anything beyond:

//h2[@class="target"] | //h2[@class="target"]/following-sibling::*[not(self::h4) and not(preceding-sibling::h4)]
