Let me first post the part of the HTML I want to scrape:
<div id="hello">
<p>abc</p>
<center><img src="image_url"></center>
<p align="center" style="text-align: center;"><b>def</b></p>
<center><img src="image_url"></center>
<p align="center" style="text-align: center;"><b>def</b></p>
<p>abc</p>
<p align="center" style="text-align: center;"><b>def</b></p>
<center><img src="image_url"></center>
<p align="center" style="text-align: center;"><b>def</b></p>
<p>abc</p>
<center><img src="image_url"></center>
</div>
I am trying to scrape, in order, the text of each p and the src of each image (the image_url).
The thing is, the HTML I showed above is not static: every page has a different structure, so sometimes there are several p tags before the center tag that contains the img.
Since the p and center tags are arranged differently on each page, my idea was to get all the p tags, for example with response.css('#hello p'), and loop through them. While getting the text of the current p, I would also check whether its next sibling is a center tag, and if it is, get the img src and append it.
I found something like that: p.xpath('following-sibling::center[1]/img/@src').get(), where p is the current paragraph during the iteration.
But that does not work: if I have 4 p tags before a center, all 4 of them return that center's img src, because p.xpath('following-sibling::center[1]/img/@src').get() does not check only the next sibling; it looks through all the following siblings and returns the first center it finds.
I tried googling, but I could not find anything about checking only whether the next sibling is a particular tag. Does anyone have an idea how I can get this working so that I can save the data in sequence?
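If the real goal is just to keep paragraphs and images in page order, another option is to walk the direct children of #hello once and branch on each tag, rather than looking ahead from every p. The sketch below swaps in Python's standard-library html.parser so it runs without Scrapy; the names HelloWalker and walk_hello are hypothetical, and it assumes the p and center elements are direct children of div#hello with properly closed tags. In Scrapy the same idea would mean iterating over response.css('#hello > *') and checking each element's tag name.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not change the depth counter.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class HelloWalker(HTMLParser):
    """Hypothetical helper: records the direct children of <div id="hello">
    in document order as ("text", ...) or ("img", ...) tuples."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # 0 = outside #hello, 1 = directly inside it, 2+ = deeper
        self.child = None  # tag name of the direct child currently open
        self.buf = []      # text chunks collected for the current <p>
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Ignore everything until the target container opens.
            if tag == "div" and attrs.get("id") == "hello":
                self.depth = 1
            return
        if self.depth == 1:
            # A new direct child of #hello starts here.
            self.child = tag
            self.buf = []
        if tag == "img" and self.child == "center":
            self.items.append(("img", attrs.get("src")))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 1:
            # A direct child just closed; emit its text if it was a <p>.
            if self.child == "p":
                text = " ".join("".join(self.buf).split())
                self.items.append(("text", text))
            self.child = None

    def handle_data(self, data):
        if self.depth >= 2 and self.child == "p":
            self.buf.append(data)


def walk_hello(html):
    """Return the p texts and center img srcs of #hello, in page order."""
    parser = HelloWalker()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    sample = (
        '<div id="hello"><p>abc</p><center><img src="a.jpg"></center>'
        '<p align="center"><b>def</b></p><p>ghi</p>'
        '<center><img src="b.jpg"></center></div>'
    )
    for item in walk_hello(sample):
        print(item)
```

Because each p becomes a ("text", ...) item and each image inside a center becomes an ("img", ...) item as soon as it is seen, the list keeps the page order no matter how many p tags come before a center.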
Hopefully my explanation makes sense.
Thanks in advance for any help and suggestions
Try the XPath below to get the required output.