I need to recursively mirror wallpaper images from a site, but only those whose links have specific markup around them, like:
<div class="wb_more">
Original Resolution: <a href="//site.com/download/space_planet_sky_94434/4800x2700">4800x2700</a><br>
Views: <a href="/download/last">96661</a>
</div>
but not others, like:
<div class="wd_resolution">
<span class="wd_res_cat">Fullscreen</span>
<span class="wd_res_cat_raz"><a class="wb_res_select" href="//site.com/download/space_planet_sky_94434/1600x1200">1600x1200</a>
...
</span>
...
</span>
</div>
Note that the URLs are identical except for the resolution, and the resolution of the originals varies from image to image, so only the surrounding markup tells them apart, e.g. the link being preceded by the text Original Resolution:.
Is there a solution for this using wget or httrack or some other tool?
Thank you.
You can try to use a normal wget to fetch the page and run a regex over it (with sed or perl, for example), then download the links you obtain (wget can do that too). A basic script would look like this:
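A minimal sketch of such a script follows. It is not the original script from this answer. It assumes that, as in the sample markup in the question, the link sits on the same line as the text Original Resolution:, that https is the right scheme for the protocol-relative //site.com/... links, and that the extracted URL is something wget can download directly (if it is an HTML download page, one more extraction step is needed). The function names extract_original_links and mirror_originals are hypothetical. It handles one page at a time; walking all wallpaper pages recursively still has to be done around it.

```shell
#!/bin/sh
# Hypothetical helper: print only the links that follow "Original Resolution:".
# Assumes the text and the <a href="..."> are on the same line, as in the question.
extract_original_links() {
  # -n suppresses default output; the p flag prints only lines where the
  # substitution matched, i.e. only the "Original Resolution:" links.
  # The second sed gives protocol-relative links ("//site.com/...") a scheme
  # (https is an assumption) so that wget can fetch them.
  sed -n 's|.*Original Resolution:[[:space:]]*<a href="\([^"]*\)".*|\1|p' |
    sed 's|^//|https://|'
}

# Hypothetical helper: fetch one page ($1) and download every matching link.
# wget -i - reads the list of URLs to download from standard input.
mirror_originals() {
  wget -q -O - "$1" | extract_original_links | wget --no-clobber -i -
}
```

Usage: source the file and run mirror_originals with the URL of a wallpaper page. Because the filter keys on the surrounding text rather than on the resolution in the URL, it picks up the original whatever its resolution is, and skips the wb_res_select links.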
With GetFlag.pl looking like this:
For example, if your URLs look like
<a href="url">New Wallpaper</a>
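the regex has to be built around that markup. Be careful about \w: it misses characters that can't be used in a variable name, such as -, and URLs also contain characters like / and . that \w does not match either.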
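For links of the form <a href="url">New Wallpaper</a>, a hedged sketch using sed instead of perl follows; both function names are hypothetical. The first captures everything up to the closing quote with [^"]*, so -, / and . survive. The second exists only to show the pitfall: it restricts the URL to [[:alnum:]_], roughly the POSIX spelling of \w, and therefore matches nothing on a real URL.

```shell
#!/bin/sh
# Hypothetical helper: extract the href of links whose text is "New Wallpaper".
# [^"]* accepts any character except the closing quote, including "-", "/", ".".
extract_new_wallpaper_links() {
  sed -n 's|.*<a href="\([^"]*\)">New Wallpaper</a>.*|\1|p'
}

# Pitfall demo: [[:alnum:]_] behaves like \w, so a URL containing "/", "." or
# "-" never matches and nothing is printed.
extract_word_only_links() {
  sed -n 's|.*<a href="\([[:alnum:]_]*\)">New Wallpaper</a>.*|\1|p'
}
```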
Hope this is clear enough.