Here is what I'm doing:
import requests
from requests.adapters import HTTPAdapter
from bs4 import BeautifulSoup
HEADERS = {
    'authority': 'www.noon.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'document',
}
response = requests.get(
    'https://www.noon.com/uae-en/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905',
    headers=HEADERS,
    stream=True,
)
soup = BeautifulSoup(response.content, 'lxml')
results = soup.find_all("div", {"class" : "productContainer"})
result = results[0]
print("https://www.noon.com" + result.a.get('href'))
Output
https://www.noon.com/uae-en
But the expected output should be 'https://www.noon.com/uae-en/product/N35521717A/p?o=f885efe0b6534e9f'
As you can see here, this is what the browser shows:
<div class="productContainer"><a class="sc-7vj7do-0 ftlAjW" href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f" id="productBox-N35521717A"><div class="kcs0h5-0 diNcmV grid" title="Samsung Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="e3js0d-1 efqIDW"><div class="productImage" data-qa-id="productImagePLP_Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="lazyload-wrapper"><div class="puv25r-0 hfEfTS"><div class="puv25r-2 hJKuPa"><img alt="Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE " src="https://a.nooncdn.com/t_desktop-pdp-v1/v1605814225/N35521717A_1.jpg"/></div></div></div></div><div class="e3js0d-2 dqjnoR"><div class="tagContainer"></div></div></div><div class="e3js0d-6 iKEZJh"><div class="e3js0d-7 jULUCI"><div class="e3js0d-10 cyUANN"><span class="e3js0d-11 gXshOX">Samsung</span>Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE </div></div><div class="e3js0d-8 jtiosv"><div class="sc-3751lm-0 hSumnU"><div class="sc-3751lm-1 eUJkVt large"><span class="currency">AED</span><strong>819.00</strong></div><div class="sc-3751lm-2 kWnsOk"><span class="oldPrice">AED<!-- --> <!-- -->859</span></div></div></div><div class="e3js0d-9 kDpjlW"><div class="e3js0d-12 gMFqig"><div class="u8zs36-0 kRPdZJ"><img alt="noon-express" height="20px" src="https://a.nooncdn.com/s/app/com/noon/images/fulfilment_express-en.png" width="80px"/></div></div></div></div></div></a></div>
What happens and steps to reproduce
The website seems to generate this content dynamically.
Open the website in a browser.
Open the page source (Ctrl+U) and search for class="productContainer". You will see that the href of the <a> only contains /uae-en. That is what you get by using requests.
Open the inspector (Ctrl+Shift+I) and inspect the same <a>. There you will find the dynamically added part, which is what you get if you use selenium.
Minimal example
Output
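The difference between the page source and the inspector view can also be illustrated offline. The following is only a rough sketch using the standard library: `STATIC_HTML` and `RENDERED_HTML` are hand-written stand-ins (assumptions modeled on the steps above and on the markup in the question, not captured responses), and `ProductLinkParser` and `first_product_links` are hypothetical helper names.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect the href of the first <a> inside each div.productContainer."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0          # div nesting inside the current container (0 = outside)
        self._need_link = False  # True until the container's first <a> is seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "productContainer" in (attrs.get("class") or "").split():
                self._depth = 1
                self._need_link = True
        elif tag == "a" and self._depth and self._need_link:
            self.links.append(attrs.get("href"))
            self._need_link = False

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


def first_product_links(html):
    """Return the first link found in every product container of `html`."""
    parser = ProductLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Assumed stand-in for the raw page source (what requests receives).
STATIC_HTML = (
    '<div class="productContainer"><a class="sc-7vj7do-0 ftlAjW" href="/uae-en">'
    '<div class="grid"></div></a></div>'
)
# Assumed stand-in for the DOM after JavaScript has run (what the inspector shows).
RENDERED_HTML = (
    '<div class="productContainer"><a class="sc-7vj7do-0 ftlAjW" '
    'href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f" id="productBox-N35521717A">'
    '<div class="grid"></div></a></div>'
)

print(first_product_links(STATIC_HTML))
print(first_product_links(RENDERED_HTML))
```

The same parser returns a different href depending on which markup it is given, so the parsing code in the question is not at fault. The href is simply incomplete in the HTML that requests downloads.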
EDIT
You won't get the information with requests by scraping the source, but there is an alternative way. You could use the API with requests and build the link (simple example you can customize):
Output
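The link-building step might look roughly like the sketch below. The URL pattern is taken from the expected output in the question. The API request itself is left out because the endpoint is not shown here, and the field names `sku` and `offer_code`, as well as the helper `build_product_url`, are hypothetical placeholders that you would replace with whatever the API response actually contains.

```python
from urllib.parse import quote, urlencode, urljoin

BASE_URL = "https://www.noon.com"


def build_product_url(sku, offer_code, locale="uae-en"):
    """Build a product link of the form /<locale>/product/<sku>/p?o=<offer_code>."""
    path = f"/{locale}/product/{quote(sku)}/p"
    return urljoin(BASE_URL, path) + "?" + urlencode({"o": offer_code})


# Hypothetical record standing in for one item of the API response;
# the values are the ones visible in the question's markup.
item = {"sku": "N35521717A", "offer_code": "f885efe0b6534e9f"}

print(build_product_url(item["sku"], item["offer_code"]))
```

With the values from the question, this yields the link the asker expected: https://www.noon.com/uae-en/product/N35521717A/p?o=f885efe0b6534e9f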