I am trying to scrape a table from HTML that is not in the usual td/tr format and contains images, and save the result to Excel.
The HTML looks like this:
<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<a href="/UpdateListing.html" onclick="ta.setEvtCookie('UpdateListing', 'entry-detail-moreinfo', null, 0, '/UpdateListingRedesign')"><div class="improve_listing_btn ui_button primary small">improve this entry</div></a>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>
I would like to get the table [Location 45 out of fifty points, Service 45 out of fifty points] and save the result to an Excel file. The first column header in the Excel file should read "Location", with "45" or "45 out of fifty points" in the cell below it. The next column header should read "Service", again with "45" or "45 out of fifty points" in the cell below it. I manage to save the name and the Location rating, but the cell for the Service rating remains empty.
My Python code looks like this:
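As a sketch of the extraction step alone, assuming BeautifulSoup 4 with Python's built-in "html.parser" (the question uses "lxml", which behaves the same on this markup), each ratingRow div can be turned into one label/rating pair. The trimmed HTML string and the helper name `extract_ratings` are illustrative and not from the original page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two ratingRow blocks from the question (illustrative only).
HTML = """
<ul class="barChart"><li>
  <div class="ratingRow wrap">
    <div class="label part "><span class="text">Location</span></div>
    <div class="wrap row part "><span class="rate sprite-rating_s rating_s">
      <img class="sprite-rating_s_fill rating_s_fill s45" src="x.gif" alt="45 out of fifty points">
    </span></div>
  </div>
  <div class="ratingRow wrap">
    <div class="label part "><span class="text">Service</span></div>
    <div class="wrap row part "><span class="rate sprite-rating_s rating_s">
      <img class="sprite-rating_s_fill rating_s_fill s45" src="x.gif" alt="45 out of fifty points">
    </span></div>
  </div>
</li></ul>
"""


def extract_ratings(soup):
    """Return {label: alt text} for every ratingRow div (hypothetical helper)."""
    ratings = {}
    # The CSS selector matches every div whose class list includes "ratingRow".
    for div in soup.select("div.ratingRow"):
        label = div.find("span", class_="text").get_text(strip=True)
        alt = div.find("img")["alt"]
        ratings[label] = alt
    return ratings


soup = BeautifulSoup(HTML, "html.parser")
ratings = extract_ratings(soup)
print(ratings)

# If only the number is wanted, keep the first word of each alt text.
numbers = {label: alt.split()[0] for label, alt in ratings.items()}
print(numbers)
```

Collecting the pairs into a dict first keeps the label next to its rating, which makes it easy to decide later which spreadsheet column each value belongs in.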
import requests
import xlsxwriter
from bs4 import BeautifulSoup

workbook = xlsxwriter.Workbook('File.xlsx')
worksheet = workbook.add_worksheet()
row = 1
col = 0
for url in urls:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    # Header row
    worksheet.write('A1', 'name')
    worksheet.write('B1', 'location')
    worksheet.write('C1', 'service')
    row += 1
    name = soup.find_all("div", {"class": "LocationName"})
    for item in name:
        worksheet.write_string(row, col, item.text)
    # Each rating row: print it and write its alt text to (row, col+1)
    for div in soup.find_all('div', class_="ratingRow wrap"):
        text = div.text.strip()
        alt = div.find('img').get('alt')
        print(text, alt)
        worksheet.write_string(row, col+1, alt)
workbook.close()
The print call outputs:
Location 45 out of fifty points
Service 45 out of fifty points
The console prints all the results for Location and Service, but in the Excel sheet only the Location rating appears, while the cell for the Service rating remains empty. I tried using enumerate, but then each cell in one row of the sheet holds a single character of the Location rating, and the Service rating still does not appear. The enumerate output looks like this:
0 4
1 5
2
3 o
4 u
5 t
6
7 o
8 f
9
10 f
11 i
12 f
13 t
14 y
15
16 p
17 o
18 i
19 n
20 t
21 s
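The one-character-per-cell result comes from what enumerate is applied to: enumerating a string yields one (index, character) pair per character, whereas enumerating the list of rating rows yields one pair per rating. A small stdlib-only illustration, where the `rows` list is a hypothetical stand-in for the parsed rating rows:

```python
alt = "45 out of fifty points"

# Enumerating a string walks over its characters, so each index maps to one letter.
chars = list(enumerate(alt))
print(chars[:3])   # [(0, '4'), (1, '5'), (2, ' ')]
print(len(chars))  # 22

# Enumerating the rating rows instead gives one index per rating,
# which can serve as a column offset.
rows = [("Location", alt), ("Service", alt)]
for i, (label, rating) in enumerate(rows):
    print(i, label, rating)
```

So enumerate only helps if it wraps the loop over the ratingRow divs, not the alt string of a single rating.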
Is there any way I can tell Python to save the second printed line, "45 out of fifty points", to the cell below "Service" in Excel? I searched thoroughly but have not found a solution yet. Thank you very much for your help!
I can't understand why you have two separate loops, and I can't find the class LocationName anywhere in the HTML. Because I would expect that search to return no results, I would expect nothing to be written by the first loop, which is consistent with what you report. It seems like you should write text to (row, col) in the second loop. Following the discussion, it turns out the first loop takes the name from other data in the HTML that occurs only once per page.
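The empty Service cell can be reproduced without Excel. In the question's second loop every rating is written to the same coordinate, (row, col+1), so the Service write lands on the cell already holding Location's rating and column C is never touched. Here a plain dict keyed by (row, col) stands in for the worksheet (an assumption for illustration), and the label is prefixed to each value only to show which write survives:

```python
# A dict keyed by (row, col) stands in for the worksheet (illustration only).
cells = {}
row, col = 2, 0

ratings = [("Location", "45 out of fifty points"), ("Service", "45 out of fifty points")]

# Same pattern as the question's second loop: every rating targets (row, col + 1).
for label, alt in ratings:
    cells[(row, col + 1)] = f"{label}: {alt}"  # label added only to show which write survives

print(cells)  # {(2, 1): 'Service: 45 out of fifty points'}
```

Because both ratings happen to be "45 out of fifty points", the surviving value in column B looks like the Location rating even though the Service write came last.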
My suggestion to avoid the likely overwrite of the (row, col+1) cell:
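A minimal sketch of one way to do this (not necessarily the exact code of the suggestion above), assuming the rating labels on the page match the sheet's column headers once lowercased: look up each rating's column from its label, so Location and Service land in separate cells of the same row. `HEADERS`, `write_row`, `RecordingSheet` and the "Example name" value are hypothetical; with xlsxwriter the real worksheet can be passed in directly, since only its `write_string(row, col, string)` method is used.

```python
# Column headers as written in the question's sheet: A = name, B = location, C = service.
HEADERS = ["name", "location", "service"]


def write_row(worksheet, row, name, ratings):
    """Write one page's name and ratings into a single row (hypothetical helper).

    ratings: iterable of (label, alt) pairs, e.g. ("Service", "45 out of fifty points").
    Each rating goes to the column whose header matches its label,
    so no two ratings share a cell.
    """
    worksheet.write_string(row, HEADERS.index("name"), name)
    for label, alt in ratings:
        key = label.strip().lower()
        if key in HEADERS:  # skip labels that have no matching column
            worksheet.write_string(row, HEADERS.index(key), alt)


class RecordingSheet:
    """Stand-in for an xlsxwriter worksheet that only records writes."""

    def __init__(self):
        self.cells = {}

    def write_string(self, row, col, string):
        self.cells[(row, col)] = string


sheet = RecordingSheet()
for i, header in enumerate(HEADERS):
    sheet.write_string(0, i, header)
write_row(sheet, 1, "Example name",
          [("Location", "45 out of fifty points"), ("Service", "45 out of fifty points")])
print(sheet.cells)
```

In the question's loop, a call like this would replace both inner loops, with `row += 1` once per URL and the header row written once before the loop. The label/alt pairs can come from the ratingRow divs (the span text and the img alt), and `workbook.close()` still has to run after the loop for File.xlsx to be written.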