I can't get the city, state, and ZIP code that come after the `<br>` tags in the mailing address to extract. Every other field extracts without a problem.
```python
element = soup.find(lambda tag: tag.name=='th' and 'Mailing Address:' in tag.text)
if not element:
    print(f"Mailing Address not found for {parcel_number}.")
    return None
mailing_address = element.find_next_sibling().decode_contents()
address_lines = mailing_address.split('<br>')
address = address_lines[0].strip()  # first line is the address
city_state_zip_br = element.find_next_sibling('br')  # find the br tag containing the city, state, and zip
if city_state_zip_br:
    city_state_zip = city_state_zip_br.next_sibling.strip()
    parts = city_state_zip.split(', ')
    if len(parts) < 3:
        city = ""
        state = ""
        zip_code = ""
    else:
        city = parts[0].strip()
        state, zip_code = parts[1].strip().split(' ')
        zip_code = zip_code.strip()
else:
    city = ""
    state = ""
    zip_code = ""
```
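Even once the right `<br>` is found, the `split(', ')` step above would still leave the fields empty for this data. Checking it against the city/state/ZIP text from the HTML in this post (copied here as a plain string) shows why: the line has a space before the comma, so splitting on `', '` yields two parts, not three, and the `len(parts) < 3` branch always fires. A minimal sketch of a more tolerant split, assuming the line always has the shape `CITY , ST ZIP` with an optional space before the comma:

```python
# The city/state/ZIP line as it appears in the sample HTML (note the space before the comma)
line = "BIRMINGHAM , AL 35201-0336"

# The original approach: splitting on ", " gives only two pieces
parts = line.split(', ')
print(parts)       # ['BIRMINGHAM ', 'AL 35201-0336']
print(len(parts))  # 2, so a `len(parts) < 3` check treats the line as missing

# More tolerant: split once on the comma, then split the remainder on whitespace
city_part, _, state_zip = line.partition(',')
city = city_part.strip()
state, zip_code = state_zip.split()
print(city, state, zip_code)  # BIRMINGHAM AL 35201-0336
```

`str.partition` splits only at the first comma, so the city keeps any internal spaces (for example a two-word city name) while the state and ZIP are taken from whatever follows.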
Here is the HTML. The city, state, and ZIP code are at the end of the last line:
```html
<tr><th>Parcel Number:</th><td>1207250000015003</td></tr>
<tr><th>Type:</th><td>Real</td></tr>
<tr><th>Property Class:</th><td>2 </td></tr>
<tr class="active form-table-title"><th colspan="2">Location</th></tr>
<tr><th>Address:</th><td>11875 HIGHWAY 43 N AXIS, AL 36505 </td></tr>
<tr class="active form-table-title">
<th colspan="2">
Owner
</th>
</tr>
<tr><th>Name:</th><td>TOWER LOT 1 LLC </td></tr>
<tr><th>Mailing Address:</th><td>P O BOX 336 <br> <br>BIRMINGHAM , AL 35201-0336
```
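Working through the Mailing Address row above suggests two reasons the first attempt comes up empty. First, the `<br>` tags are children of the `<td>`, not siblings of the `<th>`, so `element.find_next_sibling('br')` returns `None`. Second, `decode_contents()` serializes the line breaks as `<br/>` with BeautifulSoup's default formatter, so `split('<br>')` never splits. A minimal sketch, assuming BeautifulSoup 4 with the built-in `html.parser` and a trimmed copy of that row (wrapped in `<table>` and closed with `</td></tr>` so it parses on its own), that reads the text runs from the `<td>` directly:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Mailing Address row, wrapped in <table> and closed
# with </td></tr> so it parses as a standalone snippet
html = """<table>
<tr><th>Mailing Address:</th><td>P O BOX 336 <br> <br>BIRMINGHAM , AL 35201-0336</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
th = soup.find(lambda tag: tag.name == "th" and "Mailing Address:" in tag.text)

# The <br> tags live inside the <td>, so searching the <th>'s siblings finds nothing
print(th.find_next_sibling("br"))  # None

# Work from the <td> instead: stripped_strings yields each text run with the
# surrounding whitespace trimmed and skips whitespace-only runs (like the one
# between the two <br> tags)
td = th.find_next_sibling("td")
lines = list(td.stripped_strings)
print(lines)  # ['P O BOX 336', 'BIRMINGHAM , AL 35201-0336']
```

With the cell split this way, the street is `lines[0]` and the city/state/ZIP is `lines[-1]`, with no need to search for a particular `<br>`.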
I got the state and ZIP code to extract correctly, but the city value still includes the street address. Any suggestions?
The output looks like this:

```
city: 10163 KALI OKA RD EIGHT MILE
state: AL
zip_code: 36613-8790
```
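A street and city that run together like this is what you would expect if the text reaching the regex is a single line. One way that happens (an assumption, since the code that builds `mailing_address` isn't shown for the new version) is calling `get_text()` without a separator, which glues the text on either side of each `<br>` together. The sketch below reproduces the symptom with the sample row from the HTML in this post, then shows that passing `separator="\n"` keeps each `<br>`-delimited run on its own line, so the same regex picks out only the city:

```python
import re
from bs4 import BeautifulSoup

# Sample Mailing Address row (closed with </td></tr> so it parses standalone)
html = '<table><tr><th>Mailing Address:</th><td>P O BOX 336 <br> <br>BIRMINGHAM , AL 35201-0336</td></tr></table>'
td = BeautifulSoup(html, "html.parser").find("td")

# Same pattern as in the new code
pattern = r'^(.+?),\s+([A-Z]{2})\s+(\d{5}(?:-\d{4})?)$'

# Without a separator, get_text() concatenates the text runs into one line,
# so the city group also captures the street address
flat = td.get_text()
print(re.match(pattern, flat).group(1))  # street and city together

# With a newline separator, each <br>-delimited run becomes its own line
lines = [ln.strip() for ln in td.get_text(separator="\n").split("\n") if ln.strip()]
match = re.match(pattern, lines[-1])
print(match.group(1).strip(), match.group(2), match.group(3))  # BIRMINGHAM AL 35201-0336
```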
New code:
```python
import re

mailing_address_lines = [line.strip() for line in mailing_address.split('\n') if line.strip()]
address = mailing_address_lines[0]  # first line is the address
print(f"address: {address}")
city_state_zip_pattern = r'^(.+?),\s+([A-Z]{2})\s+(\d{5}(?:-\d{4})?)$'
city_state_zip_match = re.match(city_state_zip_pattern, mailing_address_lines[-1])
if city_state_zip_match:
    city = city_state_zip_match.group(1)
    state = city_state_zip_match.group(2)
    zip_code = city_state_zip_match.group(3)
else:
    city_state_zip = mailing_address_lines[-1]
    address_parts = address.split(city_state_zip)
    if len(address_parts) > 1:
        city = address_parts[1].strip()
    else:
        city = ""
    city_state_zip_parts = city_state_zip.split()
    state = city_state_zip_parts[-2]
    zip_code = city_state_zip_parts[-1]
    address = address_parts[0].strip()
print(f"address: {address}")
print(f"city: {city}")
print(f"state: {state}")
print(f"zip_code: {zip_code}")
```
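The pieces above can be folded into one function. This is a minimal sketch, not the original code: the helper name `parse_mailing_address` and the returned dict shape are hypothetical, and it assumes the city/state/ZIP is always the last non-blank text run in the Mailing Address cell, as in the sample HTML. Instead of the fallback branch that splits `address` on the last line, it leaves city, state, and ZIP blank when the last line doesn't match, and treats every line as street.

```python
import re
from bs4 import BeautifulSoup

# CITY, ST ZIP or ZIP+4; \s*, tolerates a space before the comma ("BIRMINGHAM , AL ...")
CITY_STATE_ZIP = re.compile(r'^(.+?)\s*,\s*([A-Z]{2})\s+(\d{5}(?:-\d{4})?)$')


def parse_mailing_address(soup):
    """Hypothetical helper: split the Mailing Address cell into its parts.

    Assumes the street lines come first and the city/state/ZIP is the last
    <br>-separated text run, as in the sample HTML.
    """
    th = soup.find(lambda tag: tag.name == "th" and "Mailing Address:" in tag.text)
    if th is None:
        return None
    td = th.find_next_sibling("td")
    if td is None:
        return None
    # One entry per non-blank text run between <br> tags
    lines = list(td.stripped_strings)
    if not lines:
        return None

    match = CITY_STATE_ZIP.match(lines[-1])
    if match:
        city, state, zip_code = match.groups()
        street_lines = lines[:-1]
    else:
        # Last line doesn't look like "CITY, ST ZIP": keep everything as street
        city = state = zip_code = ""
        street_lines = lines
    return {
        "address": ", ".join(street_lines),
        "city": city,
        "state": state,
        "zip_code": zip_code,
    }


html = """<table>
<tr><th>Name:</th><td>TOWER LOT 1 LLC </td></tr>
<tr><th>Mailing Address:</th><td>P O BOX 336 <br> <br>BIRMINGHAM , AL 35201-0336</td></tr>
</table>"""
result = parse_mailing_address(BeautifulSoup(html, "html.parser"))
print(result)
# {'address': 'P O BOX 336', 'city': 'BIRMINGHAM', 'state': 'AL', 'zip_code': '35201-0336'}
```

Because the street and the city are taken from separate text runs, the city can no longer pick up the street address. The regex's `\s*` before the comma also removes the need to `.strip()` the city afterwards.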