Find a string pattern using regex and append the result to a list


I'm new to Python's re library. I am doing some web scraping, and I would like to match some string patterns and append the values to lists. For instance:

parking = []
rooms  = []
toilets = []


attribute = soup.find('ul',{'class':'specs-list'}).find_all('li')
for a in attribute:
    print(a.text)

Printed output of a at index 0:

Metters
50 m²

Rooms
2

Toilets
1

Printed output of a at index 1:

Metters
50 m²

parking
1

spends
340

So, for example, I want to match the title names, and if a title exists in a, append its value to the corresponding list.

Pseudocode:

for a in attribute:
  if a contains "Rooms":
     rooms.append(a)
  if a contains "Parking":
     parking.append(a)
  if a contains "Toilets":
     toilets.append(a)

  if a does not contain any of the strings above:
     rooms.append(nan)
     parking.append(nan)
     toilets.append(nan)
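Reading the pseudocode as "append the value if the title is present, otherwise NaN, for each list separately" (an assumption, since the last branch could also mean "only when none of the titles is present"), a minimal runnable sketch using re on the printed text of each listing could look like this. The two listing_texts strings are hypothetical copies of the output shown above, and the regex assumes each title sits on its own line with its value on the next line.

```python
import math
import re

# Hypothetical sample: the printed text of two listings, copied from the output above.
listing_texts = [
    "Metters\n50 m²\n\nRooms\n2\n\nToilets\n1\n",
    "Metters\n50 m²\n\nparking\n1\n\nspends\n340\n",
]

parking = []
rooms = []
toilets = []

# Map each title we care about to the list that collects its values.
targets = {"Parking": parking, "Rooms": rooms, "Toilets": toilets}

for text in listing_texts:
    for title, values in targets.items():
        # Title on its own line, value on the following line.
        # re.IGNORECASE also catches the lowercase "parking" in the second listing.
        match = re.search(
            rf"^\s*{re.escape(title)}\s*\n\s*(.+?)\s*$",
            text,
            re.IGNORECASE | re.MULTILINE,
        )
        # Value if the title exists, otherwise NaN, so index i of every
        # list always refers to the same listing.
        values.append(match.group(1) if match else math.nan)

print("Parking =", parking)
print("Rooms =", rooms)
print("Toilets =", toilets)
```

Regex on the flattened text works here, but it depends on the line layout of a.text; matching on the HTML structure is usually more robust.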

I use BeautifulSoup for the scraping, and the value of attribute looks like this:

Output of the attribute variable at index 0:

[<li class="specs-item">
<strong>Metters</strong>
<span>50 m²</span>
</li>,<li class="specs-item">
<strong>Rooms</strong>
<span>2</span>
</li>,<li class="specs-item">
<strong>Toilets</strong>
<span>1</span>
</li>,<li class="specs-item">
<strong>Spends</strong>
<span>340</span></li>]

attribute has a length of 5, and each element has markup similar to the above, but the titles and values differ: some contain Parking, Rooms, and Toilets, others only Toilets and Rooms, and so on.
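If each listing is its own <ul class="specs-list"> (an assumption; soup.find(...) in the code above returns only the first one), a sketch that avoids regex entirely is to walk every such list with find_all and build a title-to-value dict per listing; dict.get then supplies NaN for titles a listing lacks. The HTML string below is hypothetical, modeled on the snippet above.

```python
import math

from bs4 import BeautifulSoup

# Hypothetical HTML with two listings, modeled on the markup shown above.
html = """
<ul class="specs-list">
  <li class="specs-item"><strong>Metters</strong><span>50 m²</span></li>
  <li class="specs-item"><strong>Rooms</strong><span>2</span></li>
  <li class="specs-item"><strong>Toilets</strong><span>1</span></li>
</ul>
<ul class="specs-list">
  <li class="specs-item"><strong>Metters</strong><span>50 m²</span></li>
  <li class="specs-item"><strong>Parking</strong><span>1</span></li>
  <li class="specs-item"><strong>Spends</strong><span>340</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

parking, rooms, toilets = [], [], []

# One <ul class="specs-list"> per listing keeps values grouped by listing.
for specs in soup.find_all("ul", class_="specs-list"):
    # {title: value} built from the <strong>/<span> pair inside each <li>.
    details = {
        li.strong.get_text(strip=True): li.span.get_text(strip=True)
        for li in specs.find_all("li", class_="specs-item")
    }
    # dict.get falls back to NaN when this listing lacks the title.
    parking.append(details.get("Parking", math.nan))
    rooms.append(details.get("Rooms", math.nan))
    toilets.append(details.get("Toilets", math.nan))

print("Parking =", parking)
print("Rooms =", rooms)
print("Toilets =", toilets)
```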

Best answer:

This should help you:

from bs4 import BeautifulSoup
import requests

parking = []
rooms = []
toilets = []

html = requests.get('website url').text

soup = BeautifulSoup(html, 'html.parser')

# Every <li class="specs-item"> on the page, across all listings.
attribute = soup.find_all('li', {'class': 'specs-item'})

for a in attribute:
    # The title is in <strong> and the value in <span>; strip stray whitespace.
    heading = a.strong.text.strip()
    span = a.span.text.strip()

    if heading == "Parking":
        parking.append(span)
    elif heading == "Rooms":
        rooms.append(span)
    elif heading == "Toilets":
        toilets.append(span)

print("Parking =", parking)
print("Rooms =", rooms)
print("Toilets =", toilets)

Output for the li elements you provided:

Parking = []
Rooms = ['2']
Toilets = ['1']
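To try the loop without a network request, a quick sketch is to feed it the li elements from the question as a string instead of requests.get('website url'). get_text(strip=True) is used here in place of .text to trim any surrounding whitespace.

```python
from bs4 import BeautifulSoup

# The <li> elements from the question, used instead of a live page
# so the loop can be run offline.
html = """
<li class="specs-item"><strong>Metters</strong><span>50 m²</span></li>
<li class="specs-item"><strong>Rooms</strong><span>2</span></li>
<li class="specs-item"><strong>Toilets</strong><span>1</span></li>
<li class="specs-item"><strong>Spends</strong><span>340</span></li>
"""

soup = BeautifulSoup(html, "html.parser")

parking, rooms, toilets = [], [], []
for a in soup.find_all("li", {"class": "specs-item"}):
    heading = a.strong.get_text(strip=True)
    span = a.span.get_text(strip=True)
    # Headings other than these three (e.g. "Metters", "Spends") are skipped.
    if heading == "Parking":
        parking.append(span)
    elif heading == "Rooms":
        rooms.append(span)
    elif heading == "Toilets":
        toilets.append(span)

print("Parking =", parking)
print("Rooms =", rooms)
print("Toilets =", toilets)
```

Because nothing is appended when a heading is missing, running this over several listings leaves the three lists with different lengths, so index i no longer refers to the same listing in each list.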

Edit:

Though this works, I don't think keeping a separate list for each field is a good approach. Instead, you can use a dictionary. This is how you can get the same output with one:

# One list per heading of interest.
details_dict = {'Parking': [],
                'Rooms': [],
                'Toilets': []}

for a in attribute:
    heading = a.strong.text.strip()
    span = a.span.text.strip()

    # Only collect headings that have a list in the dict.
    if heading in details_dict:
        details_dict[heading].append(span)

print(details_dict)

Output:

{'Parking': [], 'Rooms': ['2'], 'Toilets': ['1']}
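One way to combine the dictionary with NaN placeholders, so every list gets exactly one entry per listing, is to group the pairs by listing first. The listings input below is hypothetical: it assumes the (heading, value) pairs have already been extracted per listing, for example with a.strong.text and a.span.text.

```python
import math

# Hypothetical input: (heading, value) pairs already extracted, one inner list per listing.
listings = [
    [("Metters", "50 m²"), ("Rooms", "2"), ("Toilets", "1"), ("Spends", "340")],
    [("Metters", "50 m²"), ("Parking", "1"), ("Spends", "340")],
]

wanted = ("Parking", "Rooms", "Toilets")
details_dict = {key: [] for key in wanted}

for pairs in listings:
    found = dict(pairs)  # heading -> value for this listing only
    for key in wanted:
        # NaN for a missing heading keeps all lists the same length.
        details_dict[key].append(found.get(key, math.nan))

print(details_dict)
```

With equal-length lists, details_dict can be passed directly to pandas.DataFrame if you are using pandas.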

I think this is a better approach, but it is up to you. Choose whichever best suits your task.