I want to scrape a soccer website to build a dataset in pandas. I don't know how to put the scraped player information into three columns (name, league, team) and also add each player's country, so that everything fits into a table/DataFrame.
The information has been scraped, albeit not very neatly, but I'm not sure whether (or how) I should create an array and loop the information into lists or arrays.
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://ng.soccerway.com/players/players_abroad/nigeria/'
# Send a browser-like User-Agent; the site may reject the default one
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(page.text, 'html.parser')

table = soup.find_all('table', class_="playersabroad table")

# <th> cells, which I treat as the country headings
player_country = soup.find_all('th')
player_country_header = [country.text.strip() for country in player_country]
print(player_country_header)

df = pd.DataFrame(columns=['player-name', 'League', 'team_name'])
#df = pd.DataFrame(columns=player_country_header)

# Every <td> cell on the page, flattened into one list
table_data = soup.find_all('td')
player_data_list = [data.text.strip() for data in table_data]
#length = len(df)
#df.loc[length] = player_data_list
print(player_data_list)
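The commented-out df.loc[length] = player_data_list tries to put every cell into a single row, which cannot match three columns. A minimal sketch of the usual fix, assuming each player occupies exactly three consecutive <td> cells (name, league, team) and that no other <td> cells appear on the page: slice the flat list into groups of three and pass the groups to the DataFrame constructor. rows_from_cells and players_frame are hypothetical helper names.

```python
import pandas as pd


# Hypothetical helpers; they assume the <td> texts arrive in a flat list
# ordered name, league, team, name, league, team, ...
def rows_from_cells(cells, width=3):
    """Split a flat list of cell texts into rows of `width` items."""
    if len(cells) % width:
        # Leftover cells mean the page has extra <td>s or uneven rows
        raise ValueError(f"{len(cells)} cells do not split into rows of {width}")
    return [cells[i:i + width] for i in range(0, len(cells), width)]


def players_frame(cells):
    """Build one DataFrame row per player with the question's column names."""
    return pd.DataFrame(rows_from_cells(cells),
                        columns=["player-name", "League", "team_name"])
```

With the question's list this would be df = players_frame(player_data_list). It does not recover the country, because the country headings sit in separate <th> cells rather than in the <td> list.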
With pandas, here is a proposal based on post-processing the output of read_html:
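A minimal sketch of such post-processing, under assumptions that have not been checked against the live page: the table matched by the question's class "playersabroad table" lists each country as a single heading cell spanning the row (read_html repeats a spanning cell's text in every column), followed by that country's player rows of name, league and team. fetch_raw and tidy_players are hypothetical helper names, and read_html needs a parser backend such as lxml installed.

```python
from io import StringIO

import pandas as pd

URL = "https://ng.soccerway.com/players/players_abroad/nigeria/"


def fetch_raw(url=URL):
    """Hypothetical helper: download the page and return the players table."""
    import requests  # only needed for the live download

    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()
    # Selector reused from the question's BeautifulSoup code; assumes the
    # first matching table is the players table
    return pd.read_html(StringIO(resp.text),
                        attrs={"class": "playersabroad table"})[0]


def tidy_players(raw):
    """Hypothetical helper: turn read_html output into one row per player.

    Assumes each country heading spans the whole row, so read_html repeats
    its text in every column, and is followed by that country's player rows.
    """
    df = raw.iloc[:, :3].copy()
    df.columns = ["player-name", "League", "team_name"]
    # A heading row has the same text in every column
    is_country = df.nunique(axis=1).eq(1)
    # Carry each heading down onto the player rows beneath it
    df["Country"] = df["player-name"].where(is_country).ffill()
    # Drop the heading rows themselves and renumber
    return df[~is_country].reset_index(drop=True)
```

With network access, tidy_players(fetch_raw()) should give one row per player plus a Country column. Forward-filling the heading rows is what attaches each country to its players; if the page marks countries differently, the is_country test is the part to adjust.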