I'm trying to scrape people's public profiles to find the most common skills for certain roles. I can extract the email, company, name, position, etc., but I can't get the skills. I'm using Selector from parsel. I've tried many approaches, but I'm clearly targeting the wrong class, and I should probably loop through the skills. Here is my code so far:
from time import sleep

from parsel import Selector

# _DRIVER_CHROME is assumed to be a Selenium Chrome WebDriver created elsewhere.


def linkedin_scrape(linkedin_urls):
    profiles = []
    for url in linkedin_urls:
        _DRIVER_CHROME.get(url)
        sleep(5)  # give the page time to load
        selector = Selector(text=_DRIVER_CHROME.page_source)
        # Use xpath to extract the exact class containing the profile name
        name = selector.xpath('//*[starts-with(@class, "inline")]/text()').extract_first()
        if name:
            name = name.strip()
        # Use xpath to extract the exact class containing the profile position
        position = selector.xpath('//*[starts-with(@class, "mt1")]/text()').extract_first()
        if position:
            position = position.strip()
            # Keep the text before ' at '; partition leaves the string intact
            # when ' at ' is missing (slicing with find() would drop a character)
            position = position.partition(' at ')[0]
        # Use xpath to extract the exact class containing the profile company
        company = selector.xpath('//*[starts-with(@class, "text-align-left")]/text()').extract_first()
        if company:
            company = company.strip()
        # Use xpath to extract skills (extract_first returns only the first match)
        skills = selector.xpath('//*[starts-with(@class, "pv-skill")]/text()').extract_first()
        if skills:
            skills = skills.strip()
        profiles.append([name, position, company, url])
        print(f'{len(profiles)}: {name}, {position}, {company}, {url}, {skills}')
    return profiles
To capture all the skills, you first need to expand the skills section so that it displays every skill, and then target the elements whose class name starts with 'pv-skill-category-entity__name-text'. Because several elements match, collect all of them instead of taking only the first one.
This still works for me as of today.
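The two steps could be sketched roughly as follows. This is only an illustration under some assumptions: `driver` is a Selenium WebDriver such as `_DRIVER_CHROME`, `selector` is a parsel `Selector` built from the page source after expanding, and `expand_button_xpath` is a hypothetical locator for the "show more skills" control that you would need to find by inspecting the page yourself. The helper names `expand_skills` and `extract_skills` are made up for this example.

```python
from time import sleep

# XPath built from the class-name prefix mentioned above.
SKILL_XPATH = '//*[starts-with(@class, "pv-skill-category-entity__name-text")]/text()'


def expand_skills(driver, expand_button_xpath, wait_seconds=2):
    """Click the control that reveals the full skills list, if it is present.

    `driver` is assumed to be a Selenium WebDriver. `expand_button_xpath` is a
    hypothetical locator supplied by the caller; it is not a known page value.
    Returns True if a button was found and clicked, False otherwise.
    """
    # "xpath" is the string value of selenium's By.XPATH constant.
    buttons = driver.find_elements("xpath", expand_button_xpath)
    if not buttons:
        return False
    # A JavaScript click also works when the button is outside the viewport.
    driver.execute_script("arguments[0].click();", buttons[0])
    sleep(wait_seconds)  # crude wait for the extra skills to render
    return True


def extract_skills(selector):
    """Return every non-empty skill name matched by SKILL_XPATH, stripped."""
    raw = selector.xpath(SKILL_XPATH).getall()
    return [text.strip() for text in raw if text.strip()]
```

Inside the loop from the question, you would call `expand_skills(_DRIVER_CHROME, ...)` right after the page has loaded and before building the `Selector` from `page_source`, then replace the `extract_first()` call with `skills = extract_skills(selector)`. Since the class names are controlled by the site, they may change at any time and the XPath may need updating.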