My XPaths don't work in Scrapy Splash, but work in Selenium

I am trying to scrape a list of all the scholarships at https://bigfuture.collegeboard.org/scholarships/. I was able to scrape all of the links and store them in a list using Selenium. However, Selenium does not scale well for scraping the data at each of those addresses. I am trying to use Scrapy with Splash instead, but neither XPath nor CSS selectors work. This is my first time web scraping, so I am very lost. I would greatly appreciate any help!

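import pandas as pd
import scrapy
from scrapy_splash import SplashRequest


class ScholarshipSpider(scrapy.Spider):
    name = 'scholarship'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items_list = []  # rows collected in parse(), written to CSV in closed()

    def start_requests(self):
        # links.txt holds one scholarship URL per line (collected earlier with Selenium)
        with open("links.txt") as f:
            urls = [line.strip() for line in f if line.strip()]
        for url in urls:
            # Ask Splash to render the page and wait 7 seconds before returning the HTML
            yield SplashRequest(url, self.parse, args={'wait': 7})

    def parse(self, response):
        item = {
            # XPath copied from the browser; this is the selector that fails under Splash
            'name': response.xpath('//*[@id="main-content"]/div/div[2]/div/div/div[1]/section[1]/div/div[1]/h1/text()').get(),
            # other items here
        }
        self.logger.info(item)
        self.items_list.append(item)

    def closed(self, reason):
        # Called once when the spider finishes: dump everything to a CSV file
        df = pd.DataFrame(self.items_list)
        df.to_csv('scraped_data.csv', index=False)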

When I tried Selenium, the XPaths worked, but my code stopped working after a while. Scrapy seems like the best alternative, but no matter what I try, it does not work.

I am using Jupyter Notebook btw.
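
One possible cause, which I have not confirmed for this site, is that the HTML returned by Splash is nested differently from the DOM that Selenium sees, so a long positional XPath copied from the browser no longer matches. The sketch below only illustrates that effect with the standard library. The two HTML snippets, the shortened paths, and the helper `first_text` are made up for the example and are not the real page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the page as a full browser might build it.
BROWSER_DOM = """
<html><body>
  <div id="main-content">
    <div>
      <div>banner</div>
      <div><section><h1>Example Scholarship</h1></section></div>
    </div>
  </div>
</body></html>
"""

# Hypothetical markup: same content, but the banner <div> was never rendered,
# so every positional index after it shifts by one.
SPLASH_DOM = """
<html><body>
  <div id="main-content">
    <div>
      <div><section><h1>Example Scholarship</h1></section></div>
    </div>
  </div>
</body></html>
"""

# Positional path (like the one copied from the browser) vs. a short relative one.
ABSOLUTE = ".//*[@id='main-content']/div/div[2]/section/h1"
RELATIVE = ".//*[@id='main-content']//h1"


def first_text(markup, path):
    """Return the text of the first element matching path, or None."""
    root = ET.fromstring(markup.strip())
    node = root.find(path)
    return node.text if node is not None else None


for label, dom in (("browser", BROWSER_DOM), ("splash", SPLASH_DOM)):
    print(label, "absolute:", first_text(dom, ABSOLUTE))
    print(label, "relative:", first_text(dom, RELATIVE))
```

If this is what is happening, the equivalent in the spider would be a shorter selector such as `response.xpath('//h1/text()').get()`, and logging part of `response.text` in `parse` should show whether the heading is in the HTML that Splash returned at all.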
