How can the start_urls for scrapy be imported from csv?


I am trying to crawl several URLs from a CSV file (all in one column). However, the code does not return anything. Thanks, Nicole

import scrapy
from scrapy.http import HtmlResponse
from scrapy.http import Request
import csv

scrapurls = ""

def get_urls_from_csv():
    with open("produktlink_test.csv", 'rbU') as csv_file:
        data = csv.reader(csv_file)
        scrapurls = []
        for row in data:
            scrapurls.append(column)
            return scrapurls

class GetlinksgalaxusSpider(scrapy.Spider):
    name = 'getlinksgalaxus'
    allowed_domains = []
    
    # Here we define our target domains
    start_urls = scrapurls

    def parse(self, response):

    ....

1 Answer


Previous answer: How to loop through multiple URLs to scrape from a CSV file in Scrapy?

Also, it's better to put all of your helper methods inside the Scrapy spider and explicitly override start_requests. Your current code has several problems: `scrapurls` is assigned inside the function but `start_urls = scrapurls` at class-definition time only sees the empty module-level string, `get_urls_from_csv()` is never called, `column` is undefined, `return` sits inside the loop so at most one row would be returned, and `'rbU'` is a Python 2 file mode.