Python & Scrapy output: "\r\n\t\t\t\t\t\t\t"


I'm learning scraping with Scrapy, and some of my code is giving me weird output that I don't understand. Can someone explain why I am getting a bunch of "\r\n\t\t\t\t\t\t\t" strings?

I found this solution on Stack Overflow: Remove an '\\n\\t\\t\\t'-element from list

But I want to learn what is causing it.

Here is the code that is causing the issue. The strip method from the link above solves it, but as mentioned, I don't understand where the whitespace is coming from.

import scrapy
import logging
import re

class CitySpider(scrapy.Spider):
    name = 'city'
    allowed_domains = ['www.a-tembo.nl']
    start_urls = ['https://www.a-tembo.nl/themas/category/city/']

    def parse(self, response):
        titles = response.xpath("//div[@class='hikashop_category_image']/a")
        
        for title in titles:
            series = title.xpath(".//@title").get()
            link = title.xpath(".//@href").get()

            #absolute_url = f"https://www.a-tembo.nl{link}"
            #absolute_url = response.urljoin(link)

            yield response.follow(link, callback=self.parse_title)

    def parse_title(self, response):
        rows = response.xpath("//table[@class='hikashop_products_table adminlist table']/tbody/tr")

        for row in rows:
            product_code = row.xpath(".//span[@class='hikashop_product_code']/text()").get()
            product_name = row.xpath(".//span[@class='hikashop_product_name']/a/text()").get()

            yield {
                "Product_code": product_code,
                "Product_name": product_name,
            }

There is 1 best solution below


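Sequences like \r\n and \t are called escape sequences: \r is a carriage return, \n is a new line and \t is a tab. They are how Python displays whitespace characters inside a string. The HTML source of most websites is full of them, because the markup is indented and split across lines, although you never see them without inspecting the page source, since browsers collapse whitespace when rendering. An XPath text() selection returns the text node exactly as it appears in the source, whitespace included, which is why strip() cleans up your output. If you want to learn more about escape characters in Python you can read about them here. I hope that answers your question.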
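To make the cause concrete, here is a minimal sketch using only the standard library rather than Scrapy. The markup in SAMPLE_HTML, the product code ABC-123 and the TextCollector helper are all made up for illustration; the real page's HTML may differ. It shows that the line break and tabs used to indent the source end up inside the element's text, and that strip() removes them.

```python
from html.parser import HTMLParser

# Hypothetical markup, indented the way many templates emit it.
# The line break and tabs inside the <span> are part of its text node.
SAMPLE_HTML = (
    '<span class="hikashop_product_code">\r\n'
    '\t\t\t\t\t\t\tABC-123\r\n'
    '\t\t\t\t\t\t\t</span>'
)


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes exactly as they appear."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called with the raw text found between tags.
        self.chunks.append(data)


def extract_raw_text(html):
    """Return all text in the markup without any whitespace cleanup."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


raw = extract_raw_text(SAMPLE_HTML)
clean = raw.strip()  # drops leading and trailing whitespace

print(repr(raw))    # the indentation shows up as \r\n and \t
print(repr(clean))  # only the visible text remains
```

In the spider itself, the same cleanup would be to call strip() on a value returned by .get() after checking that it is not None. Wrapping the XPath expression in the standard normalize-space() function is another option that should give a similar result.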