Scrapy ignores values after <br> tag


HTML:

<span class="number"> - Sep 15, 1991<br><strong>Some Number: </strong>123, 123, 145</span>

Scrapy:

from scrapy.loader import ItemLoader

def parse(self, response):
    samples = response.css('ul li.somthing')
    for sample in samples:
        loader = ItemLoader(item=CatelogItem(), selector=sample)
        loader.add_css('some', 'span.number::text')
        yield loader.load_item()

Item.py

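from itemloaders.processors import Join, MapCompose  # scrapy.loader.processors in older Scrapy
from scrapy import Field, Item

class CatelogItem(Item):
    some = Field(
        input_processor=MapCompose(str.strip),
        output_processor=Join()
    )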

Result

- Sep 15, 1991

Expected

- Sep 15, 1991 Some Number: 123, 123, 145

Why does this happen? How do I get the full value loaded into the ItemLoader?

Best answer

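You need to select the text of the span and of everything nested inside it, not just the span's own text. The ::text pseudo-element only returns text nodes that are direct children of the matched element, so the "Some Number: " inside <strong> is never extracted. Adding * before ::text returns the text nodes of the span and of all its nested elements: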

loader.add_css('some', 'span.number *::text')