Node.js x-ray module: remove HTML and whitespace from scraped object strings


I extracted product names and prices with the x-ray module for Node.js. While scraping, some HTML artifacts such as \n come along with the text. I want to remove all of them and create objects with the cleaned values.

My code looks like this:

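var Xray = require('x-ray')
var x = Xray()

var urls = ['link', 'link', 'link']

for (var i = 0; i < urls.length; i++) {
    x(urls[i], {
        title: '#sp-title',
        // This .replace()/.trim() runs on the selector string '.lastPrice' itself,
        // not on the scraped text, so the price still contains the newline and spaces
        price: '.lastPrice'.replace(/(<([^>]+)>)/ig, "").trim()
    })(function (err, obj) {
        console.log(obj);
    })
}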

The code above loops over 3 different links, scrapes each page into an object, and prints the following output:

{
  title: 'King P 1110 Exotic Katı Meyve Sıkacağı',
  price: '\n                    549,00 TL                '
}
{
  title: 'Xiaomi Mi Pro 10000 mAh Type-C Taşınabilir Şarj Cihazı',
  price: '\n                    144,14 TL                '
}
{
  title: 'Fakir River  Çay Makinesi',
  price: '\n                    505,50 TL                '
}

Also, how can I check whether an element exists on the page?

Thanks.

Answer

So you want to turn

'\n                    549,00 TL                '

into

549,00 TL

I hope I got your question right:

It's only a newline (\n) plus surrounding spaces, so trimming the result gives you exactly what you are looking for.
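To confirm that trimming is enough, here is a plain JavaScript sketch that applies String.prototype.trim() to the three objects from the question after they have been scraped. The sample data is copied from the question's output, and the cleanObject helper name is hypothetical (it is not part of x-ray):

```javascript
// Sample objects copied from the scraped output in the question
const scraped = [
  { title: 'King P 1110 Exotic Katı Meyve Sıkacağı', price: '\n                    549,00 TL                ' },
  { title: 'Xiaomi Mi Pro 10000 mAh Type-C Taşınabilir Şarj Cihazı', price: '\n                    144,14 TL                ' },
  { title: 'Fakir River  Çay Makinesi', price: '\n                    505,50 TL                ' }
]

// Hypothetical helper: returns a new object with every string value trimmed,
// leaving non-string values untouched
function cleanObject(obj) {
  const out = {}
  for (const [key, value] of Object.entries(obj)) {
    out[key] = typeof value === 'string' ? value.trim() : value
  }
  return out
}

const cleaned = scraped.map(cleanObject)
console.log(cleaned)
```

Note that trim() only removes leading and trailing whitespace, so the double space inside 'Fakir River  Çay Makinesi' is kept.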

x-ray allows you to add filters and apply them to your queries:

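const Xray = require('x-ray')

// Each filter receives the scraped value and returns the transformed value
const x = Xray({
    filters: {
        // Remove leading/trailing whitespace, including newlines
        trim: function (value) {
            return typeof value === 'string' ? value.trim() : value
        },
        // Lower-case the value using locale-aware rules
        low: function (value) {
            return typeof value === 'string' ? value.toLocaleLowerCase() : value
        },
        // Remove the first "Status: " label
        status: function (value) {
            return typeof value === 'string' ? value.replace("Status: ", "") : value
        },
        // Remove every line break (\n, \r\n or \r)
        lines: function (value) {
            return typeof value === 'string' ? value.replace(/\r?\n|\r/g, "") : value
        },
        // Strip the text before the '·' separator, the word 'comments' and the '·' itself
        punto: function (value) {
            return typeof value === 'string'
                ? value.replace(/.+?(?=·)/, "").replace('comments', '').replace('·', '')
                : value
        },
        // Parse the leading integer (base 10)
        toNum: function (value) {
            return parseInt(value, 10)
        }
    }
})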
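Because each filter is just a function from the scraped value to a new value, you can check it on its own without fetching a page. A minimal sketch, using standalone copies of three of the filters above and a price string shaped like the one in the question:

```javascript
// Standalone copies of some of the filters above, so they can be checked without x-ray
const filters = {
  trim: (value) => (typeof value === 'string' ? value.trim() : value),
  lines: (value) => (typeof value === 'string' ? value.replace(/\r?\n|\r/g, '') : value),
  toNum: (value) => parseInt(value, 10)
}

// A price string shaped like the one in the question
const raw = '\n                    549,00 TL                '

const trimmed = filters.trim(raw)       // '549,00 TL'
const noLines = filters.lines(raw)      // newline removed, surrounding spaces kept
const asNumber = filters.toNum(trimmed) // 549: parseInt stops at the comma

console.log({ trimmed, noLines, asNumber })
```

This also shows why trim rather than lines fits the question: lines only removes the newline and leaves the padding spaces. Be careful with toNum on prices that use a decimal comma, since parseInt drops everything from the comma onward.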

You can add whatever filters you want, and then apply one in your selector by appending "| nameOfFilter", like this:

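// l.link is a URL from my own code; '.Item' scopes each result to one item element
x(l.link, '.Item', [{
    post_time: '.DiscussionMeta span a time@datetime | trim',
    comment_time: '.CommentMeta span a time@datetime | trim',
    origin: '.Category a | trim'
}])(function (err, items) {
    if (err) return console.error(err)
    console.log(items)
})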

Here every field has the trim filter applied. In your code, that would be price: '.lastPrice | trim'. If everything works, the result will look the way you expect.
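For the second question (checking whether an element exists on the page), one option is to inspect the scraped object afterwards. This is a sketch under an assumption you should verify against your x-ray version: that a selector matching nothing leaves the field missing, undefined, or an empty string. The hasValue helper is hypothetical:

```javascript
// Hypothetical helper: treats a missing key, null/undefined, or a whitespace-only
// string as "element not found".
// Assumption: x-ray leaves unmatched selectors absent, undefined, or empty.
function hasValue(obj, key) {
  const value = obj == null ? undefined : obj[key]
  if (value === undefined || value === null) return false
  return typeof value === 'string' ? value.trim() !== '' : true
}

const found = { title: 'Fakir River  Çay Makinesi', price: '505,50 TL' }
const missingPrice = { title: 'Some product' }
const emptyPrice = { title: 'Some product', price: '   ' }

console.log(hasValue(found, 'price'))        // true
console.log(hasValue(missingPrice, 'price')) // false
console.log(hasValue(emptyPrice, 'price'))   // false
```

Inside the x-ray callback, you would call hasValue(obj, 'price') before using obj.price.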