Chatbot written in Python that should work in Telegram does not search for articles from the IEEE Spectrum website


I need to create a Telegram bot that searches for articles on the IEEE Spectrum site using keywords entered by the user. However, every search returns "No results were found for your request.", even though I have checked that matching articles exist on the site. Why is the bot not working correctly?

import telegram
from telegram.ext import Updater, CommandHandler
import requests
from bs4 import BeautifulSoup

# handler for the /start command
def start(update, context):
    update.message.reply_text(
        "Hello! I'll help you find articles on the IEEE Spectrum website. "
        'Just write /search and the search keywords after that.')

# handler for the /search command
def search(update, context):
    query = " ".join(context.args)
    if query == "":
        update.message.reply_text('To search, you must enter keywords after the /search command')
        return

    # the site where we will search for articles
    url = 'https://spectrum.ieee.org'
    # request a site using keywords
    response = requests.get(url+'/search?keywords=' + query)

    if response.status_code == 200:
        # parsing html page using BeautifulSoup
        soup = BeautifulSoup(response.content, 'html.parser')

        # looking for articles on the search results page
        articles = soup.select('.search-result')
        if len(articles) > 0:
            for article in articles:
                title = article.select_one('.search-result-title a').text
                href = article.select_one('.search-result-title a')['href']
                message = f'{title}\n{url}{href}'
                update.message.reply_text(message)
        else:
            update.message.reply_text('No results were found for your request.')
    else:
        update.message.reply_text('Error when requesting IEEE Spectrum site.')

# creating a bot and connecting to the Telegram API
bot_token = 'YOUR_BOT_TOKEN'  # replace with the token issued by @BotFather
updater = Updater(token=bot_token, use_context=True)
dispatcher = updater.dispatcher

# adding command and text message handlers
start_handler = CommandHandler('start', start)
search_handler = CommandHandler('search', search)
dispatcher.add_handler(start_handler)
dispatcher.add_handler(search_handler)

# launch a bot
updater.start_polling()
updater.idle()

I tried to do something, but nothing worked


There are 2 best solutions below

Brandon Li

The problem is that the site loads its search results with JavaScript. requests only downloads the HTML that the server sends and does not execute JavaScript, so it works for static web pages but will not work with this site. You can verify this with curl: curl -L https://spectrum.ieee.org/search/\?q\=aerospace. The response contains JavaScript, which requests cannot run.
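The same check can be done from Python. The following is a minimal sketch using only the standard library: it counts how many elements in a piece of HTML carry a given CSS class, so you can see whether the .search-result elements the bot looks for are present in the raw response. The helper names ClassCounter and count_class are made up for this illustration, and the commented-out fetch at the bottom assumes the same URL shape as the question's code.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in html have class_name among their classes."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Usage against the live site (not run here; needs network access):
#   import requests
#   html = requests.get("https://spectrum.ieee.org/search",
#                       params={"keywords": "aerospace"}).text
#   print(count_class(html, "search-result"))
```

If the count is 0 for the raw response but the elements are visible in the browser's developer tools, the results are being added after page load, which is consistent with the explanation above.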

Instead, you might want to use a headless web driver with Selenium. Selenium spawns an actual browser instance, so JavaScript will function, and the search results will load.

The general flow of your program should remain the same, and you only need to change out the web-scraping part of your code.
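One way to keep the flow the same while swapping the scraping part is to pass the page-fetching step in as a function. The sketch below is only an illustration under assumptions: search_articles, build_search_url and ResultLinkParser are hypothetical names, the .search-result-title class and the /search?keywords= URL are taken from the question's code and may not match the live site, and the standard-library HTMLParser stands in for BeautifulSoup so the example has no third-party dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

BASE_URL = "https://spectrum.ieee.org"


class ResultLinkParser(HTMLParser):
    """Collects (title, href) for the first <a> at or after each
    element with the class search-result-title (a simple heuristic)."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._expect_link = False  # saw a title element, waiting for its <a>
        self._href = None          # href of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "search-result-title" in (attrs.get("class") or "").split():
            self._expect_link = True
        if tag == "a" and self._expect_link:
            self._href = attrs.get("href") or ""
            self._text = []
            self._expect_link = False

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


def build_search_url(query):
    # urlencode escapes spaces and special characters in the keywords
    return BASE_URL + "/search?" + urlencode({"keywords": query})


def search_articles(query, fetch_html):
    """fetch_html is any callable mapping a URL to an HTML string,
    for example a requests-based or a Selenium-based fetcher."""
    parser = ResultLinkParser()
    parser.feed(fetch_html(build_search_url(query)))
    # urljoin handles both relative and absolute hrefs
    return [f"{title}\n{urljoin(BASE_URL, href)}" for title, href in parser.results]
```

With this split, the Telegram handler only calls search_articles(query, fetch_html) and sends each returned message, and moving from requests to Selenium means changing nothing but the fetch_html function that is passed in.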

You can learn more about Selenium with its documentation.

Steve Steve
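from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from telegram.ext import Updater, CommandHandler


# handler for the /start command
def start(update, context):
    update.message.reply_text(
        "Hello! I'll help you find articles on the IEEE Spectrum website. "
        'Just write /search and the search keywords after that.')


# handler for the /search command
def search(update, context):
    query = " ".join(context.args)
    if query == "":
        update.message.reply_text('To search, you must enter keywords after the /search command')
        return

    # the site where we will search for articles
    url = 'https://spectrum.ieee.org'

    # set up a headless Chrome web driver
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        # URL-encode the keywords so spaces and special characters are safe
        driver.get(url + '/search?keywords=' + quote_plus(query))

        # wait up to 10 seconds for JavaScript to render the results;
        # on timeout, fall through and report that nothing was found
        try:
            WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, '.search-result')))
        except TimeoutException:
            pass

        # get the page source after the wait
        html = driver.page_source
    finally:
        # always close the browser, even if loading failed
        driver.quit()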

    # parsing html page using BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    # looking for articles on the search results page
    articles = soup.select('.search-result')
    if len(articles) > 0:
        for article in articles:
            title = article.select_one('.search-result-title a').text
            href = article.select_one('.search-result-title a')['href']
            message = f'{title}\n{url}{href}'
            context.bot.send_message(chat_id=update.effective_chat.id, text=message)
    else:
        update.message.reply_text('No results were found for your request.')


# creating a bot and connecting to the Telegram API
bot_token = 'YOUR_BOT_TOKEN'  # replace with the token issued by @BotFather
updater = Updater(token=bot_token, use_context=True)
dispatcher = updater.dispatcher

# adding command and text message handlers
start_handler = CommandHandler('start', start)
search_handler = CommandHandler('search', search)
dispatcher.add_handler(start_handler)
dispatcher.add_handler(search_handler)

# launch a bot
updater.start_polling()
updater.idle()

This is the modified code, which loads the search page with headless Selenium instead of requests.