How to obtain stock market company sector from ticker or company name in python


Given a company ticker or name I would like to get its sector using python.

I have already tried several potential solutions, but none has worked successfully.

The two most promising are:

1) Using the script from: https://gist.github.com/pratapvardhan/9b57634d57f21cf3874c

from urllib import urlopen
from lxml.html import parse

'''
Returns a tuple (Sector, Industry)
Usage: GFinSectorIndustry('IBM')
'''
def GFinSectorIndustry(name):
  tree = parse(urlopen('http://www.google.com/finance?&q='+name))
  return tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

However, I am using Python 3.8, and this script was written for Python 2 (from urllib import urlopen fails on Python 3).

I have been able to adapt this solution, but the last line is not working. I am completely new to scraping web pages, so I would appreciate any suggestions.

Here is my current code:

from urllib.request import Request, urlopen
from lxml.html import parse

name="IBM"
req = Request('http://www.google.com/finance?&q='+name, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)

tree = parse(webpage)

But then the last part is not working and I am very new to this xpath syntax:

tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

2) The other option was embedding R's TTR package as shown here: Find which sector a stock belongs to

However, I want to run it within my Jupyter notebook, and ss <- stockSymbols() is taking ages to run.


There are 3 best solutions below

5
keepAlive On

Following your comment, for marketwatch.com/investing/stock specifically, the xpath that is likely to work is "//div[@class='intraday__sector']/span[@class='label']", meaning that

tree.xpath("//div[@class='intraday__sector']/span[@class='label']")[0].text

should return the desired information.
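As a minimal sketch of that lookup, the snippet below runs the same xpath against a small hand-written HTML fragment. The fragment is an assumption that only imitates the markup described above (the real page may differ or change over time), and extract_sector is a hypothetical helper name. It returns None instead of raising an IndexError when the xpath matches nothing.

```python
from lxml import html

# XPath from the answer above; it is specific to marketwatch.com markup.
SECTOR_XPATH = "//div[@class='intraday__sector']/span[@class='label']"

# Hand-written stand-in for the real page (an assumption, for illustration only).
SAMPLE_HTML = """
<html><body>
  <div class="intraday__sector">
    <span class="label">Computer Services</span>
  </div>
</body></html>
"""


def extract_sector(page_source, xpath=SECTOR_XPATH):
    """Return the text of the first node matching xpath, or None if none match."""
    tree = html.fromstring(page_source)
    matches = tree.xpath(xpath)
    if not matches:
        return None
    text = matches[0].text
    return text.strip() if text else None


print(extract_sector(SAMPLE_HTML))
print(extract_sector("<html><body><p>no sector here</p></body></html>"))
```

Keeping the parsing step separate from the download makes it possible to check an xpath against a saved copy of the page before making any requests.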

I am completely new to scraping web pages [...]

Some clarifications:

  1. The xpath depends entirely on the website you are looking at, which is why there was no hope of finding "//a[@id='sector']" in the page you mention in the comments: that xpath (now outdated) was specific to Google Finance. Put differently, you first need to "study" the page you are interested in to learn where the information you want is located.
  2. To do this kind of study, I use Chrome DevTools and test any xpath in the console with $x(<your-xpath-of-interest>), where the function $x is documented here (with examples!).
  3. Luckily for you, the information you want from marketwatch.com/investing/stock (the sector's name) is statically generated, i.e. it is not generated dynamically at page load. Had it been dynamic, other scraping techniques would have been required, using other Python libraries such as Selenium, but that is another question.
0
alejandro On

To answer the question:

How to obtain stock market company sector from ticker or company name in python?

I had to find a workaround after reading some material and some nice suggestions from @keepAlive.

The following does the job in a reverse way, i.e. gets the companies given the sector. There are 10 sectors, so it is not too much work if one wants info for all sectors: https://www.stockmonitor.com/sectors/

Given that marketwatch.com/investing/stock was throwing a 405 Error, I decided to use https://www.stockmonitor.com/sectors/, for example:

https://www.stockmonitor.com/sector/healthcare/

Here is the code:

import pandas as pd
from lxml.html import parse
from urllib.request import Request, urlopen

# Browser-like User-Agent string sent with the request.
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/35.0.1916.47 Safari/537.36"
)

url = 'https://www.stockmonitor.com/sector/healthcare/'

req = Request(url, headers={'User-Agent': user_agent})
webpage = urlopen(req)

tree = parse(webpage)

# Each ticker is the text of a link inside a left-aligned table cell.
healthcare_tickers = []
for element in tree.xpath("//tbody/tr/td[@class='text-left']/a"):
    healthcare_tickers.append(element.text)

pd.Series(healthcare_tickers)

Thus, healthcare_tickers holds the tickers of the companies in the healthcare sector.
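To get back to the original question (sector from ticker), the per-sector lists can be inverted into a single lookup table. The following is only a sketch: build_ticker_to_sector is a hypothetical helper, and the tickers and sector names in the sample dictionary are placeholders, not real scrape output.

```python
def build_ticker_to_sector(sector_to_tickers):
    """Invert {sector: [tickers]} into {ticker: sector}."""
    ticker_to_sector = {}
    for sector, tickers in sector_to_tickers.items():
        for ticker in tickers:
            if not ticker:
                # element.text can be None for empty links; skip those.
                continue
            # Keep the first sector seen if a ticker is listed more than once.
            ticker_to_sector.setdefault(ticker.strip().upper(), sector)
    return ticker_to_sector


# Placeholder data standing in for the scraped lists.
sector_to_tickers = {
    "healthcare": ["AAA", "BBB", None],
    "technology": ["CCC"],
}

lookup = build_ticker_to_sector(sector_to_tickers)
print(lookup.get("BBB"))
print(lookup.get("ZZZ"))
```

In practice, sector_to_tickers would be filled by running the scraping code above once per sector page.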

0
oo7knutson On

You can easily obtain the sector for any given company/ticker from Yahoo Finance, using the yfinance package:

import yfinance as yf

tickerdata = yf.Ticker('TSLA')  # the ticker symbol for Tesla
print(tickerdata.info['sector'])

The code prints: Consumer Cyclical

If you want other information about the company/ticker, just print(tickerdata.info) to see all other possible dictionary keys and corresponding values, like ['sector'] used in the code above.
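Since info is a dictionary, a slightly more defensive variant can use .get so that a missing key does not raise a KeyError. This is a sketch under the assumption that some symbols may come back without a 'sector' entry; sector_from_info and get_sector are hypothetical helper names, and the dictionaries in the demonstration are hand-written, not real API output.

```python
def sector_from_info(info):
    """Return info['sector'], or None when the key is missing or empty."""
    sector = info.get("sector") if info else None
    return sector or None


def get_sector(symbol):
    """Fetch the sector for symbol via yfinance (needs network access)."""
    import yfinance as yf  # imported lazily so the helper above works without it
    return sector_from_info(yf.Ticker(symbol).info)


# Offline demonstration with hand-written dictionaries.
print(sector_from_info({"sector": "Consumer Cyclical"}))
print(sector_from_info({"shortName": "Some Fund"}))
```

Calling get_sector('TSLA') would then perform the same lookup as the code above, returning None rather than failing when no sector is available.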