Python 3.4 : LXML : Parsing Tables


I want to parse an entire table from Yahoo Finance. As I understand it, lxml does not register 'tbody' and 'thead' tags but treats them as additional 'tr' rows, so I switched the XPath from:

/html/body/div[4]/div[4]/table[2]/tbody/tr[2]/td/table[2]/tbody/tr/td/table/tbody

to what is seen in the code below

from lxml import html

url = 'http://finance.yahoo.com/q/is?s=MMM+Income+Statement&annual'

tree = html.parse(url)

tick_content = [td.text_content() for td in tree.xpath('/html/body/div[4]/div[4]/table[2]/tr[3]/td/table[2]/tr[1]/td/table/td[1]')]

print(tick_content)

Nothing comes back from this. Is there a special way to parse a table, or am I missing something?

On BEST ANSWER

Rather than use a huge long XPath as generated by Chrome, you can just search for a table with the yfnc_tabledata1 class; there is just the one:

>>> tree.xpath("//table[@class='yfnc_tabledata1']")
[<Element table at 0x10445e788>]

Get to your <td> from there:

>>> tree.xpath("//table[@class='yfnc_tabledata1']//td[1]")[0].text_content()
'Period EndingDec 31, 2014Dec 31, 2013Dec 31, 2012\n                            \n                        Total Revenue\n                            \n                        \n                                \n                            31,821,000\xa0\xa0\n                                \n                            \n                                \n                            30,871,000\xa0\xa0\n                                \n                            \n                                \n                            29,904,000\xa0\xa0\n                                \n                            Cost of Revenue16,447,000\xa0\xa016,106,000\xa0\xa015,685,000\xa0\xa0\n                            \n                        Gross Profit\n                            \n                        \n                                \n                            15,374,000\xa0\xa0\n                                \n                            \n                                \n                            14,765,000\xa0\xa0\n                                \n                            \n                                \n                            14,219,000\xa0\xa0\n                                \n                            \n                    \n                Operating Expenses\n                    \n                Research Development1,770,000\xa0\xa01,715,000\xa0\xa01,634,000\xa0\xa0\n                    \n                Selling General and Administrative6,469,000\xa0\xa06,384,000\xa0\xa06,102,000\xa0\xa0\n                    \n                Non Recurring\n            -\n            \xa0\n            -\n            \xa0\n            -\n            \xa0\n                    \n                Others\n            -\n            \xa0\n            -\n            \xa0\n            -\n            \xa0\n                    \n                \n                    \n                Total Operating Expenses\n            -\n            \xa0\n            -\n           
 \xa0\n            -\n            \xa0\n                            \n                        Operating Income or Loss\n                            \n                        \n                                \n                            7,135,000\xa0\xa0\n                                \n                            \n                                \n                            6,666,000\xa0\xa0\n                                \n                            \n                                \n                            6,483,000\xa0\xa0\n                                \n                            \n                    \n                Income from Continuing Operations\n                    \n                Total Other Income/Expenses Net33,000\xa0\xa041,000\xa0\xa039,000\xa0\xa0\n                    \n                Earnings Before Interest And Taxes7,168,000\xa0\xa06,707,000\xa0\xa06,522,000\xa0\xa0\n                    \n                Interest Expense142,000\xa0\xa0145,000\xa0\xa0171,000\xa0\xa0\n                    \n                Income Before Tax7,026,000\xa0\xa06,562,000\xa0\xa06,351,000\xa0\xa0\n                    \n                Income Tax Expense2,028,000\xa0\xa01,841,000\xa0\xa01,840,000\xa0\xa0\n                    \n                Minority Interest(42,000)(62,000)(67,000)\n                    \n                \n                    \n                Net Income From Continuing Ops4,956,000\xa0\xa04,659,000\xa0\xa04,444,000\xa0\xa0\n                    \n                Non-recurring Events\n                    \n                Discontinued Operations\n            -\n            \xa0\n            -\n            \xa0\n            -\n            \xa0\n                    \n                Extraordinary Items\n            -\n            \xa0\n            -\n            \xa0\n            -\n            \xa0\n                    \n                Effect Of Accounting Changes\n            -\n            \xa0\n            -\n            \xa0\n    
        -\n            \xa0\n                    \n                Other Items\n            -\n            \xa0\n            -\n            \xa0\n            -\n            \xa0\n                            \n                        Net Income\n                            \n                        \n                                \n                            4,956,000\xa0\xa0\n                                \n                            \n                                \n                            4,659,000\xa0\xa0\n                                \n                            \n                                \n                            4,444,000\xa0\xa0\n                                \n                            Preferred Stock And Other Adjustments\n            -\n            \xa0\n            -\n            \xa0\n            -\n            \xa0\n                            \n                        Net Income Applicable To Common Shares\n                            \n                        \n                                \n                            4,956,000\xa0\xa0\n                                \n                            \n                                \n                            4,659,000\xa0\xa0\n                                \n                            \n                                \n                            4,444,000\xa0\xa0\n                                \n                            '
>>> print(tree.xpath("//table[@class='yfnc_tabledata1']//td[1]")[0].text_content())
Period EndingDec 31, 2014Dec 31, 2013Dec 31, 2012

                        Total Revenue



                            31,821,000  



                            30,871,000  



                            29,904,000  

                            Cost of Revenue16,447,000  16,106,000  15,685,000  

                        Gross Profit



                            15,374,000  



                            14,765,000  



                            14,219,000  



                Operating Expenses

                Research Development1,770,000  1,715,000  1,634,000  

                Selling General and Administrative6,469,000  6,384,000  6,102,000  

                Non Recurring
            -
             
            -
             
            -
             

                Others
            -
             
            -
             
            -
             



                Total Operating Expenses
            -
             
            -
             
            -
             

                        Operating Income or Loss



                            7,135,000  



                            6,666,000  



                            6,483,000  



                Income from Continuing Operations

                Total Other Income/Expenses Net33,000  41,000  39,000  

                Earnings Before Interest And Taxes7,168,000  6,707,000  6,522,000  

                Interest Expense142,000  145,000  171,000  

                Income Before Tax7,026,000  6,562,000  6,351,000  

                Income Tax Expense2,028,000  1,841,000  1,840,000  

                Minority Interest(42,000)(62,000)(67,000)



                Net Income From Continuing Ops4,956,000  4,659,000  4,444,000  

                Non-recurring Events

                Discontinued Operations
            -
             
            -
             
            -
             

                Extraordinary Items
            -
             
            -
             
            -
             

                Effect Of Accounting Changes
            -
             
            -
             
            -
             

                Other Items
            -
             
            -
             
            -
             

                        Net Income



                            4,956,000  



                            4,659,000  



                            4,444,000  

                            Preferred Stock And Other Adjustments
            -
             
            -
             
            -
             

                        Net Income Applicable To Common Shares



                            4,956,000  



                            4,659,000  



                            4,444,000
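
Since text_content() on the outer <td> runs every label and figure together, it may be easier to walk the table row by row. The following is only a sketch under assumptions: SAMPLE is a small hypothetical stand-in for the Yahoo page (the real markup is much larger and may differ), and table_rows is a helper name made up for this example. It also shows that lxml adds no <tbody> when the source has none, which is why a browser-generated path containing tbody matches nothing.

```python
from lxml import html

# Hypothetical stand-in markup: an outer table with the class from the answer,
# wrapping an inner table that holds the actual rows. No <tbody> in the source.
SAMPLE = """
<html><body>
<table class="yfnc_tabledata1">
  <tr><td>
    <table>
      <tr><td>Period Ending</td><td>Dec 31, 2014</td><td>Dec 31, 2013</td></tr>
      <tr><td>Total Revenue</td><td>31,821,000&nbsp;&nbsp;</td><td>30,871,000&nbsp;&nbsp;</td></tr>
      <tr><td>Gross Profit</td><td>15,374,000&nbsp;&nbsp;</td><td>14,765,000&nbsp;&nbsp;</td></tr>
    </table>
  </td></tr>
</table>
</body></html>
"""


def table_rows(tree, css_class):
    """Return each innermost row of the matching table as a list of cell strings."""
    rows = []
    # tr[not(.//tr)] keeps only rows that do not themselves contain nested rows.
    for tr in tree.xpath("//table[@class=$cls]//tr[not(.//tr)]", cls=css_class):
        # Collapse runs of whitespace (including non-breaking spaces) in each cell.
        cells = [" ".join(cell.text_content().split()) for cell in tr.xpath("./td | ./th")]
        cells = [cell for cell in cells if cell]
        if cells:
            rows.append(cells)
    return rows


tree = html.fromstring(SAMPLE)

# lxml does not invent a <tbody> the way a browser's DOM inspector does.
print(tree.xpath("//tbody"))

rows = table_rows(tree, "yfnc_tabledata1")
for row in rows:
    print(row)
```

Against the live page you would build the tree with html.parse(url) as in the question and call table_rows(tree, 'yfnc_tabledata1') the same way, assuming the class name is still present.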