Extracting data from a table on a Confluence page


I am trying to extract a table from a Confluence page and load it into a DataFrame using the code below.

I keep getting a 401 Unauthorized error. The cause could be access related, but either way I wanted to check whether the code itself is correct.

from atlassian import Confluence
import os
from bs4 import BeautifulSoup
import pandas as pd

user = "user_name"
api_key = os.environ['OEwfen9FFrerGreer5GRrrdfd']
server = "https://confluence.abc.com/display/Int/%5new_variable"

confluence = Confluence(url=server, username=user, password=api_key)
page = confluence.get_page_by_title("TEST", "page 1", expand="body.storage")
body = page["body"]["storage"]["value"]

tables_raw = [[[cell.text for cell in row("th") + row("td")]
                    for row in table("tr")]
                    for table in BeautifulSoup(body, features="lxml")("table")]

tables_df = [pd.DataFrame(table) for table in tables_raw]
for table_df in tables_df:
    print(table_df)

There is 1 best solution below

Assuming you are using Confluence Cloud:

from io import StringIO

import pandas as pd
from atlassian import Confluence

api_key = 'OEwfen9FFrerGreer5GRrrdfd'  # PAT you get from your account; don't share this key
server = "https://confluence.abc.com/"  # Just the base link; in my case it was host.com/confluence

confluence = Confluence(url=server, token=api_key)
page = confluence.get_page_by_title("TEST", "page 1", expand="body.storage")
body = page["body"]["storage"]["value"]

# read_html returns a list of DataFrames, one per table on the page
dfs = pd.read_html(StringIO(body))
# if you want links, do this instead
dfs = pd.read_html(StringIO(body), extract_links="all")
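To see what the parsing step produces without calling Confluence at all, here is a minimal sketch that runs pd.read_html on a hand-written HTML string. The sample_body value is a made-up stand-in for page["body"]["storage"]["value"], not real page content. It assumes pandas 1.5 or later, where the link option is named extract_links, plus an installed HTML parser such as lxml.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for page["body"]["storage"]["value"]
sample_body = """
<table>
  <tbody>
    <tr><th>Name</th><th>Owner</th></tr>
    <tr><td>Service A</td><td><a href="https://example.com/alice">Alice</a></td></tr>
    <tr><td>Service B</td><td>Bob</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> element
tables = pd.read_html(StringIO(sample_body))
first = tables[0]
print(first)

# With extract_links="all", each cell becomes a (text, href) tuple;
# href is None for cells that contain no link
with_links = pd.read_html(StringIO(sample_body), extract_links="all")[0]
print(with_links.iloc[0, 1])
```

Wrapping the string in StringIO avoids the deprecation warning that recent pandas versions emit when read_html is given literal HTML.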

Then work with the resulting DataFrames as you want.
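Two details in the question's code are worth checking regardless of the 401. First, os.environ[...] expects the name of an environment variable, not the token itself. Second, the url argument should be the base link rather than the address of one page. The sketch below illustrates both using only the standard library. The helper names base_url and read_token and the variable name CONFLUENCE_TOKEN are made up for illustration. Note that base_url drops everything after the host, so an installation served under a context path such as /confluence would need that path added back.

```python
import os
from urllib.parse import urlsplit


def base_url(page_url: str) -> str:
    """Reduce a full page URL to scheme://host/ (hypothetical helper)."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/"


def read_token(var_name: str = "CONFLUENCE_TOKEN") -> str:
    """Read the token from an environment variable with the given name.

    CONFLUENCE_TOKEN is an assumed name; use whatever you exported.
    """
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token


# The page URL from the question, reduced to the base link
server = base_url("https://confluence.abc.com/display/Int/%5new_variable")
print(server)
```

The values returned by base_url and read_token can then be passed as url and token when constructing the Confluence client, which keeps the key out of the source file.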

Confluence API Documentation

Pandas Documentation

Read the documentation; it will have your answers 90% of the time.