Scraping content from Confluence


With the help of the Atlassian API, I managed to get the content stored in a table on a Confluence page and write it into a DataFrame. Finally, I dump that into a JSON file. The output looks like this:

    "table_name":"emp",
    "created_date":"2/2/2024"

The other values in the table are written out the same way.

Along with the other fields, I have one more field called query, which holds an SQL query:

    select * from table
    where priority=1  -- filtering records with more priority 1
    and owner="marie"  -- maries' content
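Once the query is available as a multi-line string, the comments can be removed line by line. Below is a minimal sketch; strip_sql_line_comments is a hypothetical helper, not part of any library. It assumes only -- line comments (no /* ... */ blocks) and that quoted literals do not span lines, and it leaves a -- that sits inside single or double quotes alone.

```python
def strip_sql_line_comments(sql: str) -> str:
    """Remove '--' line comments from a multi-line SQL string.

    A '--' inside a single- or double-quoted literal is kept. Assumes quotes
    do not span lines; /* ... */ block comments are not handled.
    """
    kept = []
    for line in sql.splitlines():
        quote = None        # quote character currently open on this line
        cut = len(line)     # index where a comment starts (default: no comment)
        for i, ch in enumerate(line):
            if quote:
                if ch == quote:
                    quote = None  # closing quote; a doubled '' simply reopens
            elif ch in ("'", '"'):
                quote = ch
            elif line.startswith("--", i):
                cut = i
                break
        stripped = line[:cut].rstrip()
        if stripped:        # drop lines that held only a comment
            kept.append(stripped)
    return "\n".join(kept)


query = """select * from table
where priority=1  -- filtering records with more priority 1
and owner="marie"  -- maries' content"""
print(strip_sql_line_comments(query))
```

This only works while the newlines are still there, which is exactly what gets lost in the DataFrame step.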

The problem is that I'm writing the content directly from the HTML into a DataFrame, as shown below:

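    import io
    import pandas as pd
    from atlassian import Confluence  # atlassian-python-api

    confluence = Confluence(url=server, token=api_key)  # server, api_key, page_id defined elsewhere
    page = confluence.get_page_by_id(page_id, expand="body.storage")
    body = page["body"]["storage"]["value"]

    # Write the page content into a DataFrame; read_html returns a list of
    # DataFrames (one per <table>), so take the first one
    df = pd.read_html(io.StringIO(body))[0]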

The JSON dump follows. I need to remove the comments starting with -- inside the query. The problem is that right at the parsing step, every value in the query column is already collapsed into a single-line string, so a regular expression can no longer tell where each comment ends. If the query were still delimited by newlines, the regex would have done its job.

I need help removing the comments before the JSON is created.
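One way around the collapsed newlines is to skip pd.read_html for this table and walk the page HTML directly, turning <br> tags and paragraph ends into newlines before any whitespace is normalized (pandas.read_html appears to collapse line breaks inside cells, which matches the behavior described above). The sketch below uses only the standard library's html.parser. It assumes the query cell in the storage-format body separates lines with <p> or <br/> elements, that the first row holds the column names, and that the page has a single table; a query stored inside a code macro (CDATA) would need different handling. TableCellParser, table_to_records and strip_comments are hypothetical names, the sample body is made up to mirror the question, and the regex is naive: it would also cut a -- that appears inside a string literal.

```python
import json
import re
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect table rows as lists of cell strings, keeping <br> and <p> line breaks."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append("\n")  # explicit line break inside a cell

    def handle_endtag(self, tag):
        if tag == "p" and self._cell is not None:
            self._cell.append("\n")  # a closed paragraph ends a line
        elif tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(body):
    """Use the first row as column names and turn every later row into a dict."""
    parser = TableCellParser()
    parser.feed(body)
    parser.close()
    header, *rows = parser.rows
    return [dict(zip(header, row)) for row in rows]


def strip_comments(records, field="query"):
    """Delete '--' comments (and the spaces before them) up to each line end."""
    comment = re.compile(r"[ \t]*--.*$", re.MULTILINE)
    for record in records:
        if field in record:
            record[field] = comment.sub("", record[field])
    return records


# Hypothetical storage-format body shaped like the table in the question.
body = (
    "<table><tbody>"
    "<tr><th>table_name</th><th>created_date</th><th>query</th></tr>"
    "<tr><td>emp</td><td>2/2/2024</td><td>"
    "<p>select * from table</p>"
    "<p>where priority=1  -- filtering records with more priority 1</p>"
    "<p>and owner=&quot;marie&quot;  -- maries' content</p>"
    "</td></tr>"
    "</tbody></table>"
)

records = strip_comments(table_to_records(body))
print(json.dumps(records, indent=2))
```

If a DataFrame is still wanted, pd.DataFrame(records) builds one from the cleaned records, and json.dumps keeps the remaining line breaks in the query as \n escapes.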
