Scraping content from Confluence

58 Views Asked by Saravana Kumar At 15 February 2024 at 07:47

With the help of the Atlassian API, I managed to read the content of a table on a Confluence page into a DataFrame, which I then dump to a JSON file. The output looks like this:

"table_name":"emp",
"created_date":"2/2/2024"

The other values in the table are written out the same way.

Along with the other fields, there is one more field called query, which holds a SQL query:

select * from table
where priority=1  -- filtering records with more priority 1
and owner="marie"  -- maries' content

The problem is that I write the content directly from the HTML into the DataFrame, as shown below:

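    from io import StringIO

    import pandas as pd
    from atlassian import Confluence

    confluence = Confluence(url=server, token=api_key)
    page = confluence.get_page_by_id(page_id, expand="body.storage")
    body = page["body"]["storage"]["value"]

    # Parse the page HTML; read_html returns a list with one DataFrame per table
    df = pd.read_html(StringIO(body))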

A JSON dump follows. I need to remove the comments introduced by -- inside the query. However, by the time the data is in the DataFrame, each query value has already been flattened into a single-line string, so a regular expression cannot tell where a comment ends. If the query were still delimited by newlines, the regex would have done its job.

I need help removing the comments before the JSON is created.
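
One possible direction, sketched under assumptions rather than tested against a real page: mark the line ends in the HTML before it is parsed, restore them afterwards, and only then strip the comments line by line. The sketch assumes the query lines in the cell are separated by `<br>` or `</p>` tags (the actual Confluence storage markup may differ, for example if a code macro is used). `NEWLINE_TOKEN` and the three helper functions are hypothetical names, and the tag-stripping step in the demo is only a stand-in for `pd.read_html`, which appears to collapse whitespace inside each cell.

```python
import re

# Hypothetical sentinel; any token that cannot occur in the page text works.
NEWLINE_TOKEN = "@@NL@@"

# Assumption: <br>, <br/> and </p> mark the line ends inside the table cell.
_LINE_END_TAG = re.compile(r"<br\s*/?>|</p\s*>", re.IGNORECASE)

# Either a quoted string (kept) or a "--" comment running to end of line (dropped).
_SQL_TOKEN = re.compile(r"""("[^"]*"|'[^']*')|--.*""")


def mark_line_breaks(html: str) -> str:
    """Put a sentinel in front of every line-ending tag so it survives parsing."""
    return _LINE_END_TAG.sub(lambda m: f" {NEWLINE_TOKEN} {m.group(0)}", html)


def restore_line_breaks(text: str) -> str:
    """Turn the sentinels in a flattened cell value back into newlines."""
    parts = (part.strip() for part in text.split(NEWLINE_TOKEN))
    return "\n".join(part for part in parts if part)


def strip_sql_comments(query: str) -> str:
    """Remove "--" comments line by line, leaving quoted strings untouched."""
    cleaned = []
    for line in query.splitlines():
        line = _SQL_TOKEN.sub(lambda m: m.group(1) or "", line).rstrip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


# Demo on a hand-written cell; the markup is an assumption, not real page output.
cell_html = (
    "<td><p>select * from table</p>"
    "<p>where priority=1  -- filtering records with more priority 1</p>"
    "<p>and owner=\"marie\"  -- maries' content</p></td>"
)

marked = mark_line_breaks(cell_html)
# Stand-in for the table parser: drop tags and collapse whitespace to one line.
flat = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", "", marked)).strip()

query = strip_sql_comments(restore_line_breaks(flat))
print(query)
```

With pandas, the same helpers would be applied around the parse: pass `mark_line_breaks(body)` to `pd.read_html` instead of `body`, then map `strip_sql_comments(restore_line_breaks(value))` over the query column before the JSON dump. The comment pattern skips quoted strings so that a `--` inside a literal is not mistaken for a comment; block comments (`/* ... */`) are not handled.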
