I am writing a Python program that handles metadata from Tableau dashboards. Some of the dashboards contain pictures and some do not, and the pictures cannot be deleted from the dashboards themselves. I load the Tableau metadata through the REST API. The program is supposed to download the metadata of every dashboard in our company, so it needs to be an automated solution.
My program can read the data and process it (extract the important parts, etc.), but when an error occurs I cannot read the .twb file myself, and therefore cannot work out how to adjust the program.
So I need advice on how to make my workbook.twb readable both for the computer and for me. I do not care which file type the 'readable workbook' is converted into, as long as I can open it and scroll through it.
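To illustrate what I mean by "readable": for a workbook that is already plain XML, I assume something like the standard library's `xml.dom.minidom` could re-indent it. This is only a minimal sketch; `pretty_print_xml` and the sample bytes are made-up placeholders, not part of my program or of a real dashboard.

```python
import xml.dom.minidom


def pretty_print_xml(xml_bytes: bytes) -> str:
    """Return an indented, human-readable copy of an XML document."""
    dom = xml.dom.minidom.parseString(xml_bytes)
    pretty = dom.toprettyxml(indent="  ")
    # toprettyxml can emit whitespace-only lines; drop them for readability.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# Hypothetical, tiny stand-in for the XML inside a real .twb file.
sample = b"<workbook><datasources><datasource name='Sales'/></datasources></workbook>"
readable = pretty_print_xml(sample)
print(readable)
```

The resulting string could then be written to a text file (for example with `open(..., 'w', encoding='utf-8')`) and opened in any editor.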
I tried writing this class:
`import os
import zipfile
import xml.etree.ElementTree as ET


class Tableau:
    def __init__(self, twbx_path=None):
        self.in_memory_files = {}
        self.twbx_path = twbx_path
        self.extracted_data = {}
        self.tree = None
        self.root = None
        self.datasource_info = {}

    def unpack_twbx(self):
        self.in_memory_files[self.twbx_path] = {}
        file_extension = os.path.splitext(self.twbx_path)[1]
        if zipfile.is_zipfile(self.twbx_path):
            # Zip archive: read every member into memory and try to parse it as XML
            with zipfile.ZipFile(self.twbx_path, 'r') as zip_ref:
                for file_info in zip_ref.infolist():
                    with zip_ref.open(file_info.filename) as file:
                        file_content = file.read()
                    self.in_memory_files[self.twbx_path][file_info.filename] = file_content
                    try:
                        # No need for the if statement that checks for '.twb', let's directly parse!
                        self.tree = ET.ElementTree(ET.fromstring(file_content))
                        self.root = self.tree.getroot()
                        for elem in self.root.iter():
                            print(elem.tag, elem.attrib)  # This will print the tag name and attributes for each XML element
                    except ET.ParseError:
                        # Not XML (e.g. an image): dump the bytes to disk for inspection
                        weird_file_name = 'weird_file_content'
                        if file_content[:4] == b'\x89PNG':
                            weird_file_name += '.png'
                        else:
                            weird_file_name += '.bin'  # default to .bin if we can't identify the file
                        with open(weird_file_name, 'wb') as f:
                            f.write(file_content)
                        print(f"Saved content to {weird_file_name}.")
                        print("heres file content")
                        print(file_content)
                        # with open('weird_file_content.xml', 'wb') as f:
                        #     f.write(file_content)
                        # print("fffdjslkfa content here")
                        # print(file_content)
                    except UnicodeDecodeError:
                        print("Decoding failed. Writing raw bytes to disk for manual inspection.")
                        with open('weird_file_content.raw', 'wb') as f:
                            f.write(file_content)
                        print("file content here")
                        print(file_content)
        elif file_extension == '.twb':
            # Parse the twb directly
            try:
                self.tree = ET.parse(self.twbx_path)
                self.root = self.tree.getroot()
                print(f"I've lovingly parsed the content of {self.twbx_path}! ")
            except ET.ParseError:
                # Fallback: decode as Latin-1 ('ISO-8859' alone is not a valid codec name)
                with open(self.twbx_path, 'r', encoding='ISO-8859-1') as f:
                    content = f.read()
                print(content)
                # ET.parse expects a path or file object; parse the string instead
                self.root = ET.fromstring(content)
                self.tree = ET.ElementTree(self.root)
        else:
            print("I'm not sure what this file type is. ")
        print(f"I've unpacked content of {self.twbx_path} into the in-memory files dictionary! ")
`
It first checks whether the downloaded workbook is a zip file (which should take care of the picture files), then reads the file and saves it to memory. But it does not work for all dashboards. Even after running this function, the workbook.twb file still looks like this:
PK y5W �ܭS �V Image/Positive_flat.pngļe\T�6<� ҍ�!ݡ 5��tIw�R�ҍ0#��PC (�� ݝ����~{>?���9g��^q�k�}��h� !�'�@$�*ϵA l��x��t{���C��C������� ����U����J������/+iHOJ������朖#�I(�����)�F|n�kV���(�+�����o,��Ϝf�S�.K�J�Y�lT�e6U�jH���܇l���)
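As far as I can tell, the leading `PK` in this dump is the ZIP signature and `Image/Positive_flat.png` is an archive member name, so the download looks like a packaged workbook saved under a `.twb` name. A rough sketch of what I think should happen instead, assuming the XML sits in a member whose name ends in `.twb`; `extract_twb` and the fake archive are hypothetical, built in memory only to show the idea:

```python
import io
import zipfile


def extract_twb(raw: bytes) -> bytes:
    """Return the workbook XML from raw download bytes (.twb or .twbx)."""
    buffer = io.BytesIO(raw)
    if not zipfile.is_zipfile(buffer):
        return raw  # assumption: anything that is not a zip is plain .twb XML
    with zipfile.ZipFile(buffer) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".twb"):
                return archive.read(name)
    raise ValueError("archive contains no .twb member")


# Build a fake packaged workbook in memory (image + workbook XML).
fake = io.BytesIO()
with zipfile.ZipFile(fake, "w") as archive:
    archive.writestr("Image/Positive_flat.png", b"\x89PNG fake image bytes")
    archive.writestr("dashboard.twb", b"<workbook/>")

packaged = fake.getvalue()
print(packaged[:2])                 # zip archives start with b'PK'
print(extract_twb(packaged))        # XML pulled out of the archive
print(extract_twb(b"<workbook/>"))  # plain XML is returned unchanged
```

Checking the content rather than the file extension means it would not matter which name the download was saved under.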
The file downloaded through the REST API is hard-coded to be saved as 'workbook.twb'. For complicated dashboards I tried saving it as 'workbook.twbx' instead, but that did not help either. I am a beginner programmer.