I need to extract this uid from a .sgm file

I need to extract the uid from a .sgm file. I tried the code below, but it doesn't work. Can anybody help?

Sample .sgm file content:

<miscdoc n='1863099' uid='0001863099_20220120' type='seccomlett' t='frm' mdy='01/20/2022'><rname>Kimbell Tiger Acquisition Corp, 01/20/2022</rname>

<table col='2' type='txt'>
<colspec col='1' colwidth='*'>
<colspec col='2' colwidth='2*'>
<tname>Meta-data</tname>
<tbody>
<row><entry>SEC-HEADER</entry><entry>0001104659-22-005920.hdr.sgml : 20220304</entry></row>
<row><entry>ACCEPTANCE-DATETIME</entry><entry>20220120160231</entry></row>
<row><entry>PRIVATE-TO-PUBLIC</entry></row>
<row><entry>ACCESSION-NUMBER</entry><entry>0001104659-22-005920</entry></row>
<row><entry>TYPE</entry><entry>CORRESP</entry></row>
<row><entry>PUBLIC-DOCUMENT-COUNT</entry><entry>1</entry></row>
<row><entry>FILING-DATE</entry><entry>20220120</entry></row>
<row><entry>FILER</entry></row>

Code I tried:

import os  
# Folder Path
path = "Enter Folder Path" 
# Change the directory
os.chdir(path) 
# Read text File  
def read_file(file_path):
    with open(file_path, 'r') as f:
        print(f.read())  
# iterate through all file
for file in os.listdir():
    # Check whether file is in text format or not
    if file.endswith(".sgm"):
        if 'uid' in file:
            print("true")
        file_path = f"{path}\{file}"
        # call read text file function
        read_file(file_path)

I need to extract the uid value from the above .sgm file. Is there any other way I could do this? What should I change in my code?

There is 1 best solution below

The SGM format may just be an XML superset. If it isn't, then for this particular case (and assuming one can rely on the format being as shown in the question), a regular expression on the <miscdoc> line will do:

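import re

def get_uid(filename):
  # Return the uid attribute of the first <miscdoc> line that has one, else None
  with open(filename) as infile:
    for line in map(str.strip, infile):
      if line.startswith('<miscdoc'):
        # findall returns the captured values; the first one is the uid
        if uid := re.findall("uid='(.*?)'", line):
          return uid[0]
  return None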
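
Since the question loops over every .sgm file in a folder, here is a minimal sketch of how the same idea might be applied to a whole directory. Note that the original check `'uid' in file` tests the file name, not the file contents, which is why it never printed "true". The names `UID_PATTERN` and `collect_uids` are made up for this illustration, and reading with `encoding="utf-8", errors="replace"` is an assumption about the files, not something the question confirms.

```python
import re
from pathlib import Path

# Matches uid='...' or uid="..." inside an opening <miscdoc ...> tag.
UID_PATTERN = re.compile(r"<miscdoc\b[^>]*?\buid=['\"]([^'\"]*)['\"]")


def get_uid(filename):
    """Return the uid of the first <miscdoc> tag in the file, or None."""
    # Assumption: undecodable bytes are replaced rather than raising an error.
    with open(filename, encoding="utf-8", errors="replace") as infile:
        for line in infile:
            match = UID_PATTERN.search(line)
            if match:
                return match.group(1)
    return None


def collect_uids(folder):
    """Map each .sgm file name in `folder` to its uid (None if not found)."""
    return {p.name: get_uid(p) for p in sorted(Path(folder).glob("*.sgm"))}
```

Calling `collect_uids("Enter Folder Path")` returns a dictionary keyed by file name, so there is no need for `os.chdir` or for building paths by hand with a backslash.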