Split string in a list

Consider a number of lists (produced in a loop) that contain inconsistent strings and even differ in length. Each list is the output of an email (.eml) message body.

Example list 1

['Request 1',
'String example',
'Service:xyz Request Date Time: 4/7/2022 8:20:54 PMService: Sub Service:']

Example list 2

['Request 2',
'String example 1',
'String example 2',
'Service : xyzabc   Requested by : example   Request Date : 4/8/2022 7:31:17 AM   Service :   abcdefg   Sub Service :   abcdefg       Current Owner']

Example list 3

['Request 3',
'string example',
'Service : abcdefg     Requested by : example   Request Date : Thursday, 7 April 2022, 3:29:55 PM  Service :   abcdefg  Sub Service :   abcdefg        Current Owner',
'SSC :    abcdefg',
'Jam']

The strings need to be parsed and classified into separate DataFrame columns:

  • Request
  • String example
  • Service
  • Requested by
  • Requested Date (*and Time)
  • Service
  • Sub Service
  • Current Owner
  • SSC

The problem is that there is no exact pattern in the strings that can be used as a delimiter to split them.

Here's the code that I use to read the email files, but the issue is that it produces a nested list because of the if condition.

import re
from email import policy
from email.parser import BytesParser

import pandas as pd

matches = ["Service", "Requested by", "Request Date"]
file_names, texts = [], []  # assumed to be defined like this elsewhere

# eml_files is assumed to be an iterable of paths to .eml files
for file in eml_files:
  with open(file, 'rb') as fp:  # the with block closes the file itself
    name = fp.name
    msg = BytesParser(policy=policy.default).parse(fp)
  text = msg.get_body(preferencelist=('plain',)).get_content()
  file_names.append(name)
  texts.append(text)

  # Drop blank lines and surrounding whitespace (including \r and \t)
  text = [j.strip() for j in text.split("\n") if j.strip()]

  for idx, te in enumerate(text):
    if any(x in te for x in matches):
        # re.split returns a list, so this element becomes a nested list
        text[idx] = re.split('Service :|Requested by : |Request Date : |Service : |Sub Service : | Current Owner|SSC : ', te)

  df = pd.DataFrame(text).T
There is 1 best solution below

As a general gist:

for string in strings:
    # Do stuff to the string; on each pass the current element of the list is available as "string"

Due to the nature of your lists you can use the following:

if "Service" in string:
    # Do something (note that example list 1 has "Service:" with no space before the colon)
else:
    # Do something else, such as storing it as None or NULL
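As a rough illustration of that check, the sketch below separates the lines that carry field labels from the free-text lines, using example list 3 from the question. The helper name `partition_lines` and the `LABELS` tuple are made up for this example and would need adjusting to the real emails.

```python
# Labels taken from the sample lists in the question (an assumption about
# what the real emails contain).
LABELS = ("Service", "Requested by", "Request Date", "Current Owner", "SSC")


def partition_lines(lines):
    """Split one email's lines into (labelled, free_text) lists."""
    labelled, free_text = [], []
    for line in lines:
        # A line counts as "labelled" if any known label occurs in it
        if any(label in line for label in LABELS):
            labelled.append(line)
        else:
            free_text.append(line)
    return labelled, free_text


example = [
    'Request 3',
    'string example',
    'Service : abcdefg     Requested by : example   Request Date : Thursday, 7 April 2022, 3:29:55 PM  Service :   abcdefg  Sub Service :   abcdefg        Current Owner',
    'SSC :    abcdefg',
    'Jam',
]

labelled, free_text = partition_lines(example)
```

Because the two groups are kept in separate flat lists, nothing gets nested, and the number of free-text lines can vary from email to email without affecting the labelled ones.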

Although you should be fine with just the loop
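If the labelled lines also need to be broken into columns, one possible approach is to split on the labels themselves rather than on a fixed delimiter, tolerating any amount of whitespace and an optional colon. This is only a sketch under the assumption that the labels in the three sample lists are the complete set; `parse_fields`, `to_row`, and the "Service 2" naming for a repeated label are invented for illustration.

```python
import re

# Longer labels come first so that "Sub Service" is not read as "Service"
# and "Request Date Time" is not cut short to "Request Date".
LABELS = ["Sub Service", "Requested by", "Request Date Time", "Request Date",
          "Current Owner", "Service", "SSC"]
LABEL_RE = re.compile(r"\s*(" + "|".join(map(re.escape, LABELS)) + r")\s*:?\s*")


def parse_fields(line):
    """Return (label, value) pairs in order of appearance; empty values become None."""
    parts = LABEL_RE.split(line)
    # parts == [text before first label, label1, value1, label2, value2, ...]
    return [(label, value.strip() or None)
            for label, value in zip(parts[1::2], parts[2::2])]


def to_row(pairs):
    """Build a dict from the pairs, numbering repeated labels ("Service 2")."""
    row = {}
    for label, value in pairs:
        key, n = label, 1
        while key in row:
            n += 1
            key = f"{label} {n}"
        row[key] = value
    return row


line = ('Service : xyzabc   Requested by : example   '
        'Request Date : 4/8/2022 7:31:17 AM   Service :   abcdefg   '
        'Sub Service :   abcdefg       Current Owner')
row = to_row(parse_fields(line))
```

A list of such dicts, one per email, can be passed to `pd.DataFrame(rows)`; keys missing from a given email simply come out as NaN in that row, which sidesteps the inconsistent list lengths.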