Python split issue with spaces

I am trying to process the Linux output in Python.

Here is my output from Linux:

machine01:/mnt/vlm/log-prod                     machine02:/mnt/machine01_vlm/log-prod                                                    Transferred    17:46:14   Idle
machine01:/mnt/vlm/log-test                     machine02:/mnt/machine01_vlm/log-test                                        Transferred    17:46:14   Idle
machine01:/mnt/wndchl/-                         machine02:/mnt/machine01_wndchl/machine01_wndchl_machine01_wndchl              Transferred    18:36:10   Idle
machine01:/mnt/wndchl/prod                      machine02:/mnt/machine01_wndchl/prod                                         Transferred    18:36:10   Idle
machine01:/mnt/wndchl/test                      machine02:/mnt/machine01_wndchl/test                                         Transferred    18:36:10   Idle
machine01:/mnt/iso/Archive                      machine02:/mnt/iso/Archive                                                  Transferred    19:06:10   Idle
machine01:/mnt/iso/Ready To Transfer            machine02:/mnt/iso/ReadyxToxTransfer                                        Transferred    19:06:10   Idle
machine01:/mnt/iso/-                            machine02:/mnt/iso/iso_machine01_iso                                         Transferred    19:06:10   Idle
machine01:/mnt/it/SCCM                           machine02:/mnt/it/SCCM                                                      Transferred    19:25:51   Idle
machine01:/mnt/it/Windows                        machine02:/mnt/it/Windows                                                   Transferred    19:25:51   Idle
machine01:/mnt/it/-                              machine02:/mnt/it/machine01_it_machine01_it                                   Transferred    19:25:51   Idle
machine01:/mnt/it/dcs                           machine02:/mnt/it/dcs                                                       Transferred    19:25:51   Idle
machine01:/mnt/it/hds_perf_logs                  machine02:/mnt/it/hds_perf_logs                                             Transferred    19:25:51   Idle
machine01:/mnt/legalhold/LegalHold              machine02:/mnt/legalhold/LegalHold                                          Transferred    18:46:06   Idle
machine01:/mnt/legalhold/-                      machine02:/mnt/legalhold/legalhold_machine01_legalhold                       Transferred    18:46:06   Idle

Here is my Python script:

for x in f.readlines():
    output_data = x.split()
    #Define variable
    source_path = output_data[0]
    dest_path = output_data[1]
    print "working on....",source_path
    relationship = output_data[2]
    #We are only interested with hour,split it out!
    buffer_time = output_data[3].split(":",1)
    relationship_status = output_data[4]
    #Get destination nas hostname
    dest_nas = output_data[1].split(":",1)
    dest_nas_hostname = dest_nas[0]
    #Get the exact hour number and convert it into int
    extracted_hour = int(buffer_time[0])
    if relationship_status == "Idle":
        if extracted_hour > max_tolerate_hour:
            print "Source path         : ",source_path
            print "Destination path    : ",dest_path
            print "Max threshold(hours): ",max_tolerate_hour
            print "Idle (hours)        : ",extracted_hour
            print "======================================================================"
    else:
        pass
print "Scan completed!"

Everything seems fine, but it breaks on line 7 of the output: the spaces in "Ready To Transfer" throw off the split. I can wrap it in try/except, but that doesn't solve the problem.

What else can I do?

There are 2 best solutions below

You can split on a regular expression. This regex matches two or more consecutive spaces:

>>> import re
>>> s = "machine01:/mnt/iso/Ready To Transfer            machine02:/mnt/iso/ReadyxToxTransfer                                        Transferred    19:06:10   Idle"
>>> re.split('  +', s)
['machine01:/mnt/iso/Ready To Transfer', 'machine02:/mnt/iso/ReadyxToxTransfer', 'Transferred', '19:06:10', 'Idle']
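
Applied to a whole line inside a loop, the same split might look like the Python 3 sketch below. The names `COLUMN_GAP`, `split_columns` and `expected` are made up for illustration; the only assumption is that a valid line has exactly five columns, so anything else is reported as `None` rather than indexed blindly.

```python
import re

# Two or more consecutive spaces separate the columns.
COLUMN_GAP = re.compile(r" {2,}")


def split_columns(line, expected=5):
    """Return the columns of one status line, or None if the count is off."""
    fields = COLUMN_GAP.split(line.strip())
    return fields if len(fields) == expected else None


line = ("machine01:/mnt/iso/Ready To Transfer            "
        "machine02:/mnt/iso/ReadyxToxTransfer            "
        "Transferred    19:06:10   Idle")
print(split_columns(line))
```

Checking the field count up front means a malformed line shows up as `None` instead of an IndexError further down.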

This will still break, though, if a filename contains two or more consecutive spaces. I would suggest using a more tailored regex:

>>> parts = re.search(r'(machine.*)(machine.*)(\s\w+)\s+([0-9:]+)\s+(\w+)', s).groups()
>>> [p.strip() for p in parts]
['machine01:/mnt/iso/Ready To Transfer', 'machine02:/mnt/iso/ReadyxToxTransfer', 'Transferred', '19:06:10', 'Idle']

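Edit: that regex broke on "machine02:/mnt/machine01_vlm/log-prod" (the greedy first group runs up to the last "machine" in the line, here the one in "machine01_vlm"), so try this instead: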

>>> for line in input_lines.split('\n'):
...   parts = re.search(r'(^machine\d\d:.*)(machine\d\d:.*)(\s\w+)\s+([0-9:]+)\s+(\w+)', line).groups()
...   print [p.strip() for p in parts]
... 
['machine01:/mnt/vlm/log-prod', 'machine02:/mnt/machine01_vlm/log-prod', 'Transferred', '17:46:14', 'Idle']
['machine01:/mnt/vlm/log-test', 'machine02:/mnt/machine01_vlm/log-test', 'Transferred', '17:46:14', 'Idle']
['machine01:/mnt/wndchl/-', 'machine02:/mnt/machine01_wndchl/machine01_wndchl_machine01_wndchl', 'Transferred', '18:36:10', 'Idle']
['machine01:/mnt/wndchl/prod', 'machine02:/mnt/machine01_wndchl/prod', 'Transferred', '18:36:10', 'Idle']
['machine01:/mnt/wndchl/test', 'machine02:/mnt/machine01_wndchl/test', 'Transferred', '18:36:10', 'Idle']
['machine01:/mnt/iso/Archive', 'machine02:/mnt/iso/Archive', 'Transferred', '19:06:10', 'Idle']
['machine01:/mnt/iso/Ready To Transfer', 'machine02:/mnt/iso/ReadyxToxTransfer', 'Transferred', '19:06:10', 'Idle']
['machine01:/mnt/iso/-', 'machine02:/mnt/iso/iso_machine01_iso', 'Transferred', '19:06:10', 'Idle']
['machine01:/mnt/it/SCCM', 'machine02:/mnt/it/SCCM', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/Windows', 'machine02:/mnt/it/Windows', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/-', 'machine02:/mnt/it/machine01_it_machine01_it', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/dcs', 'machine02:/mnt/it/dcs', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/hds_perf_logs', 'machine02:/mnt/it/hds_perf_logs', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/legalhold/LegalHold', 'machine02:/mnt/legalhold/LegalHold', 'Transferred', '18:46:06', 'Idle']
['machine01:/mnt/legalhold/-', 'machine02:/mnt/legalhold/legalhold_machine01_legalhold', 'Transferred', '18:46:06', 'Idle']
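
For use in a script rather than at the prompt, the same pattern could be wrapped in a small function. This is only a Python 3 sketch: `LINE_RE`, `parse_line` and the sample line (which has a double space inside the filename) are illustrative, and it assumes host names always look like "machine" followed by two digits and a colon, as in the data shown.

```python
import re

# Same pattern as above: two "machineNN:" paths, then relationship, time, status.
LINE_RE = re.compile(
    r"(^machine\d\d:.*)(machine\d\d:.*)(\s\w+)\s+([0-9:]+)\s+(\w+)"
)


def parse_line(line):
    """Return the five stripped fields of a status line, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return [part.strip() for part in match.groups()]


# Hypothetical line with TWO spaces inside the source filename.
sample = ("machine01:/mnt/iso/Ready  To Transfer   "
          "machine02:/mnt/iso/ReadyxToxTransfer   "
          "Transferred    19:06:10   Idle")
print(parse_line(sample))
```

Because the pattern anchors on the host prefixes instead of the gaps between columns, the doubled space inside the filename stays part of the first field.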

See the Python re module docs for details.

A good tool for experimenting with regular expressions is https://www.debuggex.com/

import re

# Fields: src machine, src path, dest machine, dest path, relationship, hh, mm, ss, status
LOG_FMT = re.compile(r'(\w+):(.*?)\s+(\w+):(.*?)\s+(\w+)\s+(\d+):(\d+):(\d+)\s+(\w+)')
max_tolerate_hours = 19.2

def main():
    with open('my.log') as inf:
        for row in inf:
            match = LOG_FMT.match(row)
            if match is not None:
                src_machine, src_path, dest_machine, dest_path, rel, hh, mm, ss, status = match.groups()
                hh, mm, ss = int(hh), int(mm), int(ss)
                hours = hh + (mm / 60.) + (ss / 3600.)
                if status == 'Idle' and hours > max_tolerate_hours:
                    print('Source path         : {}'.format(src_path))
                    print('Destination path    : {}'.format(dest_path))
                    print('Max threshold (h)   : {:0.2f}'.format(max_tolerate_hours))
                    print('Idle (h)            : {:0.2f}'.format(hours))
                    print('=========================================================')
    print('Scan completed!')

if __name__=="__main__":
    main()

Run against your given data, this returns:

Source path         : /mnt/it/SCCM
Destination path    : /mnt/it/SCCM
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/Windows
Destination path    : /mnt/it/Windows
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/-
Destination path    : /mnt/it/machine01_it_machine01_it
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/dcs
Destination path    : /mnt/it/dcs
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/hds_perf_logs
Destination path    : /mnt/it/hds_perf_logs
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Scan completed!
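
To see why only the 19:25:51 rows are reported, here is the hour arithmetic on its own, as a minimal sketch. `to_hours` is a helper name made up for this example; it does the same conversion as the `hours = ...` line in the script.

```python
def to_hours(hhmmss):
    """Convert an 'HH:MM:SS' string to fractional hours."""
    hh, mm, ss = (int(part) for part in hhmmss.split(":"))
    return hh + mm / 60.0 + ss / 3600.0


for stamp in ("19:06:10", "19:25:51"):
    print("{} -> {:0.2f} h".format(stamp, to_hours(stamp)))
# 19:06:10 -> 19.10 h
# 19:25:51 -> 19.43 h
```

With the threshold at 19.2, the 19:06:10 rows (about 19.10 h) stay below it, while the 19:25:51 rows (about 19.43 h) exceed it.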