I am trying to pull out file names from a specifically formatted document, and put them into a list. The document contains a large amount of information, but the lines I am concerned about look like the following with "File Name: " always at the start of the line:
File Name: C:\windows\system32\cmd.exe
I tried the following:
import re

# Read the whole document into a single string.
with open('my_file.xml', 'r') as xmlfile:
    filetext = xmlfile.read()
file_list = []
file_list.append(re.findall(r'\bFile Name:\s+.*\\.*(?=\n)', filetext))
This makes file_list look like:
[['File Name: c:\\windows\\system32\\file1.exe',
'File Name: c:\\windows\\system32\\file2.exe',
'File Name: c:\\windows\\system32\\file3.exe']]
I'm looking for my output to simply be:
(file1.exe, file2.exe, file3.exe)
I also tried using ntpath.basename on my above output, but it looks like it wants a string as input and not a list.
I'm very new to Python and scripting in general, so any suggestions would be appreciated.
You can get the expected output with the following regular expression:
([^\\]*) will capture everything except a backslash after the final path separator until \n is encountered; see the online example. Since findall already returns a list, there's no need to append the return value to an existing list.
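A minimal sketch of a pattern that fits this description. The exact pattern and the sample text are assumptions for illustration, not necessarily the expression originally posted. It relies on the greedy .* running up to the last backslash on the line, so the capture group holds only what follows it.

```python
import re

# Hypothetical sample text standing in for the contents of my_file.xml.
filetext = (
    "Some other line\n"
    "File Name: c:\\windows\\system32\\file1.exe\n"
    "More information\n"
    "File Name: c:\\windows\\system32\\file2.exe\n"
    "File Name: c:\\windows\\system32\\file3.exe\n"
)

# Assumed pattern: the greedy .* consumes everything up to the final
# backslash on the line, and the group captures the remaining characters
# (anything that is not a backslash or a newline) up to the end of the line.
pattern = re.compile(r'^File Name:\s+.*\\([^\\\n]*)$', re.MULTILINE)

# With exactly one capture group, findall returns a flat list of the
# captured strings, so no append is needed.
file_list = pattern.findall(filetext)
print(file_list)
```

Because findall returns only the group contents when the pattern has a single capture group, file_list is a flat list of base names rather than a list nested inside another list.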