I am trying to pull out file names from a specifically formatted document, and put them into a list. The document contains a large amount of information, but the lines I am concerned about look like the following with "File Name: " always at the start of the line:
File Name: C:\windows\system32\cmd.exe
I tried the following:
import re

xmlfile = open('my_file.xml', 'r')
filetext = xmlfile.read()
file_list = []
file_list.append(re.findall(r'\bFile Name:\s+.*\\.*(?=\n)', filetext))
This makes file_list look like:
[['File Name: c:\\windows\\system32\\file1.exe',
'File Name: c:\\windows\\system32\\file2.exe',
'File Name: c:\\windows\\system32\\file3.exe']]
I'm looking for my output to simply be:
(file1.exe, file2.exe, file3.exe)
I also tried using ntpath.basename on my above output, but it looks like it wants a string as input and not a list.
I'm very new to Python and scripting in general, so any suggestions would be appreciated.
You're on the right track. basename wasn't working because re.findall() returns a list, and that list was then being appended to yet another list, so you ended up with a list nested inside a list. Here's a fix that iterates through the list returned by re.findall() and builds a new one containing just the base file names:
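A minimal sketch of that approach, assuming the document has already been read into a string. The sample text below is a hypothetical stand-in for the contents of my_file.xml, and the regex is simplified with a capture group so that only the path is returned rather than the whole "File Name: ..." line:

```python
import ntpath
import re

# Hypothetical sample text standing in for the contents of my_file.xml.
filetext = (
    "Some header line\n"
    "File Name: c:\\windows\\system32\\file1.exe\n"
    "Unrelated information\n"
    "File Name: c:\\windows\\system32\\file2.exe\n"
    "File Name: c:\\windows\\system32\\file3.exe\n"
)

# With re.MULTILINE, ^ and $ match at the start and end of each line.
# The capture group makes findall return only the path after the label.
paths = re.findall(r'^File Name:[ \t]+(.*)$', filetext, flags=re.MULTILINE)

# ntpath.basename expects a single string, so call it once per path
# instead of passing it the whole list.
file_list = [ntpath.basename(path) for path in paths]

print(file_list)  # ['file1.exe', 'file2.exe', 'file3.exe']
```

ntpath is used rather than os.path so that the backslash-separated Windows paths are split correctly even if the script runs on a non-Windows system. If you need a tuple instead of a list, tuple(file_list) will convert it.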