Why is my input file being read as a list in a for loop after being passed via argparse?


I'm attempting to write a script that takes a file and two additional arguments, start_point and end_point, and extracts the text between them.

However, when I run it I get the error (line 35) "TypeError: can only concatenate str (not "list") to str". I don't understand this, because the input file is passed to a for loop, where each line should be read from the input file, the regex search run on that line, and the matched string printed and appended to the output file.

import re
import argparse
#import requests

parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('--input','-i',
    type = str,
    nargs = '?',
    dest = 'input_file',
    help='Input file name.'
)
parser.add_argument('--start','-s',
    type = str,
    nargs = '+',
    dest = 'start_point',
    help='The string (within quotes) you want to search from.'
)
parser.add_argument('--end','-e',
    type = str,
    nargs = '+',
    dest = 'end_point',
    help='The string (within quotes) you want to search up to.'
)

args = parser.parse_args()

fileName = args.input_file
start_string = args.start_point
end_string = args.end_point

content = open(fileName,'r')
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()

I have reviewed the argparse docs and tried different ways of reading the file, such as setting the type of the input_file argument to argparse.FileType('r') and then assigning args.input_file.readlines() to the content variable. However, I must be misunderstanding something, as everything I've looked at online suggests this should work.

My previous version of this script, which uses only positional arguments and no flags, works as expected. However, I want to expand its functionality so I can pass a URL and have it work directly on web pages too.

Full error message

$ python3 betweeny_grabber2.py -i test -s '.asp">' -e '</a></td>'
Traceback (most recent call last):
  File "/home/george/Tools/Scripts/Python/betweeny_grabber2.py", line 35, in <module>
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
                   ~~~~~~^~~~~~~~~~~~~
TypeError: can only concatenate str (not "list") to str

Previous version

import re
import argparse


parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('input', type=str, help='Input file name.')
parser.add_argument('start_point', type=str, help='The string (within quotes) you want to search from.')
parser.add_argument('end_point', type=str, help='The string (within quotes) you want the search to end at.')
args = parser.parse_args()

input_file = args.input
start_string = args.start_point
end_string = args.end_point

content = open(input_file,"r")
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()

Hai Vu On BEST ANSWER

For a command line of

-i test -s '.asp">' -e '</a></td>'

args is

Namespace(input_file='test', start_point=['.asp">'], end_point=['</a></td>'])
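
To see this for yourself, here is a minimal sketch that reuses the question's option definitions (trimmed of type and help text) and passes the same arguments to parse_args as an explicit list instead of reading them from the command line:

```python
import argparse

# Same options as in the question, reduced to the parts that matter here.
parser = argparse.ArgumentParser()
parser.add_argument("--input", "-i", nargs="?", dest="input_file")
parser.add_argument("--start", "-s", nargs="+", dest="start_point")
parser.add_argument("--end", "-e", nargs="+", dest="end_point")

# An explicit list stands in for sys.argv[1:].
args = parser.parse_args(["-i", "test", "-s", '.asp">', "-e", "</a></td>"])

print(args)
print(type(args.input_file).__name__)   # str
print(type(args.start_point).__name__)  # list
print(type(args.end_point).__name__)    # list
```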

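Note that start_point and end_point are lists, not strings, and that is why you got that error: nargs='+' tells argparse to gather one or more values into a list, even when only a single value is given. To fix this, remove nargs from those arguments. The nargs='?' on --input does not cause the error, since it yields a single value, but it is not needed either. You also don't have to specify type=str, because that is the default.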

parser.add_argument("--input", "-i", dest="input_file", help="Input file name.")
parser.add_argument(
    "--start",
    "-s",
    dest="start_point",
    help="The string (within quotes) you want to search from.",
)
parser.add_argument(
    "--end",
    "-e",
    dest="end_point",
    help="The string (within quotes) you want to search up to.",
)
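
For completeness, here is a hedged sketch of how the corrected arguments might feed the search loop from the question. The helper names build_parser and extract_between and the sample line are hypothetical, chosen only for illustration. The use of re.escape is also an addition that is not in the original script: it assumes the start and end strings should be matched literally, so that the "." in '.asp">' is not treated as a regex wildcard.

```python
import argparse
import re


def build_parser():
    """Build the parser without nargs, so each option yields a plain str."""
    parser = argparse.ArgumentParser(
        description="Extracts text between a start string and an end string."
    )
    parser.add_argument("--input", "-i", dest="input_file", help="Input file name.")
    parser.add_argument("--start", "-s", dest="start_point",
                        help="The string (within quotes) you want to search from.")
    parser.add_argument("--end", "-e", dest="end_point",
                        help="The string (within quotes) you want to search up to.")
    return parser


def extract_between(lines, start_string, end_string):
    """Return the text found between start_string and end_string on each line."""
    # Assumption: the delimiters are literal text, hence re.escape.
    pattern = re.compile(
        "(?<=" + re.escape(start_string) + ")(.*?)(?=" + re.escape(end_string) + ")"
    )
    results = []
    for line in lines:
        match = pattern.search(line)
        if match:
            results.append(match.group(1))
    return results


# Explicit argument list in place of sys.argv[1:], same values as the question.
args = build_parser().parse_args(["-i", "test", "-s", '.asp">', "-e", "</a></td>"])

# Hypothetical sample line standing in for the contents of the input file.
sample_lines = ['<td><a href="page.asp">Home</a></td>\n']
print(extract_between(sample_lines, args.start_point, args.end_point))  # ['Home']
```

In the real script, the open file object can be passed as lines, since iterating over a file yields one line at a time, and each result can then be printed and appended to search_output as before.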