Text File
• I.D.: AN000015544
DESCRIPTION: 6 1/2 DIGIT DIGITAL MULTIMETER
MANUFACTURER: HEWLETT-PACKARDMODEL NUM.: 34401A CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: MY45027398
• I.D.: AN000016955
DESCRIPTION: TEMPERATURE CALIBRATOR
MANUFACTURER: FLUKE MODEL NUM.: 724 CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: 1189063
• I.D.: AN000017259
DESCRIPTION: TRUE RMS MULTIMETER
MANUFACTURER: AGILENT MODEL NUM.: U1253A CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: MY49420076
• I.D.: AN000032766
DESCRIPTION: TRUE RMS MULTIMETER
MANUFACTURER: AGILENT MODEL NUM.: U1253B CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: MY5048 9036
Objective
Seeking a more efficient algorithm for parsing the manufacturer name and model number, e.g. 'HEWLETT-PACKARDMODEL NUM.: 34401A', 'AGILENT MODEL NUM.: U1253B', etc., from the text file above.
Data Structure
parts_data = {'Model_Number': []}
Code
with open("textfile", 'r') as parts_info:
    linearray = parts_info.readlines()
    for line in linearray:
        model_number = ''
        model_name = ''
        if "MANUFACTURER:" in line:
            # Text between the first and second colon: the manufacturer field
            model_name = line.split(':')[1]
        if "NUM.:" in line:
            # First token after the second colon: the model number itself
            model_number = line.split(':')[2]
            model_number = model_number.split()[0]
            model_number = model_name + ' ' + model_number
            parts_data['Model_Number'].append(model_number.rstrip())
My code does exactly what I want, but I think there is a faster or cleaner way to do it. Let's increase efficiency!
Your code looks fine already, and unless you're parsing gigabytes of data I don't see much to gain from optimizing it. That said, I thought of a few things.
If you remove the linearray = parts_info.readlines() line and loop over parts_info directly, Python iterates over an open file one line at a time, so the whole thing becomes streaming in case your file is huge. Currently that line reads the entire file into memory at once rather than going line by line, so you can run out of memory if the file is bigger than your RAM.
You can also combine the if statements into one conditional, since you seem to only care about lines that have both fields. In the interest of cleaner code, you also don't need model_number = ''; model_name = ''. Saving the result of repeated calls like line.split(':') can help too.
Alternatively, you could try a regex. It's impossible to tell which one will perform better without testing both, which brings me back to what I said at the beginning: optimizing code is tricky and really shouldn't be done unless necessary. If you really, really cared about efficiency, you would use a program like awk, which is written in C.
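To make those suggestions concrete, here is a rough sketch of both variants: a streaming version with a single conditional and a cached split, and a regex version. The function names, the SAMPLE string, and the regex pattern are my own assumptions for illustration, not something from your code, and I use strip() rather than rstrip() so the leading space is dropped as well.

```python
import re

# Two lines copied from the question's text file, plus one non-matching line.
SAMPLE = """\
MANUFACTURER: HEWLETT-PACKARDMODEL NUM.: 34401A CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: MY45027398
DESCRIPTION: TEMPERATURE CALIBRATOR
MANUFACTURER: FLUKE MODEL NUM.: 724 CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: 1189063
"""


def parse_with_split(lines):
    """Collect 'manufacturer field + model number' using str.split.

    `lines` can be any iterable of strings, including an open file,
    so nothing forces the whole file into memory.
    """
    results = []
    for line in lines:
        # One conditional: only lines carrying both fields are of interest.
        if "MANUFACTURER:" in line and "NUM.:" in line:
            fields = line.split(':')  # split once, reuse the result
            model_name = fields[1]
            model_number = fields[2].split()[0]
            # strip() also drops the leading space (the original used rstrip()).
            results.append((model_name + ' ' + model_number).strip())
    return results


# Assumed pattern: everything up to "NUM." is the name, then the next
# whitespace-free token after the colon is the model number.
MODEL_RE = re.compile(r"MANUFACTURER:\s*(.+?NUM\.):\s*(\S+)")


def parse_with_regex(lines):
    """Same extraction as parse_with_split, expressed as one regex search."""
    results = []
    for line in lines:
        match = MODEL_RE.search(line)
        if match:
            results.append(match.group(1) + ' ' + match.group(2))
    return results


if __name__ == "__main__":
    print(parse_with_split(SAMPLE.splitlines()))
    print(parse_with_regex(SAMPLE.splitlines()))
    # With a real file, pass the file object itself to stream it:
    # with open("textfile") as parts_info:
    #     parts_data = {'Model_Number': parse_with_split(parts_info)}
```

Both functions should return the same list for the sample. Which one is faster on your data is something only a measurement (for example with timeit) can tell you.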