I have a JSON file that I am parsing through in an attempt to see if a domain is live.
The code I have is the following:
for i in range(len(json_data)):
    print(i)
    if int(json_data[i]['response']['result_count']) > 0:
        for j in range(len(json_data[i]['response']['matches'])):
            try:
                socket.gethostbyname(json_data[i]['response']['matches'][j]['domain'])
            except:
                del json_data[i]['response']['matches'][j]['domain']
I have attempted to use multithreading in the following form:
def run_half():
    for i in range(0, round(len(data_json)/2)):
        print(i)  # make this len(data_json) if NOT testing, range(10) if testing
        if int(data_json[i]['response']['result_count']) > 0:
            for j in range(len(data_json[i]['response']['matches'])):
                try:
                    socket.gethostbyname(data_json[i]['response']['matches'][j]['domain'])
                except:
                    del data_json[i]['response']['matches'][j]['domain']

def run_half_2():
    for i in range(round((len(data_json)/2))+1, len(data_json)):
        print(i)  # make this len(data_json) if NOT testing, range(10) if testing
        if int(data_json[i]['response']['result_count']) > 0:
            for j in range(len(data_json[i]['response']['matches'])):
                try:
                    socket.gethostbyname(data_json[i]['response']['matches'][j]['domain'])
                except:
                    del data_json[i]['response']['matches'][j]['domain']

t1 = threading.Thread(target=run_half(), args=(10,))
t2 = threading.Thread(target=run_half_2(), args=(10,))
t1.start()
t2.start()
t1.join()
t2.join()
For some reason, I have not noticed a change in the time to compute.
Any advice or suggestions would be greatly appreciated. Thank you!
Yes, threading is useful here, as this is a network/IO-bound task.
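As a side note on why the timing did not change: in the question's code, `target=run_half()` calls the function immediately in the main thread and passes its return value (`None`) to `Thread`, so both halves still run one after the other. The small sketch below illustrates the difference; `where_am_i` and the thread names are made up for the demonstration.

```python
import threading

def where_am_i(log):
    # Record the name of the thread that actually executes this function.
    log.append(threading.current_thread().name)

calls = []
caller = threading.current_thread().name

# With parentheses, the function runs right here in the calling thread and
# Thread receives target=None, so the new thread has nothing to do.
wrong = threading.Thread(target=where_am_i(calls), name="worker-wrong")
wrong.start()
wrong.join()

# Without parentheses, the function object is handed over and the new
# thread runs it; arguments go in args.
right = threading.Thread(target=where_am_i, args=(calls,), name="worker-right")
right.start()
right.join()

print(calls)  # first entry is the calling thread, second is "worker-right"
```

Note also that `run_half` and `run_half_2` take no parameters, so `args=(10,)` would raise a `TypeError` in the threads once the parentheses are removed.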
Rather than splitting the work into groups as above, a better approach is to treat each hostname check as an individual task and fan the execution out to a number of workers.
I'd suggest using the thread pool executor provided by the Python standard library to achieve this:
https://docs.python.org/3/library/concurrent.futures.html
The concept is that you fan out each long-running task into a future, and then fan in to collect all the results.
e.g.,
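A minimal sketch of that fan-out/fan-in pattern, assuming the data has the same shape as in the question (a list of records with `['response']['result_count']` and `['response']['matches'][j]['domain']`). The names `is_live` and `drop_dead_domains`, the `check` parameter, and the default of 20 workers are illustrative choices, not anything prescribed by the library.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

def is_live(domain):
    """Return True if the domain resolves; a failed lookup counts as not live."""
    try:
        socket.gethostbyname(domain)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # names that cannot be encoded as a valid hostname.
        return False

def drop_dead_domains(records, check=is_live, max_workers=20):
    """Delete the 'domain' key from every match whose check fails."""
    # Collect every match that has a domain, across all records.
    matches = [
        match
        for record in records
        if int(record['response']['result_count']) > 0
        for match in record['response']['matches']
        if 'domain' in match
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan out: one future per lookup, mapped back to its match dict.
        futures = {pool.submit(check, m['domain']): m for m in matches}
        # Fan in: handle each result as soon as it is ready. The dicts are
        # only modified here, in the calling thread.
        for future in as_completed(futures):
            match = futures[future]
            if not future.result():
                del match['domain']
    return records
```

Calling `drop_dead_domains(json_data)` would then replace the nested loops from the question. The `check` parameter is there only so a different or fake lookup can be swapped in, for example when testing without network access.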