I am working on pulling data from the OpenCalais API, and here are the details:
Input: A paragraph as a string, e.g. "Barack Obama is the President of United States." The API returns entity instances, each with an offset and a length, but not necessarily in order of occurrence.
Output (what I want): The same string, but with each identified entity instance wrapped in a hyperlink (the result is also a string), i.e.
output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>.'
But it is really a Python question.
This is what I have:
# API CALLS ABOVE WHICH ARE NOT RELEVANT.
output = input
for x in range(0, result.print_entities()):
    print len(result.entities[x]["instances"])
    previdx = 0
    idx = 0
    for y in range(0, len(result.entities[x]["instances"])):
        try:
            url = "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid']
        except:
            url = "https://en.wikipedia.org/wiki/" + result.entities[x]["name"].replace(" ", "_")
            print "Generating wiki page link"
        print url + "\n"
        # THE PROBLEM STARTS HERE
        offsetstr = result.entities[x]["instances"][y]["offset"]
        lenstr = result.entities[x]["instances"][y]["length"]
        output = output[:offsetstr] + "<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>" + output[offsetstr+lenstr:]
print output
Now the issue is that after the first iteration the output string has changed: each inserted tag shifts everything that comes after it. So for subsequent iterations the offset values, which refer to the original string, no longer point at the right positions, and I cannot make the expected change.
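To make the shift concrete, here is a small reproduction. The offsets are hand-computed for the sample sentence (they are not taken from a real OpenCalais response), and a bare `<a>` tag stands in for the full link:

```python
text = "Barack Obama is the President of United States."
first = (0, 12)    # (offset, length) of "Barack Obama", hand-computed
second = (33, 13)  # (offset, length) of "United States", hand-computed

# Wrap the first span, the same way the loop above edits `output` in place.
off, length = first
edited = text[:off] + "<a>" + text[off:off + length] + "</a>" + text[off + length:]

# The second offset is still valid against the original string ...
original_slice = text[second[0]:second[0] + second[1]]

# ... but against the edited string it now lands in the wrong place,
# because the inserted tags pushed everything after them to the right.
stale_slice = edited[second[0]:second[0] + second[1]]

# Adding the number of inserted characters puts the offset back on target.
shift = len(edited) - len(text)
corrected_slice = edited[second[0] + shift:second[0] + shift + second[1]]
```

Here `original_slice` and `corrected_slice` are both "United States", while `stale_slice` is not.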
Basically, I am trying to get:
input = "Barack Obama is the President of United States."
output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>.'
How can this be done? I tried slicing and dicing, but the string just gets garbled.
I finally solved it. It took some careful offset arithmetic, but the intuition from my last comment did the trick: "Maybe a solution can be storing the {offset, length} tuples in an array and then sort it on the offset values and THEN run the loop. Any help making that structure?"
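For anyone landing here, a minimal sketch of that sort-then-loop idea might look like the following. It is not necessarily the exact code that was used: `link_entities` and the `(offset, length, url)` tuple layout are hypothetical names chosen for illustration, the offsets are hand-computed for the sample sentence rather than read from an OpenCalais response, and skipping overlapping spans is an assumed policy. The output is assembled from slices of the original string, so the original offsets stay valid throughout.

```python
def link_entities(text, spans):
    """Wrap each (offset, length, url) span of `text` in an <a> tag.

    Offsets refer to the ORIGINAL text, so the result is built from
    untouched slices of `text` instead of editing a growing string.
    """
    pieces = []
    cursor = 0  # end of the part of the original text already copied
    for offset, length, url in sorted(spans):  # ascending by offset
        if offset < cursor:
            continue  # assumed policy: skip spans that overlap an earlier one
        pieces.append(text[cursor:offset])  # plain text before the entity
        pieces.append('<a href="%s">%s</a>' % (url, text[offset:offset + length]))
        cursor = offset + length
    pieces.append(text[cursor:])  # whatever follows the last entity
    return "".join(pieces)


text = "Barack Obama is the President of United States."
# Deliberately out of order, as the API may return them.
spans = [
    (33, 13, "https://en.wikipedia.org/wiki/United_States"),
    (0, 12, "https://en.wikipedia.org/wiki/Barack_Obama"),
]
linked = link_entities(text, spans)
```

With the real response, the `spans` list would be collected in the nested loop (one tuple per instance) before any string editing happens. An alternative with the same effect is to keep editing the string in place but process the spans in descending offset order, so each insertion only moves text that has already been handled.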
And voilà! :) Thanks for the help, folks. Hope it helps someone someday.