Python / Updating a pandas row with a new column value from a function


I am using Python 2.7. I am trying to store the result of a "robot check" in a new column while iterating over a data frame (although I suppose the same problem applies in other circumstances). I have tried:

import robotparser
import urlparse
import pandas as pd
df = pd.DataFrame(dict(A=['http://www.python.org'
                          ,'http://www.junksiteIamtellingyou.com'
                         ]))

df
    A
0   http://www.python.org
1   http://www.junksiteIamtellingyou.com

agent_name = 'Test'
for i in df['A']:
    try:
        parser = robotparser.RobotFileParser()
        parser.set_url(urlparse.urljoin(i,"robots.txt"))
        parser.read()
    except Exception as e:
        df['Robot'] =  'No Robot.txt'
    else:
        df['Robot'] =  parser.can_fetch(agent_name, i)
df
    A                                       Robot
0   http://www.python.org                   No Robot.txt <<<-- NOT CORRECT
1   http://www.junksiteIamtellingyou.com    No Robot.txt

What is happening, of course, is that the last value of the iteration overwrites the entire column. The value of Robot for the first row should be True (which can be demonstrated by deleting the junk URL from the data frame).
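To see the overwrite in isolation, here is a minimal sketch with made-up values and no network access. The frame contents and the list of loop values are only illustrative, not taken from the real robot check.

```python
import pandas as pd

df = pd.DataFrame({'A': ['first', 'second']})

# Assigning a scalar to a column broadcasts it to every row,
# so each pass through the loop replaces the whole column.
for value in [True, 'No Robot.txt']:
    df['Robot'] = value

# Only the value from the final iteration survives, in every row.
result = df['Robot'].tolist()
print(result)
```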

I have tried some different permutations of .loc, but can't get them to work. They always seem to add rows rather than updating the new column for the existing row.

So, is there a way to specify the column being updated (with the function result)? Perhaps using .loc(location), or perhaps there is another way such as using lambda? I would appreciate your help.

There is 1 best solution below

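There is an apply for that. Series.apply calls the function once per value in the column and returns a new Series aligned with the original index, so each row gets its own result: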

import robotparser
import urlparse
import pandas as pd
df = pd.DataFrame(dict(A=['http://www.python.org'
                          ,'http://www.junksiteIamtellingyou.com']))

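def parse(i, agent_name):
    # Return the can_fetch() result for URL i, or a marker string
    # if robots.txt cannot be read (e.g. the host does not exist).
    try:
        parser = robotparser.RobotFileParser()
        parser.set_url(urlparse.urljoin(i, "robots.txt"))
        parser.read()
    except Exception:
        return 'No Robot.txt'
    else:
        return parser.can_fetch(agent_name, i)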

df['Robot'] = df['A'].apply(parse, args=('Test',))
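
If you would rather keep the explicit loop, the rough idea is to address a single cell by its index label instead of assigning to the whole column. The sketch below assumes Python 3 and a recent pandas (where Series.items() is available), unlike the Python 2.7 code above. The check() function is a hypothetical stand-in for parse(), so the example runs without network access. Using the URL itself as the row label in .loc is a likely reason rows get added, since that label does not exist in the index.

```python
import pandas as pd

df = pd.DataFrame({'A': ['http://www.python.org',
                         'http://www.junksiteIamtellingyou.com']})

def check(url):
    # Hypothetical stand-in for parse(); a real version would read robots.txt.
    if 'junksite' in url:
        return 'No Robot.txt'
    return True

# Create the column up front; None gives it object dtype,
# so it can hold both booleans and strings.
df['Robot'] = None

# Iterate over (index label, value) pairs and write one cell at a time.
for label, url in df['A'].items():
    df.at[label, 'Robot'] = check(url)

result = df['Robot'].tolist()
print(df)
```

Here df.loc[label, 'Robot'] = ... would work the same way; .at is simply the scalar-access variant. For most cases the apply version above is shorter and avoids the manual bookkeeping.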