*CSV contains fake information
Python on Kaggle w/ Pandas + Numpy
Context: Goal is to import/update Exchange contacts. Users want to search for parent email by their student(s).
Input (see CSV screenshot):
- 'Student ID' column (7-digit number) - 'Parent ID' column (7-digit number) - 'Student First Name' column (str) - 'Student Last Name' column (str) - 'Parent First Name' column (str) - 'Parent Last Name' column (str)
Desired Output (see CSV screenshot):
- 'Contacts' column *'Contacts' column follows this pattern: - One Student: "studentLastName, studentFirstName (parentLastName, parentFirstName)" - Multiple Students: "studentLastName, student1FirstName, student2FirstName, & student3FirstName (parentLastName,> parentFirstName)"

Assumptions:
- use student/parent IDs to evaluate conditions
- Many students/guardians share last names and are not related
- Many students have >2 guardians
- Many students do not share last names with one or more guardians
- Some students are siblings with different legal last names
#1. What I tried: Nested for loop with an if statement
for i in range(14):
for j in range(14):
if contacts.iloc[i]['parentID'] == contacts.iloc[j]['parentID'] and i!=j:
hold = [contacts.iloc[i]['studentFirstName'], contacts.iloc[j]['studentFirstName']]
print(hold)
#What I was expecting: list of tuples of student first names of siblings where the corresponding parentIDs match in different rows
#Example Output:
[[Uganda, Tiramisu], [Uganda, Tiramisu], [Ethernet, Persimmon], [Ethernet, Niagara], [Persimmon, Niagara]]
#I am new to DataFrames. I want to treat everything like lists.
#2.What I tried: Nested for loop with numpy.where():
for i in range(14):
for j in range(14):
print(i, j)
print(contacts.iloc[i]['parentID'], contacts.iloc[j]['parentID'])
np.where(contacts.iloc[i]['parentID'] == contacts.iloc[j]['parentID'],
print(contacts.iloc[i]['studentFirstName'] + ' and ' + contacts.iloc[j]['studentFirstName']), print('1'))
#What I was expecting: student1FirstName and student2FirstName where parentIDs match, else 1
#I don't understand broadcasting, and feel like that and/or printing results is breaking this.
#Considering alternatives: pd.agg(), pd.groupby(), lambda functions, list comprehension
#I am a beginner to Pandas/Numpy/Python. When practicing I think more easily with for loops, then try to convert them to list comprehension. It seems there are many ways to solve this problem.
This is my understanding of what you are asking for: