I am using Snorkel (https://www.snorkel.org/use-cases/01-spam-tutorial) to find rows in the pandas DataFrame df whose text contains specific words.
df
not_matter | Text |
---|---|
111 | hallo Apple |
222 | Berry and bb |
333 | bb and Candy |
I also have a pandas DataFrame df_wordlist in which Column_1 and Column_2 hold different words and Column_3 combines the two (joined with an underscore).
df_wordlist
Column_1 | Column_2 | Column_3 |
---|---|---|
aa | Apple | aa_Apple |
aa | Berry | aa_Berry |
aa | Candy | aa_Candy |
bb | Apple | bb_Apple |
bb | Berry | bb_Berry |
bb | Candy | bb_Candy |
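To make the example reproducible, here is a minimal sketch that builds both tables with pandas. It assumes Column_3 is simply Column_1 and Column_2 joined with an underscore, as the table suggests.

```python
import pandas as pd

# Sample data copied from the df table above
df = pd.DataFrame({
    "not_matter": [111, 222, 333],
    "Text": ["hallo Apple", "Berry and bb", "bb and Candy"],
})

# Word pairs copied from the df_wordlist table above
df_wordlist = pd.DataFrame({
    "Column_1": ["aa", "aa", "aa", "bb", "bb", "bb"],
    "Column_2": ["Apple", "Berry", "Candy", "Apple", "Berry", "Candy"],
})
# Column_3: the two words joined with an underscore (assumed construction)
df_wordlist["Column_3"] = df_wordlist["Column_1"] + "_" + df_wordlist["Column_2"]

print(df_wordlist["Column_3"].tolist())
```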
Now I need to define one labeling function per row of df_wordlist: each function should be named after the value in Column_3 and should check the text for the words in Column_1 and Column_2. Written out by hand, they look like this:
import re
from snorkel.labeling import labeling_function
# FOERD and ABSTAIN are label constants defined elsewhere
@labeling_function()
def aa_Apple(x):
    # x is a single data point (row), so read its Text field, not the whole df
    return FOERD if re.search(r"\b(?=.*aa.*)(?=.*Apple.*)\b|\b(?=.*Apple.*)(?=.*aa.*)\b", x.Text, flags=re.I) else ABSTAIN
@labeling_function()
def aa_Berry(x):
    return FOERD if re.search(r"\b(?=.*aa.*)(?=.*Berry.*)\b|\b(?=.*Berry.*)(?=.*aa.*)\b", x.Text, flags=re.I) else ABSTAIN
# ... the other 3 functions ...
@labeling_function()
def bb_Candy(x):
    return FOERD if re.search(r"\b(?=.*bb.*)(?=.*Candy.*)\b|\b(?=.*Candy.*)(?=.*bb.*)\b", x.Text, flags=re.I) else ABSTAIN
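Because lookaheads are zero-width and both are evaluated from the same position, each one scans the whole remaining string regardless of order, so the `|` alternative in these patterns is redundant. Note also that the patterns match `aa` inside longer words. A minimal sketch of a stricter single pattern, assuming whole-word matching is what is wanted; the helper name `contains_both` is hypothetical:

```python
import re


def contains_both(text: str, word1: str, word2: str) -> bool:
    """Return True if text contains word1 and word2 as whole words, in any order."""
    # Each lookahead scans ahead independently, so one pattern covers both orders.
    # \b inside each lookahead enforces whole words; re.escape guards
    # against words that contain regex metacharacters.
    pattern = rf"(?=.*\b{re.escape(word1)}\b)(?=.*\b{re.escape(word2)}\b)"
    return re.search(pattern, text, flags=re.I) is not None


print(contains_both("Berry and bb", "bb", "Berry"))   # True
print(contains_both("hallo Apple", "aa", "Apple"))    # False
print(contains_both("bbq and Candy", "bb", "Candy"))  # False: "bbq" is not the word "bb"
```

Inside a Snorkel labeling function the text would come from the data point, e.g. `contains_both(x.Text, "aa", "Apple")`.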
I tried to generate them with a loop, but it didn't work:
for i in range(len(df_wordlist)):
    label_name = str(df_wordlist.iloc[i,-1])
    label_word1 = str(df_wordlist.iloc[i,0])
    label_word2 = str(df_wordlist.iloc[i,1])
    @labeling_function()
    def label_name(x):
        return FOERD if re.search(r"\b(?=.*label_word1.*)(?=.*label_word2.*)\b|\b(?=.*label_word2.*)(?=.*label_word1.*)\b", df, flags=re.I) else ABSTAIN
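That loop cannot work as intended for two reasons: the regex is a plain string, so it searches for the literal text `label_word1` rather than the variable's value, and `def label_name(x):` just rebinds the name `label_name` on every iteration instead of creating a function called, say, `aa_Apple`. One common pattern is a factory function that captures the two words and returns a new function per row. The sketch below runs without Snorkel; `make_pair_lf`, `lfs_by_name`, and the label values are assumptions (Snorkel expects `ABSTAIN = -1`; `FOERD = 1` is a placeholder).

```python
import re
from types import SimpleNamespace

# Assumed label values: Snorkel expects ABSTAIN = -1; FOERD = 1 is a placeholder.
ABSTAIN = -1
FOERD = 1


def make_pair_lf(word1, word2, name):
    """Return a function that labels x as FOERD if x.Text contains both words."""
    # The words are interpolated (escaped) into the pattern, instead of
    # appearing as literal text such as "label_word1".
    pattern = re.compile(
        rf"(?=.*\b{re.escape(word1)}\b)(?=.*\b{re.escape(word2)}\b)", flags=re.I
    )

    def lf(x):
        return FOERD if pattern.search(x.Text) else ABSTAIN

    # Each factory call creates a new function object; name it after Column_3.
    lf.__name__ = name
    return lf


# Rows mirroring df_wordlist: (Column_1, Column_2, Column_3).
# With pandas, df_wordlist.itertuples(index=False) yields the same triples.
rows = [
    ("aa", "Apple", "aa_Apple"),
    ("aa", "Berry", "aa_Berry"),
    ("aa", "Candy", "aa_Candy"),
    ("bb", "Apple", "bb_Apple"),
    ("bb", "Berry", "bb_Berry"),
    ("bb", "Candy", "bb_Candy"),
]

# One function per row, keyed by its Column_3 name
lfs_by_name = {name: make_pair_lf(w1, w2, name) for w1, w2, name in rows}

# SimpleNamespace stands in for a DataFrame row with a Text attribute
row = SimpleNamespace(Text="bb and Candy")
print(lfs_by_name["bb_Candy"](row))  # 1
print(lfs_by_name["aa_Candy"](row))  # -1
```

With Snorkel, each generated function could be wrapped as `LabelingFunction(name=name, f=lf)` from `snorkel.labeling` (or `labeling_function(name=name)(lf)`), so the Column_3 value becomes the name Snorkel uses for that labeling function. Storing the functions in a dict avoids creating module-level variables from strings.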
I want the loop to generate as many labeling functions as there are rows in df_wordlist.
Later, I need to put all the functions in a list so they can be applied, like:
function_ls = [aa_Apple, aa_Berry, bb_Candy]
Building that list in a loop would look something like this:
function_ls = []
for i in range(len(df_wordlist)):
    label_name = str(df_wordlist.iloc[i, -1])
    # this collects the names (strings), not the functions themselves
    function_ls.append(label_name)
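Two details matter when building such a list in a loop: `list.append` modifies the list in place and returns `None`, so its result must not be assigned back to the list, and the list must be created once, before the loop. The other key point is to store the function objects rather than their names. A minimal sketch, assuming the generated functions are available in a dict keyed by their Column_3 name; `build_function_list` and the two stand-in functions are hypothetical:

```python
def build_function_list(names, functions_by_name):
    """Collect function objects in the order given by names (e.g. Column_3)."""
    function_ls = []  # create the list once, before the loop
    for name in names:
        # list.append() modifies the list in place and returns None,
        # so its result must not be assigned back to function_ls.
        function_ls.append(functions_by_name[name])
    return function_ls


# Hypothetical stand-ins for generated labeling functions
def aa_Apple(x):
    return -1


def bb_Candy(x):
    return 1


function_ls = build_function_list(
    ["aa_Apple", "bb_Candy"], {"aa_Apple": aa_Apple, "bb_Candy": bb_Candy}
)
print([f.__name__ for f in function_ls])  # ['aa_Apple', 'bb_Candy']
```

Once each function is wrapped as a Snorkel `LabelingFunction`, a list like this can be passed to `PandasLFApplier(lfs=function_ls)` and applied with `.apply(df=df)`, as in the spam tutorial linked above.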