Extract Hypernym Relationships from Text Using Hearst Patterns


I am new to natural language processing and need help with a specific problem in an NLP task. I have a list of noun phrases (NPs) and a text string (S) that contains arbitrary text and symbols. My goal is to extract hypernym relationships from S using Hearst patterns, keeping only relationships in which both the hypernym and the hyponym are noun phrases from NPs.

Here are the details of the problem:

NPs: A list of strings [np1, np2, …], where each string is a lowercase noun phrase. I can assume that no noun phrase in this list is a substring of another.

S: A string containing arbitrary text and symbols.

I want to use regular expressions to find every hypernym relationship in S in which both ends are noun phrases from NPs. Specifically, I'd like to use Hearst patterns such as "X such as Y" and "Y is a X". Noun phrases may appear capitalized in S.
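Because NPs can contain multi-word phrases and S may capitalize them, one way to handle both at once is to compile a single case-insensitive alternation over all the phrases and map each match back to its NPs spelling. This is a sketch using Python's built-in `re` module; `build_np_pattern` and `find_nps` are hypothetical helper names, and it assumes the entries in NPs are lowercase, as stated.

```python
import re

def build_np_pattern(nps):
    """Compile one regex that matches any phrase in `nps` as whole words."""
    # Longest phrases first, so a longer phrase is never cut short by a shorter one.
    ordered = sorted(nps, key=len, reverse=True)
    # Escape each word and allow any run of whitespace between words.
    alternatives = [r"\s+".join(re.escape(w) for w in np.split()) for np in ordered]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

def find_nps(nps, text):
    """Return the NPs found in `text`, in order, spelled exactly as in `nps`."""
    canonical = {" ".join(np.lower().split()): np for np in nps}
    pattern = build_np_pattern(nps)
    return [canonical[" ".join(m.group(0).lower().split())] for m in pattern.finditer(text)]
```

The `\b` anchors stop a phrase such as "cat" from matching inside "category", and `re.IGNORECASE` takes care of "Hemingway" versus "hemingway".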

Additionally, the noun phrases in NPs never include an indefinite article (a, an), but one may appear in S, and the relationship should still be extracted in that case. For example, if "dog" and "mammal" are in NPs and S contains "a dog is a mammal", I want to extract the relationship ('mammal', 'dog').
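A minimal sketch of the article handling, assuming the "Y is/was a X" pattern: an optional `(?:a|an)\s+` group in front of each noun phrase lets the regex match "A dog is a mammal" while still reporting the bare phrases from NPs. `copula_hypernyms` is a hypothetical name, and the verb list (is, are, was, were) is an assumption you may want to extend.

```python
import re

def copula_hypernyms(nps, text):
    """Sketch: extract (hypernym, hyponym) pairs from "Y is/was (a|an) X" sentences.

    The indefinite article is optional on both sides, so "A dog is a mammal"
    and "Hemingway was an author" both match even though NPs has no articles.
    """
    ordered = sorted(nps, key=len, reverse=True)
    alts = "|".join(r"\s+".join(re.escape(w) for w in np.split()) for np in ordered)
    np_re = rf"\b(?:{alts})\b"          # any NP, whole words only
    article = r"(?:(?:a|an)\s+)?"       # optional "a " / "an "
    copula = r"\s+(?:is|are|was|were)\s+"
    pattern = re.compile(
        rf"{article}(?P<hypo>{np_re}){copula}{article}(?P<hyper>{np_re})",
        re.IGNORECASE,
    )
    # NPs are lowercase (per the question), so lowercasing and collapsing
    # whitespace recovers the exact NPs entry.
    norm = lambda s: " ".join(s.lower().split())
    return {(norm(m["hyper"]), norm(m["hypo"])) for m in pattern.finditer(text)}
```

For instance, `copula_hypernyms(['dog', 'mammal'], "A dog is a mammal.")` returns `{('mammal', 'dog')}`.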

My objective is to build a set containing each hypernym relationship as a tuple of strings (x, y), where x is a hypernym of y in S and both x and y are spelled exactly as they appear in NPs.

I'd appreciate any guidance, code snippets, or suggestions on how to approach this problem efficiently and effectively. Thank you!

Example input:

NPs = ['hemingway', 'bibliophile', 'author', 'william faulkner', 'mark twain']
S = "Hemingway was an author of many classics. But also, Hemingway was a bibliophile, having read the works of every other famous American author, such as William Faulkner and Mark Twain."

Expected output:

{('author', 'hemingway'), ('bibliophile', 'hemingway'), ('author', 'william faulkner'), ('author', 'mark twain')}
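The sketch below (not a definitive implementation) reproduces the expected output above with just two patterns: a copula pattern ("Y was an X"), which yields the Hemingway pairs, and an enumeration pattern ("X, such as Y1 and Y2"), which yields the author pairs. It assumes the phrases in NPs are lowercase, as stated; `extract_hypernyms` is a hypothetical name, and the trigger words "such as", "including", and "especially" are an assumed subset of Hearst's patterns.

```python
import re

def extract_hypernyms(nps, text):
    """Sketch: extract (hypernym, hyponym) pairs linking only phrases in `nps`.

    Covers two Hearst-style patterns (an assumed subset; extend as needed):
      1. "Y is/was (a|an) X"                    -> (X, Y)
      2. "X, such as/including Y1, Y2 and Y3"   -> (X, Y1), (X, Y2), (X, Y3)
    """
    canonical = {" ".join(np.lower().split()): np for np in nps}
    ordered = sorted(nps, key=len, reverse=True)
    alts = "|".join(r"\s+".join(re.escape(w) for w in np.split()) for np in ordered)
    np_re = rf"\b(?:{alts})\b"          # any NP, whole words only
    art = r"(?:(?:a|an)\s+)?"           # optional indefinite article
    item = art + np_re                  # one list element, e.g. "a dog"
    flags = re.IGNORECASE

    copula = re.compile(
        rf"{art}(?P<hypo>{np_re})\s+(?:is|are|was|were)\s+{art}(?P<hyper>{np_re})", flags)
    such_as = re.compile(
        rf"(?P<hyper>{np_re})\s*,?\s+(?:such\s+as|including|especially)\s+"
        rf"(?P<items>{item}(?:\s*,\s*{item})*(?:\s*,?\s*(?:and|or)\s+{item})?)", flags)
    np_only = re.compile(np_re, flags)

    def norm(s):
        # Map matched text (any case or spacing) back to the exact NPs entry.
        return canonical[" ".join(s.lower().split())]

    pairs = set()
    for m in copula.finditer(text):
        pairs.add((norm(m.group("hyper")), norm(m.group("hypo"))))
    for m in such_as.finditer(text):
        hyper = norm(m.group("hyper"))
        # Every NP inside the captured enumeration is a hyponym of `hyper`.
        for hypo in np_only.finditer(m.group("items")):
            pairs.add((hyper, norm(hypo.group(0))))
    return pairs
```

On the example input this returns the four expected pairs. "American" does not get in the way because the enumeration pattern only requires "author" itself to be an NP immediately before ", such as".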

I tried a regex like r'(\w+)\s+(?:is|are)\s+(?:a\s+type\s+of|a\s+kind\s+of)\s+(\w+)' to match "X is a type of Y". Do I have to write a separate pattern like this for every variant? And how do I handle an enumeration such as "X, including a, b, c, ..., z", where the hypernym-hyponym output should be (X, a), (X, b), ..., (X, z)?
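On the question of writing many patterns: one option (an assumed design, not the only one) is to keep the patterns as data. Each row below is a regex template whose `{HYPER}`, `{HYPO}`, and `{HYPOS}` slots are filled with an alternation over NPs, so "is a type of" and "is a kind of" become one alternation inside a single row, and the `{HYPOS}` enumeration slot produces (X, a), (X, b), ..., (X, z). Matching against the NP list instead of `(\w+)` also lets multi-word phrases such as "william faulkner" match. `TEMPLATES`, `compile_templates`, and `extract` are hypothetical names.

```python
import re

# Hypothetical pattern table (an assumption, not a standard list): each row is a
# regex template with {HYPER}/{HYPO} slots for single noun phrases and a {HYPOS}
# slot for an enumeration such as "a, b, c and z". New variants are new rows.
TEMPLATES = [
    # "Y is a X", "Y is a kind of X", "Y was an X", ...
    r"{HYPO}\s+(?:is|are|was|were)\s+(?:an?\s+)?(?:(?:type|kind|sort|form)\s+of\s+)?{HYPER}",
    # "X, such as a, b and c" / "X including a, b, or c"
    r"{HYPER}\s*,?\s+(?:such\s+as|including|especially)\s+{HYPOS}",
    # "such X as a, b and c"
    r"such\s+{HYPER}\s+as\s+{HYPOS}",
]

def compile_templates(nps, templates):
    """Fill every template's slots with an alternation over the given NPs."""
    ordered = sorted(nps, key=len, reverse=True)
    alts = "|".join(r"\s+".join(re.escape(w) for w in np.split()) for np in ordered)
    np_re = rf"\b(?:{alts})\b"
    item = rf"(?:an?\s+)?{np_re}"  # one enumeration element, article optional
    slots = {
        "HYPER": rf"(?P<hyper>{np_re})",
        "HYPO": rf"(?P<hypo>{np_re})",
        "HYPOS": rf"(?P<hypos>{item}(?:\s*,\s*{item})*(?:\s*,?\s*(?:and|or)\s+{item})?)",
    }
    compiled = [re.compile(t.format(**slots), re.IGNORECASE) for t in templates]
    return compiled, re.compile(np_re, re.IGNORECASE)

def extract(nps, text, templates=TEMPLATES):
    """Return {(hypernym, hyponym)} pairs, spelled exactly as in `nps`."""
    canonical = {" ".join(np.lower().split()): np for np in nps}
    norm = lambda s: canonical[" ".join(s.lower().split())]
    patterns, np_only = compile_templates(nps, templates)
    pairs = set()
    for pattern in patterns:
        for m in pattern.finditer(text):
            groups = m.groupdict()
            if groups.get("hypo"):
                hypos = [groups["hypo"]]
            else:
                # Enumeration: every NP inside the captured list is a hyponym.
                hypos = [h.group(0) for h in np_only.finditer(groups["hypos"])]
            for hypo in hypos:
                pairs.add((norm(groups["hyper"]), norm(hypo)))
    return pairs
```

Supporting another pattern then means appending a row to `TEMPLATES`. Each template may use `{HYPO}` or `{HYPOS}`, and each slot at most once, because the slots become named groups.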
