What is an efficient algorithm to identify multi-degree email chains in a mock company network?

76 Views Asked by At

In a hypothetical company scenario where there are num_employees employees and a total of num_emails emails exchanged, each employee adheres to a policy of sending at most one email to another employee. Furthermore, employees are prohibited from sending emails to multiple recipients simultaneously, adhering strictly to a one-to-one correspondence.

To simulate this scenario, I've set up a test environment using the Python and the Faker library:

from faker import Faker
from random import Random
rand = Random()
fake = Faker().unique

num_employees = 200
num_emails = 2000

employees = [fake.email() for _ in range(num_employees)]

emails = []
for i in range(num_emails):
    email = (rand.choice(employees), rand.choice(employees))
    # assume that all email sender and recipient addresses are unique
    while email in emails:
        email = (rand.choice(employees), rand.choice(employees))

    emails.append(email)

It's important to note that the index of the emails list represents the chronological order of the sent emails.

To identify replies, I've developed the following code:

received_a_reply = {}
for i, (sender, recipient) in enumerate(emails):

    if (recipient, sender) in emails[i:]:
        received_a_reply[i] = [sender, recipient]

print(f"{round(len(received_a_reply)/len(emails) * 100)}% of emails are replies")
print(f"{len(emails)} emails")
print(f"{len(received_a_reply)} emails received a reply")
print(f"{len(emails) - len(received_a_reply)} emails weren't in a thread")

This code efficiently identifies replies up to the second degree of connection. However, I'm now interested in detecting loops extending to the third degree (e.g., Matthew sends an email to Mark, who forwards it to John, and John replies back to Matthew).

Given that my actual dataset comprises approximately 4,000 "employees" and 40,000 "emails", I need to optimize both time and memory usage.

Is there a method to identify loops extending beyond the second degree without exhaustively generating all possible combinations and then filtering out non-loop instances?

0

There are 0 best solutions below