How to compare list elements against a database list before sending out a message?


I created a script that scrapes Craigslist for specific listings and builds a list from each listing's title, image link, and href. From there I use Twilio to send myself a text message with the data. That all works great, but I want to save the list to a file, compare each new scrape against that file, and only text myself when there is new information. I am having trouble wrapping my head around the proper way to do this.

I'm confident this is something people do all the time, but I'm not finding the right info that makes it click for me conceptually.


There are 2 best solutions below

0
rnorris On

I think that your title has the answer: use a database. A simple way to accomplish your goal would be to set up a table that uses the listing URL as its primary key, so that each URL can appear only once (add an explicit unique constraint if your preferred DB doesn't already require primary keys to be unique). For simplicity here, I will assume that you will go with sqlite3, since it is easy to get started with and has good Python support with extensive documentation.

Also, for simplicity, I will assume that you have two processes: one that scans listings and adds them to your database, and one process that scans for new entries and sends them as notifications.

From here, there are a number of approaches you could take in order to accomplish your goal of only sending new information. If you only have one process scanning your database and sending notifications, it's simple enough to add a column that keeps track of whether a particular listing has already been sent to you. As an outline, you could define a table with the following columns:

CREATE TABLE listings(
    url TEXT PRIMARY KEY,
    title TEXT,
    image_link TEXT,
    sent_to_notifications INT);

Since SQLite has no separate boolean type, you could just use 1/0 as True/False in the sent_to_notifications field. Now, any time you want to scan for new listings in your database, you can get a list of all of them with something like: SELECT * FROM listings WHERE sent_to_notifications=0;. Then, after sending the notification for a particular entry, run UPDATE listings SET sent_to_notifications=1 WHERE url='url_that_was_just_sent'; (note the single quotes, which is how SQL marks string literals). You can adjust this to update the entire batch at once, of course, but I'm just providing one possible outline on how to attack a problem like this.
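As a rough illustration of the outline above, here is a minimal sketch using Python's built-in sqlite3 module. The function names (open_db, save_listings, fetch_unsent, mark_sent) and the in-memory default path are my own placeholders, not anything from your script. INSERT OR IGNORE is one way to let the primary key silently reject URLs that are already stored.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the listings table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings(
               url TEXT PRIMARY KEY,
               title TEXT,
               image_link TEXT,
               sent_to_notifications INT)"""
    )
    return conn


def save_listings(conn, listings):
    """Store scraped (url, title, image_link) tuples as unsent.

    Rows whose url already exists are skipped by the primary key.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO listings VALUES (?, ?, ?, 0)",
            listings,
        )


def fetch_unsent(conn):
    """Return every listing that has not been sent as a notification yet."""
    return conn.execute(
        "SELECT url, title, image_link FROM listings "
        "WHERE sent_to_notifications = 0 ORDER BY url"
    ).fetchall()


def mark_sent(conn, urls):
    """Flag the given urls as sent so they are not picked up again."""
    with conn:
        conn.executemany(
            "UPDATE listings SET sent_to_notifications = 1 WHERE url = ?",
            [(url,) for url in urls],
        )
```

The scraping process would call save_listings after each scrape, and the notifying process would call fetch_unsent, send the texts, and then pass the URLs it sent to mark_sent. The ? placeholders keep URLs and titles from being interpreted as SQL.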

0
inzel On

I ended up finding a simple way of doing this:

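# listing_soup is the BeautifulSoup object built earlier in the scraper.
new_entries = []

# Read everything saved on previous runs (listing.txt must already exist).
with open("listing.txt", "r") as f:
    pre_check_list = f.read()

for h in listing_soup.find_all('a', {"class": "result-title hdrlnk"}, limit=5):
    link = h.get('href')
    title = h.text
    # Keep the listing only if its link is not already in the file.
    if link not in pre_check_list:
        new_entries.append(title)
        new_entries.append(link + '\n')

final_list = '\n'.join(new_entries)

# Append the new entries so the next run treats them as already seen.
with open("listing.txt", "a") as f:
    f.write(final_list)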

I basically set a variable to the file contents, gather new data, compare it to the existing data, and append it if it's new.
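One caveat with comparing against the raw file contents is that `link not in pre_check_list` is a substring test, so a link that happens to be a prefix of a longer saved link would count as already seen. A possible variation, sketched below, stores one link per line and compares against a set for exact matches. The helper names and the one-link-per-line file layout are assumptions for illustration, not what the snippet above does.

```python
from pathlib import Path


def load_seen_links(path):
    """Return the set of links saved on earlier runs (empty if no file yet)."""
    p = Path(path)
    if not p.exists():
        return set()
    return {line.strip() for line in p.read_text().splitlines() if line.strip()}


def filter_new(listings, seen):
    """Keep only (title, link) pairs whose link is not in seen.

    Links are added to seen as they are accepted, so duplicates inside
    one batch are dropped as well.
    """
    new = []
    for title, link in listings:
        if link not in seen:
            new.append((title, link))
            seen.add(link)
    return new


def remember_links(path, listings):
    """Append the links of the given (title, link) pairs, one per line."""
    with open(path, "a") as f:
        for _title, link in listings:
            f.write(link + "\n")
```

With this layout, the text message would be built from whatever filter_new returns, and remember_links would be called only after the message has gone out.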