Get email or other contact information from Craigslist using Python requests

I am trying to find a way to get an email address or telephone number from Craigslist using Python.

I have used python-craigslist to get the posts, but I cannot find anything in it for emails or other contact info.

I tried this:

import requests

url = "https://chandigarh.craigslist.org/reply/ixc/hum/7220389776/mailto"

head = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Content-Length": "344",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "DNT": "1",
    "Host": "chandigarh.craigslist.org",
    "Origin": "https://chandigarh.craigslist.org",
    "Referer": "https://chandigarh.craigslist.org/hum/d/hr-outsourcing-company-in-mohali/7220389776.html",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
post_data = {"n": "U2FsdGVkX184MDg3MTgwOF5nEvY336v771unnxU7b9fc52-DzxhmmxcCYwQ6uylAsvUK2atZ1Ot3zWsSF4ukqvM9BMFMnNA_L00i0jQ5DhiZkfobQq1avkovyPJ3IcQbWM4327VdEQUipMzU6XfOXn5xsLqQ9Tt-L1qJdM55e2Ac11nzeaFCRV7HgpYmmIdrjpESKZpp0dhTh2p5d826f9CSBa4ldNRg0pLswm5P3JXaYGTe4Z7Fe5NB1Jfs3-CBWFdy2ZzqIA345q_YfXUatIMoq1TwN3lc_ee8rKnLKJmQwPHPpLoQHRP9aioeMOBv17okylBLm8uhduZ6HawCRg"}

resp = requests.post(url, headers=head, data=post_data)

print(resp.text)

but I get no response.

There are 2 best solutions below

1
FilipA On

For scraping Craigslist, use the pyquery Python package: https://pypi.python.org/pypi/pyquery

For a regex for email addresses / phone numbers, see the examples on this page: http://www.regular-expressions.info/

To store the email addresses, you can simply write them to a CSV file. The csv module documentation explains how: https://docs.python.org/2/library/csv.html
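
As a rough illustration of the regex and CSV suggestions above, here is a minimal sketch using only the standard library. The pattern is a deliberately simple approximation of an email address, not a full validator, and the sample text, `extract_emails`, and `to_csv` are made-up names for this example.

```python
import csv
import io
import re

# Simplified email pattern (an assumption for illustration; it does not
# cover every address that the email standards allow).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email-like strings in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def to_csv(emails):
    """Serialize the addresses as CSV text with a single 'email' column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["email"])
    for address in emails:
        writer.writerow([address])
    return buffer.getvalue()


# Made-up sample text; in practice this would be text you already have.
sample = "Contact hr@example.com or sales@example.org. Again: hr@example.com"
emails = extract_emails(sample)
print(to_csv(emails))
```

Writing to a real file instead of a string would only need `open("emails.csv", "w", newline="")` in place of the `io.StringIO` buffer.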

If you would like to use these email addresses to send messages, you might want to check this add-on: https://addons.mozilla.org/en-US/thunderbird/addon/mail-merge/

Also, I advise you to stay on the bright side and read more here: http://en.wikipedia.org/wiki/Morality

3
Raja Muhammad Saad On

The following is my code; with it I got the email successfully. The steps are:

  1. Get the email iframe to obtain its link.

  2. Solve the captcha using 2Captcha.

  3. Send the solved captcha response to get the next token.

  4. Then send a POST request with that token to mailto to get the email.

  5. Each request uses different headers, so I hardcoded a separate set of headers for each one.

    resp = r.post(url, headers=head, data=post_data, allow_redirects=True)
    asd = resp.json()
    capin = str(asd["nonce"])
    print("Captcha In Request Successful...")
    
    solver = TwoCaptcha("API_KEY")
    
    try:
        print("Solving Captcha Using 2Captcha")
        result = solver.hcaptcha(
            sitekey='0c3a1de8-e8df-4e01-91b6-6995c4ade451',
            url=ifram_url
        )
    
    except Exception as e:
        print(e)
    
    else:
        captcha = result["code"]
        print("Captcha Solved Successfully")
    
    post_data1 = {"h-captcha-response": str(captcha),
                  "n": capin
                  }
    url1 = capt_link
    c_len = len(captcha)+len(capin)+22
    head1 = {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Content-Length": str(c_len),
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Cookie": cookie,
        "DNT": "1",
        "Host": host,
        "Origin": origin,
        "Referer": main_url,
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    
    resp = r.post(url1, headers=head1, data=post_data1, allow_redirects=True)
    bsd = resp.json()
    capout = str(bsd["nonce"])
    print("Captcha Out Request Successful...")
    
    c_len1 = len(capout)+2
    
    post_data2 = {"n": capout}
    url2 = milto_link
    head2 = {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Content-Length": str(c_len1),
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Cookie": cookie,
        "DNT": "1",
        "Host": host,
        "Origin": origin,
        "Referer": main_url,
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    
    resp = r.post(url2, headers=head2, data=post_data2, allow_redirects=True)
    
    defg = resp.json()
    soup = BeautifulSoup(defg["email"], "html.parser")
    email = soup.find("input").get("value")
    print(email)
    Final_Email.append(email)
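
A note on the manual `Content-Length` arithmetic above (`+22` and `+2`): those constants appear to be the length of the field names plus the `=` and `&` separators in a form-encoded body. The sketch below, which uses only the standard library and made-up values, shows where the numbers come from and why the shortcut only holds when the values contain no characters that get percent-encoded. `form_body_length` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode


def form_body_length(fields):
    """Length in bytes of an application/x-www-form-urlencoded body."""
    return len(urlencode(fields).encode("ascii"))


# Made-up placeholder values, not real tokens.
one_field = {"n": "abc"}
two_fields = {"h-captcha-response": "xy", "n": "abc"}

# "n=" adds 2 characters to the value length.
print(form_body_length(one_field))

# "h-captcha-response=" (19) plus "&n=" (3) adds 22 characters.
print(form_body_length(two_fields))

# A value containing "=" is percent-encoded to "%3D", so len(value) + 2
# would undercount the real body length here.
needs_escaping = {"n": "a=b"}
print(form_body_length(needs_escaping), len("a=b") + 2)
```

Measuring the encoded body, rather than adding a fixed constant, avoids this mismatch.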