determine availability (and price?) of 50k domains

I have a list of 50k possible domain names. I'd like to find out which ones are available and, if possible, how much they cost. The list looks like this:

presumptuous.ly
principaliti.es
procrastinat.es
productivene.ss
professional.ly
profession.ally
professorshi.ps
prognosticat.es
prohibitioni.st

I've tried whois, but it runs far too slowly to finish in the next 100 years.

import whois

def check_domain(domain):
    try:
        # Look up the WHOIS record for the domain
        w = whois.whois(domain)
        # Treat a WHOIS status of "free" as available
        return w.status == "free"
    except Exception as e:
        print("Error:", e)
        print(domain + " had an issue")
        return False

def check_available(matches):
    print('checking availability')
    available = []
    for match in matches:
        if check_domain(match):
            print("found " + match + " available!")
            available.append(match)
    return available
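
For reference, a minimal driver for the two functions above, assuming the 50k names sit one per line in a text file (domains.txt and available.txt are placeholder names):

from pathlib import Path

# "domains.txt" and "available.txt" are placeholder file names
matches = Path("domains.txt").read_text().splitlines()
available = check_available(matches)
Path("available.txt").write_text("\n".join(available))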

I've also tried the bulk upload tool at names.com/names, but that doesn't seem to work at all.

How do I determine the availability of these domains?

Andrej Kesely (BEST ANSWER)

You can use, for example, the multiprocessing package to speed up the process:

import os
import sys
from multiprocessing import Pool

import pandas as pd
from tqdm import tqdm
from whois import whois


# https://stackoverflow.com/a/8391735/10035985
def blockPrint():
    sys.stdout = open(os.devnull, "w")


def enablePrint():
    sys.stdout = sys.__stdout__


def check_domain(domain):
    try:
        blockPrint()  # the whois library prints noisy error messages; silence stdout
        result = whois(domain)
    except Exception:
        # a failed lookup yields no status; the main loop counts that as free
        return domain, None
    finally:
        enablePrint()
    return domain, result.status


if __name__ == "__main__":
    domains = [
        "google.com",
        "yahoo.com",
        "facebook.com",
        "xxxnonexistentzzz.domain",
    ] * 100

    results = []
    with Pool(processes=16) as pool:  # <-- choose how many worker processes you want
        for domain, status in tqdm(
            pool.imap_unordered(check_domain, domains), total=len(domains)
        ):
            # a missing/empty status is interpreted as "the domain is free"
            results.append((domain, not bool(status)))

    df = pd.DataFrame(results, columns=["domain", "is_free"])
    print(df.drop_duplicates())

Prints:

100%|██████████████████████████████████████████████| 400/400 [00:07<00:00, 55.67it/s]

                      domain  is_free
0   xxxnonexistentzzz.domain     True
5               facebook.com    False
11                google.com    False
14                 yahoo.com    False

You can see it checks ~55 domains per second, so the full 50k list would finish in roughly 15 minutes.
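
If WHOIS rate limits become a problem, a plain DNS lookup is a much cheaper pre-filter: a name that resolves is certainly registered, so only the non-resolving leftovers need the slower WHOIS confirmation (a domain can be registered yet have no DNS records, so WHOIS still has the final word). A minimal sketch, using only the standard library:

import socket

def resolves(domain):
    # A name that resolves in DNS is definitely registered
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False

# Only names that fail DNS resolution go on to the WHOIS pass;
# a non-resolving name may still be registered, so WHOIS must confirm it.
domains = ["google.com", "xxxnonexistentzzz.domain"]
candidates = [d for d in domains if not resolves(d)]
print(candidates)  # ['xxxnonexistentzzz.domain']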