Slow PHP Routine

<pre>

<?php

$newline = "\n";

$hit = 0;
$id = "id";
while ($hit < 10) { 
    $a = rand(0, 255);
    $b = rand(0, 255);
    $c = rand(0, 255);
    $d = rand(0, 255);

    $name = gethostbyaddr("$a.$b.$c.$d");

    if (strpos($name, $id) !== false) {
        print "  " . "<a href=\"$name\">$name</a>" . $newline;
        $hit = $hit + 1;
    }
}

print $newline;
print "Copyright Search Engine" . $newline;
?>

This is a little search engine for the private entrepreneur, or it would be, if it worked.
The code executes, it is just incredibly slow. Does anybody know why?

In Case the Code is Value, and the Title is NeoSearch, the Sales Conditions, are these.

Information
Order

10% Promille
One Million Dollars US

Rex:.


There are 2 best solutions below

0
On

Many sites don't have their reverse DNS set up properly, so calling gethostbyaddr() will be slow if you hit any of those addresses.

Also, you should probably limit $a to rand(0, 223). Addresses above that range are multicast (224-239) or reserved (240-255), so they are not useful for a search engine.
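A minimal sketch of picking only unicast, publicly routable addresses. The helper name randomPublicIp() is made up for illustration; it relies on PHP's filter_var() flags to reject private ranges such as 10.0.0.0/8 and reserved ranges, and it assumes PHP 7 or later for random_int().

```php
<?php
/**
 * Pick a random IPv4 address that is unicast and publicly routable.
 * randomPublicIp() is a hypothetical helper, not part of the original script.
 */
function randomPublicIp(): string
{
    $flags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    do {
        // First octet 1-223 skips 0.x.x.x, multicast (224-239) and reserved (240-255).
        $ip = random_int(1, 223) . '.' . random_int(0, 255) . '.'
            . random_int(0, 255) . '.' . random_int(0, 255);
        // filter_var() returns false for private and reserved ranges,
        // so those candidates are thrown away and a new one is drawn.
    } while (filter_var($ip, FILTER_VALIDATE_IP, $flags) === false);

    return $ip;
}

print randomPublicIp() . "\n";
```

This only trims the candidates that can never resolve; it does not change the fact that most remaining addresses have no useful reverse DNS entry.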

Even fixing this, your code will necessarily be slow. You're looking for addresses that resolve to names with id in them. The vast majority of names don't fit that pattern, so you'll have to test thousands of names before you get 10 that you want.

This is not how real search engines work; they don't look up random IPs. They start with a set of well-known web pages, and then follow the links in those pages to find other sites.
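The seed-and-follow idea can be sketched as a breadth-first traversal. Everything here is illustrative: crawl() and the $linksOf callback are hypothetical names, and the toy $web array with made-up example URLs stands in for actually fetching pages and parsing their links. It assumes PHP 7.4 or later for the arrow function.

```php
<?php
/**
 * Breadth-first crawl: start from seed URLs and follow links.
 * $linksOf is a hypothetical callback returning the URLs linked from a page;
 * a real crawler would fetch the page and parse its <a href> attributes.
 */
function crawl(array $seeds, callable $linksOf, int $maxPages): array
{
    $queue = $seeds;                        // pages waiting to be visited
    $seen = array_fill_keys($seeds, true);  // every URL ever queued
    $visited = [];                          // pages visited, in order

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        $visited[] = $url;
        foreach ($linksOf($url) as $link) {
            if (!isset($seen[$link])) {     // never queue the same URL twice
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $visited;
}

// Toy link graph standing in for the web (made-up URLs).
$web = [
    'http://a.example/' => ['http://b.example/', 'http://c.example/'],
    'http://b.example/' => ['http://c.example/', 'http://d.example/'],
    'http://c.example/' => ['http://a.example/'],
    'http://d.example/' => [],
];
$order = crawl(['http://a.example/'], fn($url) => $web[$url] ?? [], 10);
print implode("\n", $order) . "\n";
```

Unlike random probing, every step here lands on a page that some other page already points to, and the $seen set guarantees no page is visited twice.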

0
On

I am not sure what you are trying to accomplish here, but having looked at your code and tested it locally, I can tell you that you have to search through a lot of hosts to find ones that contain 'id'. For most IPv4 addresses the lookup fails, and gethostbyaddr() then just returns the IP address itself as the hostname.
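A small sketch of how the result of gethostbyaddr() could be filtered before testing it. isMatchingHost() is a hypothetical helper; the behavior it guards against is documented for gethostbyaddr(), which returns the unmodified IP on failure and false on malformed input. Note that a check written as `!strpos(...) === false` would miss a match at position 0, which is why the comparison below uses `!== false`.

```php
<?php
/**
 * Decide whether a reverse-DNS result is a real hostname containing $needle.
 * isMatchingHost() is a hypothetical helper for illustration.
 *
 * @param string|false $name Return value of gethostbyaddr($ip)
 */
function isMatchingHost($name, string $ip, string $needle): bool
{
    // gethostbyaddr() returns false on malformed input and the
    // unmodified IP string when the lookup fails: neither is a hostname.
    if ($name === false || $name === $ip) {
        return false;
    }
    // Compare with !== false so a match at position 0 still counts.
    return strpos($name, $needle) !== false;
}

var_dump(isMatchingHost('id.example.com', '192.0.2.1', 'id'));   // match at position 0
var_dump(isMatchingHost('192.0.2.1', '192.0.2.1', 'id'));        // lookup failed
var_dump(isMatchingHost('mail.example.org', '192.0.2.2', 'id')); // no "id" in the name
```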

The other problem is that your script is single-threaded: it requests the host name for one IP address and then waits for the response before moving on to the next.

If you would like to see what your code is doing try running the modified code below.

<?php

$newline = "\n";

$hit = 0;
$id = "id";
$hosts = 0;
while ($hit < 10) {
    $a = rand(0, 255);
    $b = rand(0, 255);
    $c = rand(0, 255);
    $d = rand(0, 255);

    $ip = "$a.$b.$c.$d";

    $name = gethostbyaddr($ip);
    print($name . $newline);

    if (strpos($name, $id) !== false) {
        print "  " . "<a href=\"$name\">$name</a>" . $newline;
        $hit++;
    }

    $hosts++;
    print($hosts . $newline);
}

print $newline;
print "Copyright Search Engine" . $newline;
?>

I stopped at just over 500 hosts and it was still going, with each host taking about a second. At that rate, crawling all 4.3 billion IPv4 addresses would take well over a century, not to mention that you are likely to get repeats when picking addresses at random.
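A back-of-the-envelope sketch of that cost, plus one way to avoid repeats by walking addresses in order instead of drawing them at random. It assumes a 64-bit PHP build (so that 2 ** 32 stays an integer) and takes the one-second-per-lookup figure from the test run; addressRange() is a hypothetical helper built on the standard ip2long() and long2ip() functions.

```php
<?php
// Rough cost of probing the whole IPv4 space at one lookup per second.
$totalAddresses = 2 ** 32;               // every possible 32-bit address
$secondsPerLookup = 1;                   // assumption taken from the test run
$secondsPerYear = 365.25 * 24 * 3600;
$years = $totalAddresses * $secondsPerLookup / $secondsPerYear;
printf("%s addresses, about %.0f years\n", number_format($totalAddresses), $years);

/**
 * Return $count consecutive dotted-quad addresses starting at $start.
 * Walking the space in order visits each address once, with no repeats.
 * addressRange() is a hypothetical helper for illustration.
 */
function addressRange(int $start, int $count): array
{
    $ips = [];
    for ($n = $start; $n < $start + $count; $n++) {
        $ips[] = long2ip($n);            // 32-bit integer to "a.b.c.d"
    }
    return $ips;
}

print implode("\n", addressRange(ip2long('8.8.8.254'), 4)) . "\n";
```

Sequential enumeration removes the duplicates but not the underlying problem: the total number of lookups is still far too large for a single blocking loop.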

Happy to help you more with what you are trying to do if you let us know what that is.