I have a big threaded feed retrieval script in python.
My question is, how can I load balance outgoing requests so that I don't hit any one host too often?
This is a big problem for feedburner, since a large percentage of sites proxy their RSS through feedburner and to further complicate matters many sites will alias a subdomain on their domain to feedburner to obscure the fact that they're using it (e.g. "mysite" sets its RSS url to feeds.mysite.com/mysite, where feeds.mysite.com bounces to feedburner). Sometimes it blocks me for awhile and redirects to their "automated requests" error page.
If your problem is related to Feedburner "throttling you", it most certainly does this because of the source IP of your bot. The way to "load balance to Feedburner" would be to have multiple different source IPs to start from.
Now there are numerous ways to achieving this, 2 of them being:
Of course, don't you go a put a NAT box in front of them now ;-)
The above takes care of the possible "throttling problems", now for the "scheduling part". You should maintain a "virtual scheduler" per "destination" and make sure not to exceed the parameters of the Web Service (e.g. Feedburner) in question. Now, the tricky part is to get hold of these "limits"... sometimes they are advertised and sometimes you need to figure them out experimentally.
I understand this is "high level architectural guidelines" but I am not ready to be coding this for you... I hope you forgive me ;-)