A friend and I are working on a fun project in which we have to execute hundreds of HTTP requests, each through a different proxy. Imagine something like the following:

    for (int i = 0; i < 20; i++)
    {
        // randomProxy stands for a proxy address picked at runtime.
        // Each iteration creates a new handler and a new HttpClient.
        HttpClientHandler handler = new HttpClientHandler { Proxy = new WebProxy(randomProxy, true) };
        using (var client = new HttpClient(handler))
        {
            using (var request = new HttpRequestMessage(HttpMethod.Get, "http://x.com"))
            {
                var response = await client.SendAsync(request);
                if (response.IsSuccessStatusCode)
                {
                    string content = await response.Content.ReadAsStringAsync();
                }
            }
            using (var request2 = new HttpRequestMessage(HttpMethod.Get, "http://x.com/news"))
            {
                var response = await client.SendAsync(request2);
                if (response.IsSuccessStatusCode)
                {
                    string content = await response.Content.ReadAsStringAsync();
                }
            }
        }
    }

By the way, we are using .NET Core (a console application for now). I know there are many threads about socket exhaustion and DNS recycling, but this case is different because of the multiple proxies.
If we use a singleton instance of HttpClient, just like everyone suggests:
- We can't set more than one proxy, because it is being set during HttpClient's instantiation and cannot be changed afterwards.
- It doesn't respect DNS changes. Reusing an instance of HttpClient means that it holds on to the socket until it is closed, so if a DNS record is updated on the server, the client will not notice until that socket is closed. One workaround is to set the `keep-alive` header to `false`, so the socket is closed after each request, but that leads to sub-optimal performance. The second way is to use `ServicePoint`:

        ServicePointManager.FindServicePoint(new Uri("http://x.com"))
            .ConnectionLeaseTimeout = Convert.ToInt32(TimeSpan.FromSeconds(15).TotalMilliseconds);
        ServicePointManager.DnsRefreshTimeout = Convert.ToInt32(TimeSpan.FromSeconds(5).TotalMilliseconds);

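The keep-alive workaround mentioned above could look roughly like this. It is only a sketch: the `NoKeepAlive` class and its `Create` method are hypothetical names, while `ConnectionClose` is the header property that makes HttpClient send `Connection: close`.

```csharp
using System.Net.Http;

public static class NoKeepAlive
{
    // Builds a client that asks for the connection to be closed after
    // every response, so no socket outlives a DNS change.
    public static HttpClient Create()
    {
        var client = new HttpClient();
        // Adds "Connection: close" to every request sent by this client.
        client.DefaultRequestHeaders.ConnectionClose = true;
        return client;
    }
}
```

Every request then pays for a fresh TCP handshake (and a TLS handshake for HTTPS), which is where the sub-optimal performance comes from.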
On the other hand, creating and disposing an HttpClient per iteration (as in my example above) leaves many sockets in the TIME_WAIT state. TIME_WAIT indicates that the local endpoint (this side) has closed the connection.
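To see the effect, the number of sockets in TIME_WAIT can be counted from inside the process. This is a rough sketch: `TimeWaitProbe` and `CountTimeWait` are hypothetical names, and the result covers every TCP connection on the machine, not just those opened by HttpClient.

```csharp
using System.Linq;
using System.Net.NetworkInformation;

public static class TimeWaitProbe
{
    // Counts the machine's TCP connections that are currently in TIME_WAIT.
    public static int CountTimeWait()
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(c => c.State == TcpState.TimeWait);
    }
}
```

Calling `TimeWaitProbe.CountTimeWait()` before and after the loop gives a rough idea of how many sockets the disposed clients left behind.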
I'm aware of `SocketsHttpHandler` and `IHttpClientFactory`, but they don't solve the problem of using different proxies.

    var socketsHandler = new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(10),
        PooledConnectionIdleTimeout = TimeSpan.FromMinutes(5),
        MaxConnectionsPerServer = 10
    };
    // Cannot set a different proxy for each request
    var client = new HttpClient(socketsHandler);

What is the most sensible decision that can be made?
First of all, I want to mention that @Stephen Cleary's example works fine if the proxies are known at compile-time, but in my case they are known at runtime. I forgot to mention that in the question, so it's my fault.
Thanks to @aepot for pointing that out.
That's the solution I came up with (credits @mcont):
A proxy per request means an additional socket for each request (another HttpClient instance).
In the solution above, a `ConcurrentDictionary` is used to store the HttpClients so that I can reuse them, which is the whole point of HttpClient. I could use the same proxy for 5 requests before it gets blocked by API limitations. I forgot to mention that in the question as well.

As you've seen, there are two solutions to socket exhaustion and DNS recycling: `IHttpClientFactory` and `SocketsHttpHandler`. The first one doesn't suit my case, because the proxies I'm using are known at runtime, not at compile-time. The solution above uses the second way.

For those who have the same issue, you can read the following issue on GitHub. It explains everything.
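The dictionary-based idea can be sketched roughly as follows. This illustrates the approach and is not necessarily the exact code: the `ProxiedClients` class and its `For` and `Retire` methods are hypothetical names, and the handler settings are simply the ones from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

public static class ProxiedClients
{
    // One long-lived HttpClient per proxy address. Lazy<T> ensures that
    // only one client is ever built per key, even under concurrent calls.
    private static readonly ConcurrentDictionary<string, Lazy<HttpClient>> Clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>();

    // Returns the cached client for this proxy, creating it on first use.
    public static HttpClient For(string proxyAddress)
    {
        return Clients.GetOrAdd(
            proxyAddress,
            key => new Lazy<HttpClient>(() => CreateClient(key))).Value;
    }

    // Drops a proxy that is no longer usable (for example, once it is blocked).
    public static void Retire(string proxyAddress)
    {
        if (Clients.TryRemove(proxyAddress, out var lazy) && lazy.IsValueCreated)
        {
            lazy.Value.Dispose();
        }
    }

    private static HttpClient CreateClient(string proxyAddress)
    {
        var handler = new SocketsHttpHandler
        {
            Proxy = new WebProxy(proxyAddress, true),
            UseProxy = true,
            // Recycle pooled connections so that DNS changes are picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(10),
            PooledConnectionIdleTimeout = TimeSpan.FromMinutes(5),
            MaxConnectionsPerServer = 10
        };
        return new HttpClient(handler);
    }
}
```

A call such as `ProxiedClients.For(proxyAddress).GetAsync("http://x.com")` then reuses the same client, and its connection pool, for every request through that proxy. `Retire` can be called once a proxy has been used up.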
I'm open to improvements, so feel free to suggest some.