I am load testing Akka HTTP (version 10.0) with the wrk tool. The wrk command:
wrk -t6 -c10000 -d 60s --timeout 10s --latency http://localhost:8080/hello
First run, without any blocking call:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import scala.io.StdIn

object WebServer {
  implicit val system = ActorSystem("my-system")
  implicit val materializer = ActorMaterializer()
  implicit val executionContext = system.dispatcher

  def main(args: Array[String]): Unit = {
    val bindingFuture = Http().bindAndHandle(router.route, "localhost", 8080)
    println(
      s"Server online at http://localhost:8080/\nPress RETURN to stop...")
    StdIn.readLine() // let it run until user presses return
    bindingFuture
      .flatMap(_.unbind()) // trigger unbinding from the port
      .onComplete(_ => system.terminate()) // and shutdown when done
  }
}

object router {
  implicit val executionContext = WebServer.executionContext

  val route =
    path("hello") {
      get {
        complete {
          "Ok"
        }
      }
    }
}
Output of wrk:
Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.22ms   16.41ms   2.08s    98.30%
    Req/Sec     9.86k     6.31k   25.79k    62.56%
  Latency Distribution
     50%    3.14ms
     75%    3.50ms
     90%    4.19ms
     99%   31.08ms
  3477084 requests in 1.00m, 477.50MB read
  Socket errors: connect 9751, read 344, write 0, timeout 0
Requests/sec:  57860.04
Transfer/sec:      7.95MB
Now I add a Future call to the route and run the test again:
val route =
  path("hello") {
    get {
      complete {
        Future { // Blocking code
          Thread.sleep(100)
          "OK"
        }
      }
    }
  }
Output of wrk:
Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   527.07ms  491.20ms  10.00s    88.19%
    Req/Sec    49.75     39.55   257.00     69.77%
  Latency Distribution
     50%  379.28ms
     75%  632.98ms
     90%    1.08s
     99%    2.07s
  13744 requests in 1.00m, 1.89MB read
  Socket errors: connect 9751, read 385, write 38, timeout 98
Requests/sec:    228.88
Transfer/sec:     32.19KB
As you can see, with the Future call only 13744 requests are served in one minute.
Following the Akka documentation, I added a separate dispatcher thread pool for the route, which creates a maximum of 200 threads.
implicit val executionContext = WebServer.system.dispatchers.lookup("my-blocking-dispatcher")

// config of dispatcher
my-blocking-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    // fixed-pool-size requires Akka 2.4.2+
    fixed-pool-size = 200
  }
  throughput = 1
}
After the above change, the performance improved a bit:
Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   127.03ms   21.10ms  504.28ms   84.30%
    Req/Sec   320.89    175.58    646.00     60.01%
  Latency Distribution
     50%  122.85ms
     75%  135.16ms
     90%  147.21ms
     99%  190.03ms
  114378 requests in 1.00m, 15.71MB read
  Socket errors: connect 9751, read 284, write 0, timeout 0
Requests/sec:   1903.01
Transfer/sec:    267.61KB
If I increase the pool size above 200 in the my-blocking-dispatcher config, the performance stays the same.
What other parameters or configuration should I use to increase performance with the Future call, so that the app achieves maximum throughput?
Some disclaimers first: I haven't worked with the wrk tool before, so I might get something wrong. Here is the assumption I've made for this answer: with -t4 -c10000, wrk keeps 10000 connections in total, not 4 * 10000.
Also, I've run the server on the same machine as wrk, and my machine seems to be weaker than yours (I have only a dual-core CPU), so I've reduced wrk's thread count to 2 and the connection count to 1000 to get decent results.
The Akka HTTP version I've used is 10.0.1, and the wrk version is 4.0.2.
Now to the answer. Let's look at the blocking code you have: the Future in your route calls Thread.sleep(100).
This means every request will take at least 100 milliseconds. If you have 200 threads and 1000 connections, the timeline will be as follows:
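A small sketch that computes such a timeline from the numbers above (200 threads, 100 ms per request). It assumes every thread picks up the next request the moment it finishes the previous one and ignores all other overhead; the names Timeline and processed are made up for this illustration.

```scala
object Timeline {
  // Requests completed after `elapsedMs`, assuming `threads` workers that each
  // block for `blockMs` per request and never sit idle (an idealised model).
  def processed(threads: Int, blockMs: Int, elapsedMs: Int): Int =
    threads * (elapsedMs / blockMs)

  def main(args: Array[String]): Unit = {
    println("Ms\tMsg")
    // One row per 100 ms step for the first second.
    for (ms <- 0 to 1000 by 100)
      println(s"$ms\t${processed(threads = 200, blockMs = 100, elapsedMs = ms)}")
    // The last row printed is: 1000  2000
  }
}
```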
Where Msg is the number of processed messages and Ms is the elapsed time in milliseconds.
This gives us 2000 messages processed per second, or ~60000 messages per 30 seconds, which mostly agrees with the test figures:
It is also obvious that this number (2000 messages per second) is strictly bounded by the thread count. E.g. if we had 300 threads, we'd process 300 messages every 100 ms, so we'd have 3000 messages per second, provided our system can handle that many threads. Let's see how we fare if we provide 1 thread per connection, i.e. 1000 threads in the pool:
As you can see, now one request takes almost exactly 100ms on average, i.e. the same amount we put into Thread.sleep. It seems we can't get much faster than this! Now we're pretty much in the standard situation of one thread per request, which worked pretty well for many years until asynchronous IO let servers scale up much higher.
For the sake of comparison, here are the fully non-blocking test results on my machine with the default fork-join thread pool:
To summarize, if you use blocking operations, you need one thread per request to achieve the best throughput, so configure your thread pool accordingly. There are natural limits for how many threads your system can handle, and you might need to tune your OS for maximum threads count. For best throughput, avoid blocking operations.
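To see the effect of pool size on blocking work outside of Akka, here is a minimal sketch using only the JDK and the Scala standard library. PoolSizeDemo and runBatch are hypothetical names, and the task count and sleep time are arbitrary; it illustrates the principle and is not a benchmark of Akka HTTP.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PoolSizeDemo {
  // Runs `tasks` blocking calls of `sleepMs` each on a fixed pool of `poolSize`
  // threads and returns the elapsed wall-clock time in milliseconds.
  def runBatch(poolSize: Int, tasks: Int, sleepMs: Long): Long = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val start = System.nanoTime()
      // Each Future occupies one pool thread for the whole sleep.
      val all = Future.sequence((1 to tasks).map(_ => Future { Thread.sleep(sleepMs); "OK" }))
      Await.result(all, 1.minute)
      (System.nanoTime() - start) / 1000000
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    // 40 tasks on 10 threads need about four rounds of 50 ms;
    // 40 tasks on 40 threads need about one round.
    println(s"10 threads: ${runBatch(10, 40, 50)} ms")
    println(s"40 threads: ${runBatch(40, 40, 50)} ms")
  }
}
```

With one thread per task the batch finishes in roughly one sleep interval, which is the same limit the 1000-thread run above hits.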
Also, don't confuse asynchronous operations with non-blocking ones. Your code with Future and Thread.sleep is a perfect example of an asynchronous but blocking operation. Lots of popular software operates in this mode (some legacy HTTP clients, Cassandra drivers, AWS Java SDKs, etc.). To fully reap the benefits of a non-blocking HTTP server, you need to be non-blocking all the way down, not just asynchronous. It might not always be possible, but it's something to strive for.
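The difference can be sketched with a delayed result produced in two ways. This is an illustration built on the JDK scheduler only, so that it stands alone; DelayDemo, blockingDelay and nonBlockingDelay are hypothetical names. In an Akka application, akka.pattern.after with the actor system's scheduler would typically play the role of the timer here.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object DelayDemo {
  // One timer thread serves every delayed result (illustrative helper, not Akka API).
  private val timer = Executors.newSingleThreadScheduledExecutor()

  // Asynchronous but blocking: a pool thread is parked for the whole delay.
  def blockingDelay(ms: Long)(implicit ec: ExecutionContext): Future[String] =
    Future { Thread.sleep(ms); "OK" }

  // Asynchronous and non-blocking: no thread waits; the timer completes
  // the promise once the delay has passed.
  def nonBlockingDelay(ms: Long): Future[String] = {
    val p = Promise[String]()
    timer.schedule(new Runnable { def run(): Unit = p.success("OK") }, ms, TimeUnit.MILLISECONDS)
    p.future
  }

  def shutdown(): Unit = timer.shutdown()

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val start = System.nanoTime()
    // 1000 concurrent delayed results without needing 1000 threads.
    val all = Future.sequence(Seq.fill(1000)(nonBlockingDelay(100)))
    Await.result(all, 10.seconds)
    println(s"1000 non-blocking delays took ${(System.nanoTime() - start) / 1000000} ms")
    shutdown()
  }
}
```

The blocking variant needs one thread per in-flight request to reach the same concurrency, while the non-blocking variant needs only the single timer thread however many requests are pending.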