I am experimenting with OpenFaaS and trying to evaluate the performance benefit of having more than one worker serve the same function (the default shasum function available from the store). My cluster consists of three 'small' (1 vCPU, 2 GB RAM) VMs, one 'medium' (1 vCPU, 4 GB RAM) VM, and one 'big' (2 vCPU, 4 GB RAM) VM. Scheduling is done with Kubernetes; the medium and big VMs are excluded from hosting functions, so all function pods run on the small VMs. The hey tool is used to perform multiple invocations, and I spawn workers (i.e. additional pods, instances of the function) manually through the API. Port 8080 of the gateway component is port-forwarded to localhost (kubectl port-forward -n openfaas svc/gateway 8080:8080 &), and the function is invoked with a command line similar to the following:
hey -n 50 -c 3 -m POST -D 50large.txt http://localhost:8080/function/shasum
or
hey -n 20000 -c 600 -m POST -d test http://localhost:8080/function/shasum
(the first one runs 50 requests, each hashing a 30 MB file, with 3 concurrent clients; the second sends a small payload 20000 times with 600 concurrent clients). The invocations are made from the 'big' VM, which is cordoned and therefore cannot host any function pods.
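For reference, this is roughly how I scale the function through the gateway's REST API instead of clicking through the UI. A minimal sketch, assuming the /system/scale-function endpoint of the OpenFaaS gateway and the port-forward above; basic-auth on /system/* is omitted for brevity:

```python
import json
import urllib.request


def scale_payload(name: str, replicas: int) -> bytes:
    """Build the JSON body the OpenFaaS scale endpoint expects."""
    return json.dumps({"serviceName": name, "replicas": replicas}).encode()


def scale_function(name: str, replicas: int,
                   gateway: str = "http://localhost:8080") -> None:
    """Ask the gateway to scale `name` to `replicas` pods.

    Assumes the gateway is reachable via the port-forward and that
    authentication is not required (add basic auth in a real cluster).
    """
    req = urllib.request.Request(
        f"{gateway}/system/scale-function/{name}",
        data=scale_payload(name, replicas),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # gateway replies with a 2xx status on success


# e.g. scale_function("shasum", 3) before the multi-worker run
```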
Sometimes I notice that if I call the function with a large number of concurrent requests or with large file inputs, the gateway fails to forward the requests and the port-forward breaks (for example, when substituting -c 3 with -c 5 in the first command, i.e. 5 concurrent clients).
But even when the port-forward survives (i.e. using -c 3), I get results that are not easily explained. Consider the execution log below for a run that makes continuous use of three workers (function pods), spread evenly across the three small VMs:
root@big-vm-1:~# hey -n 500 -c 3 -m POST -D 50large.txt http://localhost:8080/function/shasum
Summary:
Total: 541.0489 secs
Slowest: 5.5438 secs
Fastest: 1.1259 secs
Average: 3.2351 secs
Requests/sec: 0.9204
And here is the execution log for a run that uses only a single worker (one function pod):
root@big-vm-1:~# hey -n 500 -c 3 -m POST -D 50large.txt http://localhost:8080/function/shasum
Summary:
Total: 551.3123 secs
Slowest: 5.1512 secs
Fastest: 1.4815 secs
Average: 3.3106 secs
Requests/sec: 0.9033
Why does using multiple function pods achieve only marginally better results? Can anyone suggest an approach to verify that multiple workers actually outperform a single worker, using this or a related setup?
Not knowing what you mean by "worker", it is very hard to guess why the number of workers has so little impact.
The only mention of "worker" I was able to find in the OpenFaaS documentation is:
so if this is your "worker", then increasing the number of subscribers shouldn't increase the processing speed, and your results are more or less expected.
I also noticed you're using localhost. If you have a local k8s installation and run your tests on a single physical (or virtual) machine, be aware that it is not a good idea to have the load generator (hey in your case) and the system under test on the same machine, because the two will contend for the same CPU, memory, and network resources and skew the results.

Also, it is a good idea to run performance tests against a production-like environment (staging), because you cannot extrapolate the results to predict or calculate the saturation/breaking points for different hardware/software. Some aspects can be tested on a scaled-down environment, but in general the results won't be reliable, so consider conducting the test under more realistic conditions and with a realistic workload, payload, concurrency, etc.
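To make the 3-pod vs. 1-pod comparison concrete rather than eyeballed, one option is to capture the hey summary of each run and compute the throughput ratio. A minimal sketch; the regexes assume hey's default summary format, and the numbers are the ones from the two runs in the question:

```python
import re


def parse_hey_summary(text: str) -> dict:
    """Extract Total time and Requests/sec from a hey summary block."""
    patterns = {
        "total_secs": r"Total:\s+([\d.]+)",
        "req_per_sec": r"Requests/sec:\s+([\d.]+)",
    }
    metrics = {}
    for key, pattern in patterns.items():
        m = re.search(pattern, text)
        if m:
            metrics[key] = float(m.group(1))
    return metrics


# Figures pasted from the two runs above
three_pods = parse_hey_summary("Total: 541.0489 secs\nRequests/sec: 0.9204")
one_pod = parse_hey_summary("Total: 551.3123 secs\nRequests/sec: 0.9033")

speedup = three_pods["req_per_sec"] / one_pod["req_per_sec"]
print(f"Throughput ratio (3 pods vs 1): {speedup:.3f}x")  # prints 1.019x
```

A ratio this close to 1.0 suggests the bottleneck is somewhere other than the function pods (for example the port-forward, the gateway, or the load generator itself); repeating the run from a second machine against the gateway's service address would help isolate it.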