Troubleshoot slow AJAX requests in the WebSphere-based Kudos Boards for HCL Connections

We still need the old WebSphere-based Kudos Boards from ISW. Users reported that AJAX operations like setting due dates or responsible people on a card are very slow, and some even abort before finishing. So far I can't see a pattern for why this occurs. It doesn't seem related to a specific user or client.

How can I troubleshoot this? I'd like to take one step back and first measure how long it takes to process requests, then generate a list of all requests that took longer than a specific time to process, let's say more than one second. That way we know how many users are affected and whether they have something in common, like being on a slow network.

Collecting request duration logs from IHS

The Apache module mod_log_config allows various request properties to be logged. Of interest here is %T:

The time taken to serve the request, in seconds.

Using the %{UNIT}T format, it can be logged in milliseconds instead of seconds, which gives better resolution for later analysis. So I extended the common log format in conf/httpd.conf to write the processing time first:

LogFormat "%{ms}T %h %l %u %t \"%r\" %>s %b" common

Another good idea is to add the User-Agent too, which gives some information about the type of client:

LogFormat "%{ms}T %h %l %u %t \"%r\" %>s %b \"%{User-Agent}i\"" common

First check the configuration file to make sure we didn't break anything:

/opt/IBM/HTTPServer # ./bin/apachectl -t
Syntax OK

Now we can do a graceful restart on production environments. This way, the web server finishes all requests currently being processed before applying our new config. In contrast to a hard restart, most users shouldn't even notice that we restarted their web server.

./bin/apachectl -k graceful

All new requests are now logged with their duration in milliseconds, like this:

100007 1.2.3.4 - - [03/Nov/2020:14:29:52 +0100] "POST /push/form/comet/connect HTTP/1.1" 200 96 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0"

Analyzing the measured data

We now have several hundred MB of logs from a single day, so a filter is required to extract useful information from them. To find the slow requests, we need to do two things:

  1. Filter on request duration: requests which are faster than XXX ms need to be sorted out.
  2. HCL Connections and Kudos Boards use a very old workaround: they keep AJAX requests open to get notified about server updates (long polling, a predecessor of WebSockets). Those requests just spam our slow log with false positives.

I wrote the following command for this purpose:

log=/tmp/access-slow.log
egrep -i -v "(POST /(push|files)/|GET /(kudosboards/updates|api-boards)/|\.mp4(\?[A-Za-z0-9_\-=&]+)? HTTP)" logs/access_2020_11_03.log \
  | egrep '^[0-9]{4,}' \
  | sort -rnk1 > $log
wc -l $log

The first egrep call filters out all requests which are not interesting here: /push belongs to the long-lived keep-alive connections, and Kudos Boards does the same. I also excluded mp4 movies because those files are naturally large; when downloading a 500 MB movie takes some time, that's nothing we need to worry about. Sometimes they also appeared with GET parameters like movie.mp4?logDownload=true&downloadType=view, so those were filtered out as well.
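
The exclusion pattern can be sanity-checked by feeding it a sample line: if egrep prints the line, the real command with -v would drop it. The path here is made up for illustration:

echo 'GET /example/movie.mp4?logDownload=true&downloadType=view HTTP/1.1' \
  | egrep -i '(POST /(push|files)/|GET /(kudosboards/updates|api-boards)/|\.mp4(\?[A-Za-z0-9_\-=&]+)? HTTP)'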

egrep '^[0-9]{4,}' sorts out requests whose processing time has fewer than 4 digits, which basically means we keep only requests taking >= 1000 ms. Since I'm looking for heavy performance issues, this seems a good point to start.
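
The digit trick only works for thresholds that are powers of ten. If a different cut-off is ever needed, a numeric comparison with awk can replace the second egrep; a sketch with an assumed threshold of 500 ms:

# same exclusion filter as above, but with an explicit numeric cut-off on the first field
egrep -i -v "(POST /(push|files)/|GET /(kudosboards/updates|api-boards)/|\.mp4(\?[A-Za-z0-9_\-=&]+)? HTTP)" logs/access_2020_11_03.log \
  | awk '$1 >= 500' \
  | sort -rnk1 > /tmp/access-slow.log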

sort -rnk1 orders our entries by duration in descending order, so we have the slowest requests at the beginning.

wc -l $log just tells us how many slow entries we collected this time. They can now be viewed with less $log. Our large access log with over 2 million entries was reduced to 2k.
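
To come back to the original question of what the affected users have in common, the slow log can be grouped with awk. A sketch; the field numbers assume the LogFormat above and awk's default whitespace splitting:

# slow requests per client IP (2nd field) - hints at specific users or networks
awk '{print $2}' $log | sort | uniq -c | sort -rn | head

# slow requests per path (8th field, the URL inside "%r")
awk '{print $8}' $log | sort | uniq -c | sort -rn | head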