ActiveMQ Artemis Consumer Connection Distribution

This question is a bit of a follow on to something I previously asked.

The setup is the same as outlined in that question: a symmetric cluster of 4 stand-alone ActiveMQ Artemis (v2.20) nodes, each with identical configuration (same settings, queues, etc.). Multiple client apps connect to that cluster as either message consumers or message producers, and all the clients connect with a connect string like this:

(tcp://artemis1:61616,tcp://artemis2:61616,tcp://artemis3:61616,tcp://artemis4:61616)?type=XA_CF&ha=true&retryInterval=1000&retryIntervalMultiplier=2&maxRetryInterval=32000&reconnectAttempts=-1

Typically, consumer connections are persistent and can remain for several hours.
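For context, here is a minimal sketch of roughly how our consumers connect. The queue name "exampleQueue" and the plain auto-acknowledge session are simplified placeholders; in reality the sessions are XA, as the type=XA_CF parameter implies:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.activemq.artemis.api.jms.ActiveMQJMSClient;

    public class ExampleConsumer {
        public static void main(String[] args) throws Exception {
            // The same connect string as above; type=XA_CF selects an XA-capable factory.
            String url = "(tcp://artemis1:61616,tcp://artemis2:61616,"
                       + "tcp://artemis3:61616,tcp://artemis4:61616)"
                       + "?type=XA_CF&ha=true&retryInterval=1000&retryIntervalMultiplier=2"
                       + "&maxRetryInterval=32000&reconnectAttempts=-1";
            ConnectionFactory factory = ActiveMQJMSClient.createConnectionFactory(url, "cf");
            try (Connection connection = factory.createConnection()) {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("exampleQueue"); // placeholder queue name
                MessageConsumer consumer = session.createConsumer(queue);
                System.out.println("Received: " + consumer.receive(5000)); // null if no message arrives
            }
        }
    }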

The particular issue we have is that sometimes a queue will end up with no consumers on any of the 4 nodes; the consuming app might be shut down for an hour's maintenance, for example. During that time a producer app might send messages to the queue, and in the absence of any consumers these are distributed fairly randomly, e.g. artemis1 and artemis3 each get 1 message while the other 2 servers remain at zero. A little later, consumers connect back to the queue, but they appear to connect without regard to the state of messages on the queues. If the client application spawns 2 consumer threads, these might connect to, say, artemis1 and artemis4, which leaves the message on artemis3 "stuck" indefinitely. Per the other question linked above, redistribution does not appear to kick in in this scenario either. I've noted that even when the client spawns more consumer threads than there are servers, we can still end up with one server getting no consumers and therefore being left with "stranded" messages.

Hope that explanation of the issue makes sense! Should we expect this situation to occur, or should the consumer connections be more aware of the state of messages on the target queue across the cluster? I'd appreciate any suggestions of what we might do to avoid this problem, or any other comments.

Note the reverse does not apply: message producer clients do show awareness of consumer distribution, and messages always go to servers where a consumer is connected (if any) in preference to those without.

Best answer (by Justin Bertram):

What you're seeing with the consumer connections is expected. When the connection is made the client has no idea how messages are distributed around the cluster, and the cluster nodes have no idea which queue the client will create a consumer on (if any). Connections are distributed around the cluster based on the client's connection load-balancing policy.
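For illustration, the policy is configurable via the connectionLoadBalancingPolicyClassName URL parameter. This sketch names the round-robin implementation that ships with Artemis, which is the default, just to make the setting explicit:

    (tcp://artemis1:61616,tcp://artemis2:61616,tcp://artemis3:61616,tcp://artemis4:61616)?ha=true&connectionLoadBalancingPolicyClassName=org.apache.activemq.artemis.api.core.client.loadbalance.RoundRobinConnectionLoadBalancingPolicy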

To be clear, there is an advanced feature called a connection router which can distribute connections around a cluster in a more intelligent way based on information provided with the connection (e.g. client ID, username, role, etc.). It can be used to ensure that certain producers and consumers always connect to the same cluster node(s), which avoids the performance cost of message forwarding or redistribution; a rough configuration sketch follows. But I digress.
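A rough broker.xml sketch of a connection router, with the caveats that the element names follow recent releases (the feature was renamed from "broker balancer", so they may differ on 2.20) and that the router and connector names here are placeholders:

    <!-- Route incoming connections by username using a consistent-hash policy.
         Names ("username-router", "artemisN-connector") are placeholders. -->
    <connection-routers>
       <connection-router name="username-router">
          <key-type>USER_NAME</key-type>
          <policy name="CONSISTENT_HASH"/>
          <pool>
             <static-connectors>
                <connector-ref>artemis1-connector</connector-ref>
                <connector-ref>artemis2-connector</connector-ref>
                <connector-ref>artemis3-connector</connector-ref>
                <connector-ref>artemis4-connector</connector-ref>
             </static-connectors>
          </pool>
       </connection-router>
    </connection-routers>

    <!-- Attach the router to the acceptor handling client connections. -->
    <acceptor name="artemis">tcp://0.0.0.0:61616?router=username-router</acceptor>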

However, what you're seeing with the lack of message redistribution is not expected. Redistribution was implemented to solve this exact problem. Given that you're using a redistribution-delay of 600000, your consumers on nodes with no messages should start receiving messages after 10 minutes (assuming they stay connected that long, of course) as messages are redistributed from the nodes which do have them.
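For completeness, that delay is set per address-match in broker.xml; a minimal sketch, assuming it should apply to all addresses:

    <address-settings>
       <!-- match="#" applies the setting to every address -->
       <address-setting match="#">
          <redistribution-delay>600000</redistribution-delay> <!-- 10 minutes -->
       </address-setting>
    </address-settings>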

I recommend you work up a minimal, reproducible example with the latest release. If you are able to do so then open a Jira and attach the reproducer. If not, then upgrade your cluster to the latest version and ensure your configuration is comparable to the working version.