I'm currently running RabbitMQ 3.10.25 in production as a 3-node cluster with several queues:
- one classic queue
- one quorum queue (to handle NServiceBus commands - NServiceBus.RabbitMQ package 8.0.2)
- one quorum queue 'error'
- 28 NServiceBus quorum queues (nsb.v2.delay-level-xx)
The classic queue handles about 2 messages per second; the quorum queues are idle.
After a few hours the node used by the classic queue is still stable, but the other two show high memory usage and have reached the memory high watermark. In addition, there is a lot of preallocated unused memory, which does not seem normal and keeps increasing, and the 'other' ETS tables category is at 1.2 GB and also keeps growing.
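(For anyone investigating the same symptom: how much memory the Erlang allocators have grabbed from the OS versus what is actually in use can be inspected with the recon module that ships with RabbitMQ. This is only a sketch; rabbit@node2 is a placeholder node name.)

```
# memory the Erlang allocators hold from the OS vs. what is actually in use
rabbitmqctl -n rabbit@node2 eval 'recon_alloc:memory(allocated).'
rabbitmqctl -n rabbit@node2 eval 'recon_alloc:memory(used).'
# used/allocated ratio per allocator; a low ratio means a lot of preallocated, unused memory
rabbitmqctl -n rabbit@node2 eval 'recon_alloc:memory(usage).'
```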
What is the best approach to tackle this issue? Does RabbitMQ or NServiceBus provide a setting to reduce the memory usage?
Some information is available in the RabbitMQ documentation, but it is unclear which settings should be adjusted as a starting point.
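The only memory-related knobs I could find in rabbitmq.conf so far are the ones below (shown with the values this cluster is effectively running, as reported in the breakdown further down); it is not clear whether any of them addresses the allocated_unused growth:

```
# rabbitmq.conf – memory-related settings (current effective values)
vm_memory_high_watermark.relative = 0.4
vm_memory_calculation_strategy = rss
```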
UPDATE 1: The write/sync IO is high as well. (The screenshots above were taken while some producers/consumers were connected; even with no producers or consumers, memory consumption continues to rise. The IO screenshot was taken with 0 producers/consumers.)
UPDATE 2: I noticed that one of the queues, nsb.v2.verify-stream-flag-enabled, displays 'Cluster is in minority'. What does this mean, and could it be causing the memory issue described above?
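(In case it helps with diagnosis: the leader/replica state of that stream and the overall cluster membership can be listed from the CLI; the stream_status command assumes a 3.11+ rabbitmq-streams. A sketch:)

```
# leader and replica state of the stream behind the 'Cluster is in minority' warning
rabbitmq-streams stream_status nsb.v2.verify-stream-flag-enabled
# cluster membership, to check whether any node is down or partitioned
rabbitmqctl cluster_status
```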
UPDATE 3: A memory breakdown from rabbitmqctl report. As mentioned before, the allocated_unused and other_ets values are far too high.
Total memory used: 3.7986 gb
Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, computed to: 3.3458 gb
**allocated_unused: 2.4707 gb (65.04 %)**
**other_ets: 1.2057 gb (31.74 %)**
other_proc: 0.0518 gb (1.36 %)
code: 0.0336 gb (0.89 %)
other_system: 0.0148 gb (0.39 %)
connection_other: 0.0048 gb (0.13 %)
plugins: 0.0043 gb (0.11 %)
quorum_queue_procs: 0.0025 gb (0.07 %)
reserved_unallocated: 0.0024 gb (0.06 %)
binary: 0.002 gb (0.05 %)
atom: 0.0015 gb (0.04 %)
mgmt_db: 0.0012 gb (0.03 %)
metrics: 0.001 gb (0.03 %)
mnesia: 0.0008 gb (0.02 %)
connection_channels: 0.0007 gb (0.02 %)
connection_readers: 0.0004 gb (0.01 %)
connection_writers: 0.0003 gb (0.01 %)
stream_queue_procs: 0.0001 gb (0.0 %)
quorum_ets: 0.0 gb (0.0 %)
msg_index: 0.0 gb (0.0 %)
quorum_queue_dlx_procs: 0.0 gb (0.0 %)
stream_queue_replica_reader_procs: 0.0 gb (0.0 %)
queue_procs: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
stream_queue_coordinator_procs: 0.0 gb (0.0 %)
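(The breakdown above is taken from the report; the same numbers can be pulled per node directly, which is easier to repeat while watching the growth. rabbit@node2 is a placeholder node name.)

```
# per-node memory breakdown, same categories as in the report above
rabbitmq-diagnostics -n rabbit@node2 memory_breakdown --unit mb
```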
UPDATE 4: The Top Processes plugin has been installed and I noticed that the following entry keeps increasing its memory usage in the Top ETS Tables view. After a while it drops and starts increasing again, but the memory on the nodes never seems to be cleaned up:
- name: rabbit_stream_coordinator
- owner name: ra_coordination_log_ets
- type: set
- named: false
- protection: public
- compressed: false
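(To reproduce this view: the Top pages come from the rabbitmq_top plugin, and rabbitmq-diagnostics observer gives a comparable process/ETS view in the terminal. Node name is a placeholder.)

```
# enable the plugin that adds the "Top Processes" / "Top ETS Tables" pages
rabbitmq-plugins -n rabbit@node2 enable rabbitmq_top
# terminal alternative: attach an observer-style view to the node
rabbitmq-diagnostics -n rabbit@node2 observer
```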
UPDATE 5: The issue seems to be caused by the stream queue created by NServiceBus, 'nsb_v2_verify-stream-flag-enabled':
rabbit_stream_coordinator: Error while starting replica for nsb_v2_verify-stream-flag-enabled
could not connect osiris to replica.
2023-10-03 07:58:25.972915+00:00 [error] <0.11581.46> crasher:
2023-10-03 07:58:25.972915+00:00 [error] <0.11581.46> initial call: osiris_replica_reader:init/1
2023-10-03 07:58:25.972915+00:00 [error] <0.11581.46> registered_name: []
2023-10-03 07:58:25.972915+00:00 [error] <0.11581.46> exception exit: connection_refused
2023-10-03 07:58:25.972915+00:00 [error] <0.11581.46> in function gen_server:init_it/6 (gen_server.erl, line 835)
2023-10-03 07:58:25.972915+00:00 [error] <0.11581.46> ancestors: [osiris_replica_reader_sup,osiris_sup,<0.235.0>]
Additional logging:
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> crasher:
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> initial call: osiris_replica:init/1
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> registered_name: []
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> exception error: no case clause matching
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> {error,
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> {connection_refused,
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> {child,undefined,#Ref<0.840260531.4208984066.201970>,
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> {osiris_replica_reader,start_link,
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> [#{connection_token =>
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> hosts =>
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> name =>
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> reference =>
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> {resource,
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> queue,<<"nsb.v2.verify-stream-flag-enabled">>},
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> start_offset => {0,empty},
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> transport => ssl}]},
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> temporary,false,5000,worker,
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> [osiris_replica_reader]}}}
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> in function osiris_replica_reader:start/2 (src/osiris_replica_reader.erl, line 108)
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> in call from osiris_replica:handle_continue/2 (src/osiris_replica.erl, line 246)
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> in call from gen_server:try_dispatch/4 (gen_server.erl, line 1123)
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> in call from gen_server:loop/7 (gen_server.erl, line 865)
2023-10-03 08:29:37.869613+00:00 [error] <0.10360.34> ancestors: [osiris_server_sup,osiris_sup,<0.235.0>]
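(The connection_refused between the replica and its reader suggests the replica cannot reach the stream replication listener on the other node; two basic checks, with a placeholder node name:)

```
# list the node's listeners and their ports
rabbitmq-diagnostics -n rabbit@node2 listeners
# basic TCP connectivity check against the node's listeners
rabbitmq-diagnostics -n rabbit@node2 check_port_connectivity
```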
As soon as we delete the queue, memory usage stabilizes and the GC numbers go down.
(Update 5 was tested with RabbitMQ 3.11.20 and the NServiceBus NuGet package 8.1.3.)
After restarting the app (which uses NServiceBus), 'nsb_v2_verify-stream-flag-enabled' is re-created and the followers still display 'Cluster in minority'.
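(For reference, deleting the queue can also be done from the CLI instead of the management UI; a sketch assuming the default vhost:)

```
# delete the problematic stream queue (default vhost)
rabbitmqctl delete_queue nsb.v2.verify-stream-flag-enabled
# on recent 3.11+/3.12 releases, restarting the stream may be worth trying first, if the command is available:
# rabbitmq-streams restart_stream nsb.v2.verify-stream-flag-enabled
```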




The stream flag queue is created at startup by NServiceBus to check whether the broker supports stream queues, which are used indirectly by quorum queues to make the timeout infrastructure reliable.
The queue itself isn't used for any messaging, so it cannot cause any memory issues on its own.
Check the RabbitMQ logs for any issues and run rabbitmqctl report as suggested by Adam.
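A sketch of doing that from the CLI (the node name is a placeholder):

```
# tail the node's log for stream/osiris errors
rabbitmq-diagnostics -n rabbit@node2 log_tail
# full diagnostic report, including the memory breakdown shown above
rabbitmqctl -n rabbit@node2 report > rabbitmq-node2-report.txt
```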