About 5 days ago, OpenNMS Horizon 22.02 on Ubuntu 18.04.1 LTS stopped accepting traps from network elements. To my knowledge, no changes were made to the configuration or the underlying operating system.
There are about 125 network elements, all Cisco, sending traps.
So far I have checked the following:
- tcpdump shows the traps coming into the interface on port 162
- With debug logging turned on for trapd.log, incoming traps from the network elements do not create any log entries
- Traps sent with send-trap.pl from the localhost create traps that flow all the way to events
- Traps sent with snmptrap either on localhost or another host create log entries that flow all the way to events. The other host is using the same interface that the network elements are using.
- ss -lnpu sport = :162 shows an open UDP socket in the "UNCONN" state
- sudo lsof -i :162 shows a single listener java process
- Startup of trapd does not seem to show any warnings in the log
- I have verified that ufw and iptables are off
- I have updated OpenNMS to 22.04 and updated Ubuntu with no relief
- Restarted OpenNMS many many times...
- I moved Trapd startup after Asterisk in service-configuration.xml based on this
All of this seems similar to this. I think the last commenter on that thread asked about comparing the successful and unsuccessful traps in Wireshark, which I have not done. However, all of the traps being sent had worked hundreds if not thousands of times until November 6th.
Is there anywhere else to look for errors as to why Trapd is not accepting traps? I think I have ruled out network issues.
I created a new Ubuntu 18.04 VM, updated it, and then installed Horizon 23.01 fresh. I pointed my stream of traps at it and it behaves exactly the same way: none of the traps create any log entries in trapd.log with the level set to debug. Tcpdump shows the traps arriving at the interface.
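Because tcpdump captures packets before the kernel makes its routing and filtering decisions, seeing traps in a capture does not prove they are delivered to a listening socket. A minimal sketch of a user-space check follows; the function name wait_for_datagram and the port numbers are illustrative assumptions, not part of OpenNMS.

```python
import socket


def wait_for_datagram(port, host="0.0.0.0", timeout=10.0):
    """Return (source_ip, payload_length) for the first UDP datagram received
    on host:port, or None if nothing arrives before the timeout expires."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(timeout)
        try:
            payload, source = sock.recvfrom(65535)
        except socket.timeout:
            # Nothing reached user space within the timeout.
            return None
        return source[0], len(payload)


if __name__ == "__main__":
    # Hypothetical usage: listen on a spare port and point one device's trap
    # destination at it. Binding to 162 itself would need root privileges and
    # OpenNMS stopped, since Trapd already owns that port.
    print(wait_for_datagram(10162, timeout=30.0))
```

If tcpdump shows datagrams arriving but a listener like this times out, the packets are being dropped between the interface and the socket, which points at the operating system rather than at Trapd.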
Issue Resolved.
The underlying operating system lost its static route for the subnet that the traps were coming from. OpenNMS had a route back to the subnet but not via the path that the traps were coming in from. Once the static route was restored, traps started working again and were flowing all the way to events.
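One plausible mechanism for this behaviour, not confirmed anywhere in this thread, is Linux strict reverse-path filtering: the kernel drops a packet when its best route back to the source address does not leave through the interface the packet arrived on. The sketch below models that check in plain Python. The interface names, subnets, and function names are hypothetical and only illustrate the reasoning.

```python
import ipaddress


def best_route(routes, address):
    """Longest-prefix match over (network, interface) pairs.
    Returns the winning (network, interface) entry, or None if none match."""
    ip = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(net), dev)
        for net, dev in routes
        if ip in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


def passes_strict_rpf(routes, source, arrival_interface):
    """True if the best route back to source uses the arrival interface."""
    route = best_route(routes, source)
    return route is not None and route[1] == arrival_interface


# Hypothetical layout: traps from 10.20.30.0/24 arrive on eth1.
without_static = [("0.0.0.0/0", "eth0"), ("10.0.0.0/24", "eth1")]
with_static = without_static + [("10.20.30.0/24", "eth1")]

# Static route missing: the return path is the default route via eth0.
print(passes_strict_rpf(without_static, "10.20.30.5", "eth1"))  # False
# Static route restored: the return path matches the arrival interface.
print(passes_strict_rpf(with_static, "10.20.30.5", "eth1"))     # True
```

On a real host, ip route get with a trap source address shows which route the kernel would pick for the return path, and sysctl net.ipv4.conf.all.rp_filter shows whether reverse-path filtering is enabled.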