Terracotta Ehcache Locking Client

I'm using Terracotta Enterprise Ehcache with a Java application, but at certain points of the day Terracotta starts taking too long to answer put/get requests, sometimes locking client threads and throwing exceptions.

My infrastructure consists of a cluster of 5 JBoss 6.2.0 servers and another cluster of 4 Terracotta Enterprise Ehcache 3.7.5 servers that stores a large amount of data.

The application does around 10 million accesses to the Terracotta Ehcache per day.

  • Originally I used search criteria, but when the problems started I changed everything to lookups by id only.

  • I tried changing the DGC (distributed garbage collection) interval, making it run more often or even only once a day; it didn't get any better.

  • I started with the persistence mode permanent-store and tried switching to temporary-swap-only, but the problem persisted.

  • I tried reconfiguring the Terracotta cluster to run with 2 active machines and 2 passives, or with 4 actives.

  • I tried configuring my caches with eternal set to both true and false.

  • All my caches are nonstop, and I tried setting the timeoutBehavior to both exception and noop (a configuration sketch follows this list).
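
For context, here is a stripped-down sketch of how such a nonstop, clustered cache is wired up programmatically with the Ehcache 2.x API (the equivalent of the ehcache.xml settings). The cache name, sizing, TTL, timeout and server URL are all placeholders, not my real values:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.NonstopConfiguration;
import net.sf.ehcache.config.TerracottaClientConfiguration;
import net.sf.ehcache.config.TerracottaConfiguration;
import net.sf.ehcache.config.TimeoutBehaviorConfiguration;

public class NonstopCacheSetup {

    public static void main(String[] args) {
        // timeoutBehavior: "exception" fails fast when the cluster is slow,
        // "noop" silently skips the operation ("localReads" is the third option).
        TimeoutBehaviorConfiguration timeoutBehavior = new TimeoutBehaviorConfiguration();
        timeoutBehavior.setType("noop");

        // Nonstop bounds how long a single get/put may block on the server array.
        NonstopConfiguration nonstop = new NonstopConfiguration();
        nonstop.setEnabled(true);
        nonstop.setTimeoutMillis(5000); // placeholder timeout
        nonstop.addTimeoutBehavior(timeoutBehavior);

        TerracottaConfiguration terracotta = new TerracottaConfiguration();
        terracotta.setClustered(true);
        terracotta.addNonstop(nonstop);

        // Cache name, sizing and TTL are placeholders.
        CacheConfiguration cacheConfig = new CacheConfiguration("myCache", 10000);
        cacheConfig.setEternal(false); // also tried true, as described above
        cacheConfig.setTimeToLiveSeconds(3600);
        cacheConfig.addTerracotta(terracotta);

        // Point the client at the Terracotta server array (host:port is a placeholder).
        TerracottaClientConfiguration tcClient = new TerracottaClientConfiguration();
        tcClient.setUrl("terracotta-host:9510");

        Configuration managerConfig = new Configuration();
        managerConfig.addTerracottaConfig(tcClient);
        managerConfig.addCache(cacheConfig);
        CacheManager manager = new CacheManager(managerConfig);

        // Plain lookups by id, as described in the first bullet.
        Cache cache = manager.getCache("myCache");
        cache.put(new Element("id-123", "some value"));
        Element hit = cache.get("id-123");
        Object value = (hit == null) ? null : hit.getObjectValue(); // noop may skip ops on timeout
    }
}
```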

Basically, none of my efforts produced any significant change, and Terracotta keeps entering this state where it can no longer answer requests.

Right now the only thing that seems to "solve" the problem is to restart all the clients.

Does anybody have a similar scenario using Terracotta, with this kind of throughput? Any ideas for where to look now?

There is 1 answer below.

Yes, I faced a similar issue of thread contention on a Terracotta cluster setup. The slaves' get/put requests used to take a long time, and a thread dump showed locking as the main cause. I don't remember the details, as it was more than 4-6 months back. I had 2 options then:

  • Build my own cache server: a custom WAR that would run Ehcache underneath and expose my own put, get, delete, etc. operations as REST endpoints (see the sketch after this list).
  • Use the cache replication that Ehcache provides.
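
For illustration, option 1 could look roughly like this: a tiny standalone server exposing put/get/delete over HTTP with Ehcache underneath (here using the JDK's built-in HttpServer instead of a WAR, just to keep the sketch self-contained). The cache name, port and path layout are made up:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class CacheServer {

    public static void main(String[] args) throws Exception {
        CacheManager manager = CacheManager.newInstance(); // reads ehcache.xml from the classpath
        Cache cache = manager.getCache("restCache");       // cache name is a placeholder

        // Requests look like GET/PUT/DELETE /cache/{key}.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/cache/", exchange -> handle(exchange, cache));
        server.start();
    }

    private static void handle(HttpExchange exchange, Cache cache) throws IOException {
        String key = exchange.getRequestURI().getPath().substring("/cache/".length());
        switch (exchange.getRequestMethod()) {
            case "GET": {
                Element element = cache.get(key);
                if (element == null) {
                    exchange.sendResponseHeaders(404, -1);
                } else {
                    byte[] body = (byte[]) element.getObjectValue();
                    exchange.sendResponseHeaders(200, body.length);
                    try (OutputStream out = exchange.getResponseBody()) {
                        out.write(body);
                    }
                }
                break;
            }
            case "PUT": { // store the raw request body under the key
                try (InputStream in = exchange.getRequestBody()) {
                    cache.put(new Element(key, in.readAllBytes()));
                }
                exchange.sendResponseHeaders(204, -1);
                break;
            }
            case "DELETE": {
                cache.remove(key);
                exchange.sendResponseHeaders(204, -1);
                break;
            }
            default:
                exchange.sendResponseHeaders(405, -1);
        }
        exchange.close();
    }
}
```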

I first tried replication using RMI and then JGroups. The RMI-based approach worked excellently and was much more stable, so I decided to go with the RMI-based replication that Ehcache provides OOTB. My setup used Ehcache as the cache provider for Hibernate-based JPA, and the RMI-based solution worked very well and effectively. It is intelligent enough to notice when other servers in the cluster go down and come back up. Replication is asynchronous and transparent. Since the second approach worked well, I didn't try the first one.
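
For reference, the OOTB RMI replication is usually configured in ehcache.xml; a rough programmatic equivalent is sketched below. The multicast address/port, cache name and replicator properties are placeholder values, not the exact ones I used:

```java
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.FactoryConfiguration;

public class RmiReplicatedCacheSetup {

    public static void main(String[] args) {
        Configuration config = new Configuration();

        // Peers discover each other via multicast.
        FactoryConfiguration peerProvider = new FactoryConfiguration();
        peerProvider.setClass("net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory");
        peerProvider.setProperties("peerDiscovery=automatic,"
                + "multicastGroupAddress=230.0.0.1,multicastGroupPort=4446,timeToLive=1");
        config.addCacheManagerPeerProviderFactory(peerProvider);

        // Each node also listens for replication messages from its peers.
        FactoryConfiguration peerListener = new FactoryConfiguration();
        peerListener.setClass("net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory");
        config.addCacheManagerPeerListenerFactory(peerListener);

        // Attach the RMI replicator to the cache; replicateAsynchronously=true
        // gives the async, transparent behavior described above.
        CacheConfiguration cacheConfig = new CacheConfiguration("replicatedCache", 10000);
        CacheConfiguration.CacheEventListenerFactoryConfiguration replicator =
                new CacheConfiguration.CacheEventListenerFactoryConfiguration();
        replicator.setClass("net.sf.ehcache.distribution.RMICacheReplicatorFactory");
        replicator.setProperties("replicateAsynchronously=true,replicatePuts=true,"
                + "replicateUpdates=true,replicateRemovals=true");
        cacheConfig.addCacheEventListenerFactory(replicator);
        config.addCache(cacheConfig);

        CacheManager manager = new CacheManager(config);
    }
}
```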