Collection Caching with Hibernate

I have a Java/Spring application with database entities. One of the entities is a DataSet. Each DataSet can have a parent DataSet. And a parent-DataSet has multiple (OneToMany) children.

    @OneToMany(fetch=FetchType.LAZY)
    @IndexColumn(name = "order_id")
    @JoinColumn(name = "parent_dst_id")
    @Fetch(FetchMode.SELECT)
    @Cache(usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE)
    public List<DataSet> getChildren()
    {
        return children;
    }

After several years, a lot of DataSets have accumulated under some of the parent DataSets.

    parentDataset.getChildren().size(); // > 200,000

In the code (and I'm not sure if this is the correct way) the following lines are executed when adding a new dataset (survey) to a parent:

    surveyDataset.setParent(parentDataset);
    parentDataset.getChildren().add(surveyDataset);

Until recently the children weren't cached at all. The second statement then resulted in long-running queries on the database, because the whole children collection had to be loaded before the new element could be added.
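To illustrate why a single add() can be so expensive, here is a minimal sketch in plain Java. It is a simplified model and not Hibernate's actual collection classes: LazyList and its loader are hypothetical names, and the loader stands in for the SELECT on parent_dst_id. The assumption is that, since the mapping uses @JoinColumn rather than mappedBy, the parent side owns the indexed list and it has to be initialized before an element can be appended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LazyListSketch {

    /** Simplified stand-in for a lazily loaded collection (hypothetical, not a Hibernate class). */
    static class LazyList<T> {
        private final Supplier<List<T>> loader; // stands in for the SELECT that loads all children
        private List<T> loaded;                 // stays null until the first access
        int loadCount = 0;                      // how many times the "database" was hit

        LazyList(Supplier<List<T>> loader) {
            this.loader = loader;
        }

        // Loads the full collection once, on first use.
        private List<T> initialized() {
            if (loaded == null) {
                loaded = new ArrayList<>(loader.get());
                loadCount++;
            }
            return loaded;
        }

        // Appending needs the existing elements, so it triggers the load.
        void add(T item) {
            initialized().add(item);
        }

        int size() {
            return initialized().size();
        }
    }

    public static void main(String[] args) {
        // Pretend the parent already has 200,000 children in the database.
        LazyList<Integer> children = new LazyList<>(() -> {
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < 200_000; i++) {
                rows.add(i);
            }
            return rows;
        });

        System.out.println("loads before add: " + children.loadCount); // 0
        children.add(200_000);
        System.out.println("loads after add: " + children.loadCount);  // 1
        System.out.println("size: " + children.size());                // 200001
    }
}
```

In this model the cost of add() is dominated by loading the 200,000 existing rows, not by the insert itself.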

Recently we added datasets.children to the second-level cache. At first, everything is really fast.

But as soon as another child dataset is added under the same parent, it becomes really slow again (i.e. it starts querying the database again).

I suspect that the cached collection is being evicted or cleared, and that this is why the database is queried again.

See also: http://planet.jboss.org/post/collection_caching_in_the_hibernate_second_level_cache (TLDR: "Hibernate doesn’t update the collection in the cache, it just removes it.")
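The behaviour described in that article can be modelled with a small plain-Java sketch. This is only an illustration of "evict on write", not Hibernate or Ehcache code: ChildStore, CollectionCache and their methods are hypothetical names, and the query counter stands in for SQL statements sent to the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EvictOnWriteSketch {

    /** Stands in for the database table holding the children (hypothetical). */
    static class ChildStore {
        private final Map<Long, List<Long>> childrenByParent = new HashMap<>();
        int queries = 0; // number of simulated "load all children" queries

        List<Long> loadChildIds(long parentId) {
            queries++;
            return new ArrayList<>(childrenByParent.getOrDefault(parentId, new ArrayList<>()));
        }

        void insertChild(long parentId, long childId) {
            childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
        }
    }

    /** Collection cache that removes an entry on modification instead of updating it. */
    static class CollectionCache {
        private final Map<Long, List<Long>> cache = new HashMap<>();
        private final ChildStore store;

        CollectionCache(ChildStore store) {
            this.store = store;
        }

        List<Long> getChildren(long parentId) {
            List<Long> cached = cache.get(parentId);
            if (cached == null) {
                // Cache miss: reload the whole collection from the store.
                cached = store.loadChildIds(parentId);
                cache.put(parentId, cached);
            }
            return cached;
        }

        void addChild(long parentId, long childId) {
            store.insertChild(parentId, childId);
            cache.remove(parentId); // evict the cached collection, do not update it
        }
    }

    public static void main(String[] args) {
        ChildStore store = new ChildStore();
        CollectionCache cache = new CollectionCache(store);

        cache.getChildren(1L);   // miss: first query
        cache.getChildren(1L);   // hit: no query
        cache.addChild(1L, 42L); // write: cached collection is evicted
        cache.getChildren(1L);   // miss again: second query

        System.out.println("queries = " + store.queries); // prints "queries = 2"
    }
}
```

Under this model every added child costs one full reload of the collection on the next access, which matches the slowdown seen after each new child dataset.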

I tried to make sense of the logging (net.sf.ehcache=DEBUG and org.hibernate.SQL=DEBUG), but that's difficult.

So I'm stuck with a few questions:

  1. What do I have to do to prevent all child datasets (all siblings of the new dataset) from being loaded again?

  2. How can I debug what is happening to my cache? Does it evict the parent, or all children as well? As explained in the article, it will probably reload the entire collection.

There is 1 solution below

  1. You can change the cache usage from "read-write" to "read-only" and then handle eviction yourself in code. This is the safest way to manage the cache, but it requires quite a bit of heavy lifting in your code.

  2. For debugging, there are two options:

    1. Enable show_sql in Hibernate and check whether the queries are executed twice. That is usually an indication that Hibernate checked the cache for the object first, did not find it, and then sent a second query to the database.
    2. Log cache hits as explained in this article: http://winterbe.com/posts/2009/10/01/how-to-log-hibernate-cache-this/
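As a rough idea of what hit logging tells you, here is a plain-Java sketch of a cache that counts hits, misses and puts. CountingCache and the key "DataSet.children#1" are hypothetical and only for illustration. Hibernate exposes comparable counters through its Statistics interface (for example getSecondLevelCacheHitCount(), getSecondLevelCacheMissCount() and getSecondLevelCachePutCount()) once hibernate.generate_statistics is enabled, so the same reading of the numbers should apply there.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CacheStatsSketch {

    /** Minimal cache that records how each lookup was served (hypothetical helper). */
    static class CountingCache<K, V> {
        private final Map<K, V> entries = new HashMap<>();
        long hits;
        long misses;
        long puts;

        V get(K key, Function<K, V> loader) {
            V value = entries.get(key);
            if (value != null) {
                hits++;
                return value;
            }
            // Not cached: load the value and store it for later lookups.
            misses++;
            value = loader.apply(key);
            entries.put(key, value);
            puts++;
            return value;
        }

        void evict(K key) {
            entries.remove(key);
        }

        String summary() {
            return "hits=" + hits + " misses=" + misses + " puts=" + puts;
        }
    }

    public static void main(String[] args) {
        CountingCache<String, String> cache = new CountingCache<>();
        String key = "DataSet.children#1"; // hypothetical cache key

        cache.get(key, k -> "children v1"); // miss + put
        cache.get(key, k -> "children v1"); // hit
        cache.evict(key);                   // what happens when a child is added
        cache.get(key, k -> "children v2"); // miss + put again

        System.out.println(cache.summary()); // prints "hits=1 misses=2 puts=2"
    }
}
```

A miss followed by a put for a key that was already put earlier is the pattern to look for: it means the entry was evicted in between.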