Spring Data JDBC performance when loading the whole aggregate


I have an aggregate root entity called School. The School entity encompasses various aggregate entities such as staff, courses, schedule flows, documents, etc., resulting in more than 20 attributes. Most of these attributes exhibit a one-to-many (1-M) relationship, and some even have their own aggregate roots.

My question pertains to the process of adding, modifying, or deleting an aggregate. Currently, I load the entire aggregate tree, perform the necessary modifications, and then save the changes. However, considering the substantial amount of data involved, I am concerned about the potential expense of this operation.

Would it be more prudent to directly perform modifications in the database tables using their respective repositories? For instance, instead of loading the entire school to update a course, I could use the course repository to update the course by its ID.

I am interested in understanding the performance implications of the first approach and the associated pros and cons.

What steps should we take to avoid performance issues in the future, when we have a lot of data?


There are 3 best solutions below

2
On BEST ANSWER

A large aggregate can be a sign of poor aggregate design. An aggregate has a specific meaning: it defines the consistency boundary for operations on several entities that must stay consistent with each other. The larger it gets (in the extreme, one aggregate for your whole system), the more entities you are trying to keep consistent together. You do keep the whole consistent, but at the cost of loading all of that data. To define a good boundary around entities:

  • Ask yourself which entities need to be consistent together. For example, what invariants does "school" enforce to keep the whole aggregate consistent throughout your operations? If an entity inside "school" doesn't depend on the other entities or on "school" itself, take it out of the aggregate and define a separate aggregate around it. For example, operations on the "course" entity might not need to check any invariants related to "school", so it might belong in another aggregate.
  • Make the aggregate smaller so that less data is loaded, but not too small, or you will run into consistency issues. In that case you have to consider eventual consistency and patterns like Saga to keep things consistent (a more complex solution).
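
As a rough illustration of the first point, here is a minimal sketch in which "course" is its own aggregate root. Everything in it is an assumption made for the example: the class names, the capacity invariant, and the in-memory repository, which only stands in for a real repository interface such as a Spring Data `CrudRepository<Course, Long>`. The point is that the invariant can be checked with the course's own data, so the school never has to be loaded.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Course is its own aggregate root: it points at its School by ID only,
// so working with a course never requires loading the school.
final class Course {
    private final long id;
    private final long schoolId;   // reference by ID, not an object graph
    private final int capacity;
    private final Set<Long> studentIds = new HashSet<>();

    Course(long id, long schoolId, int capacity) {
        this.id = id;
        this.schoolId = schoolId;
        this.capacity = capacity;
    }

    // Hypothetical invariant: enrolment never exceeds capacity.
    // Everything needed to check it lives inside this aggregate.
    void enroll(long studentId) {
        if (!studentIds.contains(studentId) && studentIds.size() >= capacity) {
            throw new IllegalStateException("Course " + id + " is full");
        }
        studentIds.add(studentId);
    }

    long id() { return id; }
    long schoolId() { return schoolId; }
    int enrolled() { return studentIds.size(); }
}

// In-memory stand-in for a repository (assumption for this sketch).
final class InMemoryCourseRepository {
    private final Map<Long, Course> rows = new HashMap<>();

    Course save(Course course) { rows.put(course.id(), course); return course; }
    Optional<Course> findById(long id) { return Optional.ofNullable(rows.get(id)); }
}

// Application service: load one small aggregate, change it, save it.
final class EnrollmentService {
    private final InMemoryCourseRepository courses;

    EnrollmentService(InMemoryCourseRepository courses) { this.courses = courses; }

    void enroll(long courseId, long studentId) {
        Course course = courses.findById(courseId)
                .orElseThrow(() -> new IllegalArgumentException("No course " + courseId));
        course.enroll(studentId);
        courses.save(course);
    }
}
```

If a rule really does span "school" and "course" (for example, a teacher must be on the school's staff), that is a signal either to keep those pieces in one aggregate or to accept eventual consistency for that rule.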

Some other notes to keep in mind:

  • Don't prematurely optimize your solution. Design your aggregates to be solid and clean. After that, optimize your solution if you hit performance issues in production.
  • To design your domain model, consider the trade-off between these three concepts:
      1. Domain model completeness
      2. Domain model purity
      3. Performance

There is an excellent article by Vladimir Khorikov about these concepts.

1
On

Your question is missing information about the behaviour you want to achieve. If you want your model of School to always be valid, in the sense that every Course has a teacher assigned who is part of the school's Staff, then essentially the whole model always needs to be loaded to check those invariants.

But this is rarely the case. A better solution would be to allow such problems to exist in your model explicitly (maybe as their own entity). This way a user can evaluate different options and fix them. (The IDE you are probably using does a similar job when it tells you about a compile error in another file but doesn't stop you from editing the currently opened file. MS Word does the same: it lets you know that there is a typo in a word three lines above your cursor, but you can still continue to type.)
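
A minimal sketch of what "problems as their own entity" could look like, assuming hypothetical names (`StaffingIssue`, `StaffingCheck`) and plain ID collections in place of real aggregates. Instead of throwing when a course has a teacher who is not on the staff, the check returns the violations as data that can be stored and shown to the user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A detected rule violation, kept as data instead of thrown as an exception.
record StaffingIssue(long courseId, long teacherId, String description) {}

final class StaffingCheck {
    // courseTeachers: courseId -> assigned teacherId; staffIds: current staff of the school.
    static List<StaffingIssue> findIssues(Map<Long, Long> courseTeachers, Set<Long> staffIds) {
        List<StaffingIssue> issues = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : courseTeachers.entrySet()) {
            if (!staffIds.contains(entry.getValue())) {
                issues.add(new StaffingIssue(entry.getKey(), entry.getValue(),
                        "Teacher is not on the school's staff"));
            }
        }
        return issues;
    }
}
```

Such a check can run after the fact (for example, on a schedule or after each save), so saving a course does not depend on loading the whole school.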

0
On

"The School entity encompasses various aggregate entities such as staff, courses, schedule flows, documents, etc., resulting in more than 20 attributes. Most of these attributes exhibit a one-to-many (1-M) relationship, and some even have their own aggregate roots."

Well, that's your problem. Aggregate Roots (ARs) represent consistency boundaries, and to make that boundary clear it's recommended to reference other ARs only by ID. Before looking into Spring Data JDBC, I think you should review how to design proper aggregates that are driven by behaviors and business invariants.

Aggregate design is not about modeling all entity relationships; it's about clustering together the data that is needed to evaluate business invariants (rules that must be strongly consistent and valid at all times), as well as the resulting state transitions.

Sometimes, when making all rules strongly consistent is not practical (e.g. bad performance, too many concurrency conflicts, etc.), we might need to find a way to make some rules eventually consistent and break down larger clusters.
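
To make "eventually consistent" concrete, here is a small sketch under stated assumptions: the event `StaffMemberLeft`, the `CourseAssignments` handler, and the in-memory `EventQueue` are all hypothetical names, and a real system would typically use a message broker or an outbox table instead of a queue in memory. The school side records that a staff member left, and the course side catches up later, outside the original transaction.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Event recorded when a staff member leaves; it is consumed later,
// not in the same transaction that changed the school.
record StaffMemberLeft(long schoolId, long staffId) {}

final class CourseAssignments {
    // courseId -> teacherId; a course without a teacher has no entry.
    private final Map<Long, Long> teacherByCourse = new HashMap<>();

    void assign(long courseId, long teacherId) { teacherByCourse.put(courseId, teacherId); }
    boolean hasTeacher(long courseId) { return teacherByCourse.containsKey(courseId); }

    // Handler: bring the course side back in line with the school side.
    void on(StaffMemberLeft event) {
        teacherByCourse.values().removeIf(teacherId -> teacherId == event.staffId());
    }
}

// In-memory stand-in for an event channel (assumption for this sketch).
final class EventQueue {
    private final Queue<StaffMemberLeft> pending = new ArrayDeque<>();

    void publish(StaffMemberLeft event) { pending.add(event); }

    // Deliver every pending event to the handler.
    void drainTo(CourseAssignments handler) {
        StaffMemberLeft event;
        while ((event = pending.poll()) != null) {
            handler.on(event);
        }
    }
}
```

Between `publish` and `drainTo` the course still lists the departed teacher. That window of inconsistency is the price paid for not loading and locking everything at once.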