Data Calculations MySQL vs Python


I'm trying to understand which of the following is a better option:

  1. Data calculation using Python from the output of a MySQL query.
  2. Perform the calculations in the query itself.

For example, the query returns 20 rows with 10 columns. In Python, I compute the difference or division of some of the columns.

Is it better to do this in the query or in Python?


There are 2 best solutions below

3
On

It is probably a matter of taste but...

... to give you the exact opposite of the answer by Alma Do Mundo: for (not so) simple calculations made in the SELECT ... clause, I generally push toward using the DB "as a calculator".

Calculations in the SELECT ... clause are performed as one of the last steps of query execution. Only the relevant data is used at that point: the "big job" (processing JOINs, WHERE clauses, aggregates, sorting) has already been done.

At this point, the extra load of performing a few arithmetic operations on the data is really small, and returning only the computed values can reduce the network traffic between your application and the DB server.
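To make the two options from the question concrete, here is a minimal sketch that computes a difference and a division both ways. It assumes a hypothetical `sales` table with `revenue` and `cost` columns, and it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the arithmetic in the SELECT clause should carry over to MySQL largely unchanged.

```python
import sqlite3

# sqlite3 stands in for MySQL here so the sketch runs anywhere.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales (id, revenue, cost) VALUES (?, ?, ?)",
    [(1, 120.0, 80.0), (2, 50.0, 25.0), (3, 10.0, 0.0)],
)

# Option 1: fetch the raw columns and calculate in Python.
rows = conn.execute("SELECT id, revenue, cost FROM sales ORDER BY id").fetchall()
in_python = [
    (row_id, revenue - cost, revenue / cost if cost else None)
    for row_id, revenue, cost in rows
]

# Option 2: let the database calculate in the SELECT clause.
# NULLIF turns a zero divisor into NULL, which makes the
# division-by-zero case explicit in the query.
in_sql = conn.execute(
    """
    SELECT id,
           revenue - cost            AS margin,
           revenue / NULLIF(cost, 0) AS ratio
    FROM sales
    ORDER BY id
    """
).fetchall()
```

Both lists hold the same values; the second query simply returns the derived columns instead of the inputs, so the client has nothing left to compute.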

It is probably a matter of taste though...

2
On

If you are doing basic arithmetic operations on values within a row, then do it in SQL. This gives you the option of encapsulating the results in a view or stored procedure. In many databases, it also opens up the possibility of parallel execution of the statements (although performance is not an issue with so few rows of data).

If you are doing operations between rows in MySQL (such as comparing each row with the max for the column), then the balance is more even. Most databases support simple functions for these calculations (window functions), but MySQL did not until version 8.0. On older versions, the added complexity of the query gives some weight to doing these calculations on the client side.
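As a rough illustration of that trade-off, the sketch below relates each row to the column maximum, once on the client and once in a query that avoids window functions. The `scores` table is hypothetical, and `sqlite3` again stands in for MySQL so the example is self-contained.

```python
import sqlite3

# Hypothetical table; sqlite3 is used only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT PRIMARY KEY, points REAL)")
conn.executemany(
    "INSERT INTO scores (name, points) VALUES (?, ?)",
    [("a", 20.0), ("b", 50.0), ("c", 35.0)],
)

# Client side: find the maximum, then relate each row to it.
rows = conn.execute("SELECT name, points FROM scores ORDER BY name").fetchall()
top = max(points for _, points in rows)
client_side = [(name, points / top) for name, points in rows]

# In the query, without window functions: join every row against
# a one-row subquery that holds the aggregate.
server_side = conn.execute(
    """
    SELECT s.name, s.points / m.top AS share
    FROM scores AS s
    CROSS JOIN (SELECT MAX(points) AS top FROM scores) AS m
    ORDER BY s.name
    """
).fetchall()
```

The client-side version is two short lines of Python, while the query needs a derived table; with only 20 rows, either is fine, and readability is the deciding factor.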

In my opinion, the most important consideration is maintainability of the code. By using a database, you are necessarily incorporating business rules in the database itself (which entities are related to which other entities, for instance). A major problem with maintaining code is having business logic spread across various systems. I much prefer an approach where such logic is as concentrated as possible, creating very clear APIs between different layers.

For such an approach, "read" access into the database would be through views. The logic that you are talking about would go into the views and be available to any user of the database -- ensuring consistency across different functions using the database. "write" access would be through stored procedures, ensuring that business rules are checked consistently and that operations are logged appropriately.
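A minimal sketch of the "read through views" idea, assuming a hypothetical `orders` table and a hypothetical `order_totals` view. It uses `sqlite3` so it can run standalone; MySQL accepts an equivalent `CREATE VIEW` statement. Stored procedures for the write path are not shown, since SQLite does not have them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema, for illustration only.
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        quantity   INTEGER,
        unit_price REAL,
        discount   REAL
    );
    INSERT INTO orders VALUES (1, 3, 10.0, 5.0), (2, 1, 99.5, 0.0);

    -- The rule for an order total is defined once, in the view.
    CREATE VIEW order_totals AS
    SELECT id, quantity * unit_price - discount AS total
    FROM orders;
    """
)


def fetch_totals(connection):
    """Read totals through the view; the client never repeats the formula."""
    return connection.execute(
        "SELECT id, total FROM order_totals ORDER BY id"
    ).fetchall()


totals = fetch_totals(conn)
```

Any other client that queries `order_totals` gets the same calculation, so a change to the formula happens in one place rather than in every application.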