I'm using MySQL tables.
Sample table EMP (primary_key on (EID, DOJ)) :
| ENAME | EID | DEPT | SAL | DOJ (YYYY-MM-DD) |
|---|---|---|---|---|
| A | 6 | ee | 2000 | 2021-03-01 |
| B | 5 | me | 2020 | 2021-04-30 |
| C | 3 | it | 2000 | 2020-12-27 |
| D | 4 | cv | 2020 | 2020-10-31 |
| E | 1 | it | 2000 | 2021-01-01 |
| F | 2 | it | null | 2021-02-28 |
| G | 7 | ee | null | 2020-11-20 |
I have one job that deletes and inserts data into a similarly structured table.
The process needs to delete non-monthly data. That is, it will delete rows with EID
6, 3, 1, 7: These rows have DOJ that is not month-end.
The query I came up with:
-- DOJ is of type DATE
DELETE FROM EMP WHERE LAST_DAY(DOJ) <> DOJ ;
It works as expected. But, because there is large data (~5 million) this is slow.
I understand that because of LAST_DAY() function, I'm loosing the index advantage of DOJ.
Can you please suggest how can I improve the query?
You are right about the index. It is no use here. The only idea that comes to mind is a generated column to tell you about the date being the month's end or not. Thus you'd have a column you can index and use in your query:
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=397388b70bb1f459bbefce630ad27ac4
An index can only help, though, if this is about a very small part of the data in the table, say 1%. With many more rows to delete it makes more sense to read the whole table.