I have a Spring Boot project set up to use a MySQL database with Hibernate. I recently added 30k new entries to my database tables and every since then, queries have been abysmally slow.
I use Query DSL to perform the queries themselves. Suppose I have the following Candidate entity with a OneToMany relationship with a CandidateField entity:
@Entity
@Table(name = "Candidates", schema = "Candidate")
public class Candidate {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "CandidateID")
private Long candidateID;
@Column(name = "FirstName")
private String firstName;
@Column(name = "LastName")
private String lastName;
@Column(name = "Zip")
private String zip;
@Column(name = "Email")
private String email;
@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, orphanRemoval = true, mappedBy = "candidate")
private List<CandidateField> fields;
//Constructors, getters, setters omitted
CandidateField entity:
@Entity
@Table(name = "CandidateFields", schema = "Candidate")
public class CandidateField {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "CandidateFieldID")
@JsonIgnore
private Long candidateFieldID;
@JsonIgnore
@Column(name = "CandidateID")
private Long candidateID;
@Column(name = "Name")
private String name;
@ManyToOne
@JoinColumn(name = "CandidateID", referencedColumnName = "CandidateID", insertable = false, updatable = false)
@JsonIgnore
private Candidate candidate;
//Constructors, getters, setters omitted
And finally, here is my search code:
@Autowired
CandidateRepository repository;
QCandidate candidate = QCandidate.candidate;
BooleanExpression predicate = candidate.fields.any().name.eq("Accounting");
Iterable<Candidate> results = repository.findAll(predicate);
The last line takes ~8 minutes to search a 30k record database (no fields are indexed) with a result of 80 rows. The issue is the same locally and in my online staging server. It generates this query and then pauses for 8 minutes:
select candidate0_.CandidateID as Candidat1_4_, candidate0_.Email as
Email7_4_, candidate0_.FirstName as FirstNam8_4_, candidate0_.LastName as
LastNam17_4_, candidate0_.Zip as Zip30_4_ from Candidate.Candidates
candidate0_where exists (select 1 from Candidate.CandidateFields fields1_
where candidate0_.CandidateID=fields1_.CandidateID and fields1_.Name=?)
binding parameter [1] as [VARCHAR] - [Accounting]
EDIT: The issue definitely stems from the EXISTS clause in the auto-generated query. I now have to look into why it does so and how to make it generate a more efficient query
The weird part is that if I do a simple findAll() without any parameters, it actually seems to be selecting and parsing the results rather than pausing. The issue persists if I search by ID rather than Name.
Running an actual SQL query in my DB management tool returns results in ~1 second (here is the query with the EXPLAIN result as requested):
EXPLAIN SELECT *
FROM Candidates
INNER JOIN Candidate.CandidateFields
WHERE CandidateFields.CandidateID = Candidates.CandidateID AND
CandidateFields.Name = "Accounting";
Result:
<table>
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>partitions</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>extra</th>
</tr>
<tr>
<td>1</td>
<td>SIMPLE</td>
<td>CandidateFields</td>
<td>null</td>
<td>ALL</td>
<td>null</td>
<td>null</td>
<td>null</td>
<td>null</td>
<td>30601</td>
<td>10</td>
<td>Using where</td>
</tr>
<tr>
<td>1</td>
<td>SIMPLE</td>
<td>Candidates</td>
<td>null</td>
<td>eq_ref</td>
<td>PRIMARY</td>
<td>PRIMARY</td>
<td>4</td>
<td>Candidate.CandidateFields.CandidateID</td>
<td>1</td>
<td>100</td>
<td>Using where</td>
</tr>
</table>
EDIT: The issue definitely stems from the EXISTS clause in the auto-generated query. I now have to look into why it does so and how to make it generate a more efficient query
Would appreciate ANY and all help. Can also clarify if need be. Cheers!
Why not? Just add this index here:
Or even faster for this particular query:
Because the latter will be a covering index for the semi-join (
EXISTS()
subquery).