Spring Data query is VERY slow

5.1k Views Asked by At

I have a Spring Boot project set up to use a MySQL database with Hibernate. I recently added 30k new entries to my database tables and every since then, queries have been abysmally slow.

I use Query DSL to perform the queries themselves. Suppose I have the following Candidate entity with a OneToMany relationship with a CandidateField entity:

@Entity
@Table(name = "Candidates", schema = "Candidate")
public class Candidate {

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "CandidateID")
private Long candidateID;
@Column(name = "FirstName")
private String firstName;
@Column(name = "LastName")
private String lastName;
@Column(name = "Zip")
private String zip;
@Column(name = "Email")
private String email;

@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, orphanRemoval = true, mappedBy = "candidate")
private List<CandidateField> fields;

//Constructors, getters, setters omitted

CandidateField entity:

@Entity
@Table(name = "CandidateFields", schema = "Candidate")
public class CandidateField {

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "CandidateFieldID")
@JsonIgnore
private Long candidateFieldID;
@JsonIgnore
@Column(name = "CandidateID")
private Long candidateID;
@Column(name = "Name")
private String name;
@ManyToOne
@JoinColumn(name = "CandidateID", referencedColumnName = "CandidateID", insertable = false, updatable = false)
@JsonIgnore
private Candidate candidate;

//Constructors, getters, setters omitted

And finally, here is my search code:

@Autowired
CandidateRepository repository;
QCandidate candidate = QCandidate.candidate;
BooleanExpression predicate = candidate.fields.any().name.eq("Accounting");
Iterable<Candidate> results = repository.findAll(predicate);

The last line takes ~8 minutes to search a 30k record database (no fields are indexed) with a result of 80 rows. The issue is the same locally and in my online staging server. It generates this query and then pauses for 8 minutes:

select candidate0_.CandidateID as Candidat1_4_, candidate0_.Email as
Email7_4_, candidate0_.FirstName as FirstNam8_4_, candidate0_.LastName as
LastNam17_4_, candidate0_.Zip as Zip30_4_ from Candidate.Candidates
candidate0_where exists (select 1 from Candidate.CandidateFields fields1_
where candidate0_.CandidateID=fields1_.CandidateID and fields1_.Name=?)
binding parameter [1] as [VARCHAR] - [Accounting]

EDIT: The issue definitely stems from the EXISTS clause in the auto-generated query. I now have to look into why it does so and how to make it generate a more efficient query

The weird part is that if I do a simple findAll() without any parameters, it actually seems to be selecting and parsing the results rather than pausing. The issue persists if I search by ID rather than Name.

Running an actual SQL query in my DB management tool returns results in ~1 second (here is the query with the EXPLAIN result as requested):

EXPLAIN SELECT *
FROM Candidates
INNER JOIN Candidate.CandidateFields
WHERE CandidateFields.CandidateID = Candidates.CandidateID AND 
CandidateFields.Name = "Accounting";

Result:

<table>
 <tr>
   <th>id</th>
   <th>select_type</th>
   <th>table</th>
   <th>partitions</th>
   <th>type</th>
   <th>possible_keys</th>
   <th>key</th>
   <th>key_len</th>
   <th>ref</th>
   <th>rows</th>
   <th>filtered</th>
   <th>extra</th>
 </tr>
  <tr>
  <td>1</td>
   <td>SIMPLE</td>
    <td>CandidateFields</td>
    <td>null</td>
    <td>ALL</td>
    <td>null</td>
    <td>null</td>
    <td>null</td>
    <td>null</td>
    <td>30601</td>
    <td>10</td>
    <td>Using where</td>
  </tr>
    <tr>
  <td>1</td>
   <td>SIMPLE</td>
    <td>Candidates</td>
    <td>null</td>
    <td>eq_ref</td>
    <td>PRIMARY</td>
    <td>PRIMARY</td>
    <td>4</td>
    <td>Candidate.CandidateFields.CandidateID</td>
    <td>1</td>
    <td>100</td>
    <td>Using where</td>
  </tr>
  
  </table>

EDIT: The issue definitely stems from the EXISTS clause in the auto-generated query. I now have to look into why it does so and how to make it generate a more efficient query

Would appreciate ANY and all help. Can also clarify if need be. Cheers!

1

There are 1 best solutions below

0
On

no fields are indexed

Why not? Just add this index here:

CREATE INDEX i1 ON Candidate.CandidateFields(Name);

Or even faster for this particular query:

CREATE INDEX i2 ON Candidate.CandidateFields(Name, CandidateID);

Because the latter will be a covering index for the semi-join (EXISTS() subquery).