The first query q1 gives desired result. But query q2 gives excessive large no of movies for a pair of actor and director.
SELECT * FROM
(SELECT pc.PID AS actorID, pc.Name AS Actor, pd.PID AS directorID,pd.Name AS
Director,COUNT(DISTINCT m.MID) count_movie FROM Movie m
JOIN
M_Cast mc ON m.MID = Trim(mc.MID)
JOIN
M_Director md ON m.MID = md.MID
JOIN
Person pc ON Trim(mc.PID) = pc.PID
JOIN
Person pd ON md.PID = pd.PID
WHERE pd.Name LIKE '%Yash%' GROUP BY pc.Name) AS q1
JOIN
(SELECT pc.PID AS actorID,pc.Name Actor,pd.PID AS directorID,pd.Name AS Director, COUNT(DISTINCT
m.MID) count_movie FROM Movie m
JOIN
M_Cast mc ON m.MID = TRIM(mc.MID )
JOIN
M_Director md ON m.MID = md.MID
JOIN
Person pc ON TRIM(mc.PID) = pc.PID
JOIN
Person pd ON md.PID = pd.PID
WHERE pd.Name NOT LIKE '%Yash%' GROUP BY pc.PID) AS q2
ON q1.Actor = q2.Actor ORDER BY q1.count_movie DESC
actorID Actor directorID Director count_movie actorID Actor directorID Director count_movie
0 nm0707271 Jagdish Raj nm0007181 Yash Chopra 11 nm0707271 Jagdish Raj nm0474806 Gulshan Kumar 98
1 nm0471443 Manmohan nm0007181 Yash Chopra 10 nm0471443 Manmohan nm0695153 T. Prakash Rao 39
2 nm0407002 Iftekhar nm0007181 Yash Chopra 9 nm0407002 Iftekhar nm1065099 C.P. Dixit 93
3 nm0534501 Madan Puri nm0007181 Yash Chopra 8 nm0534501 Madan Puri nm0619050 Hiren Nag 94
The movie_count on extreme right seems to be unreasonably high. It should be less than 11 for the first row if it has to match the answer.
Here i have not used the above WHERE condition just to show where my query is wrong. Even if i use the Where condition
where q1.count_movie > q2.count_movie
The result which i get is
Name count
0 Sanjeev Kumar 3
1 Sanjeev Kumar 3
2 Sanjeev Kumar 3
3 Sanjeev Kumar 3
4 Ashok Kumar 2
The below output is the correct result. It will give 245 rows. I am showing only 6 rows.
Name count
0 Jagdish Raj 11
1 Manmohan 10
2 Iftekhar 9
3 Shashi Kapoor 7
4 Rakhee Gulzar 5
5 Waheeda Rehman 5
The tables given are
- Movie(MID,title,year)
- M_Cast(MID,PID)
- Person(PID,Name,Gender)
- M_Director(MID,PID)
I am new in sql and having a lot of trouble in debugging the code. Kindly help me to understand the logic.
The problem with your second part of the query is that you sum up all movies of all other directors instead of counting movies per director.
I see two approaches for the task:
WHERE yash_chopra_count > ANY (...)
, but SQLite doesn't feature theANY
keyword.) Or compare Yash Chopra's movie count to the maximum movie count of the others. (Which does work in SQLite.)ROW_COUNT
).Before we start, let's check, if we got everything right: The data model allows multiple directors per movie. But we can still just count. If an actor made three movies with Yash Chopra and three with I. S. Johar, we don't care whether these are six separate movies or maybe just three where Yash Chopra was I. S. Johar's assistent director. We see in my example for the actor there is another director with at least as many movies as with Yash Chopra, so we don't want to select this actor. The m:n relation is hence no hindrance. It doesn't make a difference.
Now that we made sure we can simply count movies per actor and director let's see the two approaches:
Compare Yash Chopra count with maximum other count
As mentioned, we must aggregate other directors twice (once for the count per director, once for the maximum count), because SQLite lacks the
ANY
operator. This isn't a big deal though when we use a CTE for readability.Rank directors per actor by movie count
What steps must we undertake to get the actors in question?
For the ranking I'd use
ROW_NUMBER
.