I wrote a SQL query to answer the following question:
Find all the actors that made more movies with Yash Chopra than any other director in the IMBD database.
Sample schema:
person
(pid *
,name
);
m_cast
(mid *
,pid *
);
m_director
(mid*
,pid*
);
* = (component of) PRIMARY KEY
Following is my query:
WITH common_actors AS
(SELECT A.actor_id as actors, B.director_id as director_id, B.movies as movies_with_director,
B.director_id as yash_chops_id, B.movies as movies_with_yash_chops FROM
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id) A
JOIN
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id
)B
ON A.actor_id = B.actor_id
WHERE B.director_id in (SELECT PID FROM Person WHERE Name LIKE
'%Yash%Chopra%'))
SELECT distinct actors as actor_id, movies_with_yash_chops as total_movies FROM common_actors
WHERE actors NOT IN (SELECT actors FROM common_actors WHERE movies_with_director > movies_with_yash_chops)
And the result obtained from this is of length: 430 rows. However the result obtained should be of length 243 rows. Could anyone please suggest where I went wrong in my query? Is my approach right?
Sample result:
Actor name
0 Sharib Hashmi
1 Kulbir Badesron
2 Gurdas Maan
3 Parikshat Sahni
...
242 Ramlal Shyamlal
Thanks in advance!
Consider the following:
I've included the movie table only for ease of reference. It's not relevant to the actual problem. Also, note that this model assumes that cast members are only listed once, regardless of whether or not they have multiple roles in a given film.
The following query asks 'how often have each actor and director worked together'...
An actor is any person who has been a cast member of any movie. A director is any person who has been a director of any movie.
By observation, we can see that:
EDIT: While I'm not seriously advocating this as a solution, the following is simply to demonstrate that the kernel provided above is really all you need to solve the problem...
This returns a list of every actor, and the director with whom they've worked most often. In this case, because Bill Smith and Raphael DiMaggio have worked most often equally with two directors, they are excluded from the result.
The answer to your problem is simply to select from this list all rows with Yash Chopra listed as the director.