I have a DB with two tables tblVideos is about 8 million rows, contains Id(auto increment 1,1), videoId, Name, Tags, (FK)VideoProviderId tblVideoProviders is about 6 providers at the moment, and has 3 columns: Id(auto increment 1,1 tiny int), Name, Url(to build the link using the provider + video Id)
Unlike YouTube smaller providers don't have an API to return an array then pick up something random.
retrieving a totally random row takes under a second in both ways I got now:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
WHERE ((ABS(CAST(
(BINARY_CHECKSUM
(tblVideos.id, NEWID())) as int))
% 6800000) < 10 )
OR
slightly longer
select top 1 tblVideoProvider.Url + tblVideos.videoId as url,
tblVideos.Name, tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
ORDER BY NEWID()
but once I start looking for something more specific:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
where (tblVideos.tags like '%' + @tag + '%')
or (tblVideos.Name like '%' + @tag + '%')
ORDER BY NEWID()
The query hits 8 seconds, removing the last or tblVideos like takes it down to 4~5 seconds, but that's way too high.
retrieving the whole query without the "order by newid()" will make the query take a lot less time but the application will consume about 0.2~2 MB of data per user, and assuming over 200~400 simultanios requests ends up in lots of data
In general the "like" operator is very expensive, and when the pattern starts with a "%" even an index on the respective column (assuming you have one) cannot be used. I think there is no easy way to increase the performance of your query.