Retrieve a random row with like statement (over 5 millions rows)

66 Views Asked by At

I have a DB with two tables tblVideos is about 8 million rows, contains Id(auto increment 1,1), videoId, Name, Tags, (FK)VideoProviderId tblVideoProviders is about 6 providers at the moment, and has 3 columns: Id(auto increment 1,1 tiny int), Name, Url(to build the link using the provider + video Id)

Unlike YouTube smaller providers don't have an API to return an array then pick up something random.

retrieving a totally random row takes under a second in both ways I got now:

select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
WHERE ((ABS(CAST(
(BINARY_CHECKSUM
(tblVideos.id, NEWID())) as int))
% 6800000) < 10 )

OR

slightly longer

select top 1 tblVideoProvider.Url + tblVideos.videoId as url,
tblVideos.Name, tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
ORDER BY NEWID()

but once I start looking for something more specific:

select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
where (tblVideos.tags like '%' + @tag + '%')
or (tblVideos.Name like '%' + @tag + '%')
ORDER BY NEWID()

The query hits 8 seconds, removing the last or tblVideos like takes it down to 4~5 seconds, but that's way too high.

retrieving the whole query without the "order by newid()" will make the query take a lot less time but the application will consume about 0.2~2 MB of data per user, and assuming over 200~400 simultanios requests ends up in lots of data

1

There are 1 best solutions below

2
On

In general the "like" operator is very expensive, and when the pattern starts with a "%" even an index on the respective column (assuming you have one) cannot be used. I think there is no easy way to increase the performance of your query.