I'm trying to make a music bot with discord.py. Below is a minimal working example of the bot, including the problematic functions:
import os

import discord
from discord.ext import commands
from discord import player as p
import yt_dlp as youtube_dl

intents = discord.Intents.default()
intents.members = True
bot = commands.Bot(command_prefix=';')


class Music(commands.Cog):
    def __init__(self, bot):
        self.bot = bot
        self.ytdl_opts = {
            'format': 'bestaudio/best',
            'outtmpl': '%(extractor)s-%(id)s-%(title)s.%(ext)s',
            'restrictfilenames': True,
            'noplaylist': True,
            'playlistend': 1,
            'nocheckcertificate': True,
            'ignoreerrors': False,
            'logtostderr': False,
            'quiet': True,
            'no_warnings': True,
            'default_search': 'auto',
            'source_address': '0.0.0.0',  # bind to ipv4 since ipv6 addresses cause issues sometimes
        }
        self.ffmpeg_opts = {
            'options': '-vn',
            # Source: https://stackoverflow.com/questions/66070749/
            "before_options": "-reconnect 1 -reconnect_streamed 1 -reconnect_delay_max 5",
        }
        self.cur_stream = None  # audio source that is currently playing
        self.cur_link = None    # direct stream URL of the current song

    @commands.command(aliases=["p"])
    async def play(self, ctx, url):
        ytdl = youtube_dl.YoutubeDL(self.ytdl_opts)
        # Resolve the direct stream URL without downloading the file
        data = ytdl.extract_info(url, download=False)
        filename = data['url']  # So far only works with links
        print(filename)
        audio = p.FFmpegPCMAudio(filename, **self.ffmpeg_opts)
        self.cur_stream = audio
        self.cur_link = filename
        # You must be connected to a voice channel first
        await ctx.author.voice.channel.connect()
        ctx.voice_client.play(audio)
        await ctx.send("now playing")

    @commands.command(aliases=["ff"])
    async def seek(self, ctx):
        """
        Fast forwards 10 seconds
        """
        ctx.voice_client.pause()
        # Discard frames by reading them without sending them to Discord
        for _ in range(500):
            self.cur_stream.read()  # 500*20ms of audio = 10000ms = 10s
        ctx.voice_client.resume()
        await ctx.send("fast forwarded 10 seconds")

    @commands.command(aliases=["j"])
    async def jump(self, ctx, time):
        """
        Jumps to a time in the song, input in the format of HH:MM:SS
        """
        ctx.voice_client.stop()
        temp_ffmpeg = {
            'options': '-vn',
            # Keyframe skipping when passed as an input option (fast)
            "before_options": f"-ss {time} -reconnect 1 -reconnect_streamed 1 -reconnect_delay_max 5",
        }
        # Restart FFmpeg on the same stream URL, starting at the requested time
        new_audio = p.FFmpegPCMAudio(self.cur_link, **temp_ffmpeg)
        self.cur_stream = new_audio
        ctx.voice_client.play(new_audio)
        await ctx.send(f"skipped to {time}")


bot.add_cog(Music(bot))
bot.run(os.environ["BOT_TOKEN"])
My requirements.txt file:
discord.py[voice]==1.7.3
yt-dlp==2021.9.2
To play a song in Discord the following format is used:
;p <link>
Where <link> is any link that yt-dlp supports. Under normal circumstances, the ;p command is used with relatively short songs, for which seek() and jump() do what they are supposed to do extremely quickly. For example, if I execute this sequence of commands in Discord:
;p https://www.youtube.com/watch?v=n8X9_MgEdCg <- 4 min song
and, once the bot starts playing, spam the following:
;ff
;ff
;ff
;ff
;ff
The bot is able to almost instantly seek five 10-second increments of the song. Additionally, I can jump to the three minute mark very quickly with:
;j 00:03:00
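The 500-iteration loop in seek() relies on each read() call returning 20 ms of audio. To make that arithmetic explicit, here is a small sketch. The constants are my assumption about the PCM format discord.py expects (48 kHz, 16-bit, stereo) and are hard-coded here rather than imported from the library; frames_to_skip and bytes_to_skip are hypothetical helper names.

```python
# Assumed PCM format: 48 kHz, stereo, 16-bit samples, 20 ms per frame.
SAMPLE_RATE_HZ = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2
FRAME_MS = 20

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000          # samples per channel
FRAME_BYTES = SAMPLES_PER_FRAME * CHANNELS * BYTES_PER_SAMPLE  # bytes per read()


def frames_to_skip(seconds: float) -> int:
    """Number of 20 ms frames that make up `seconds` of audio."""
    return int(seconds * 1000) // FRAME_MS


def bytes_to_skip(seconds: float) -> int:
    """Amount of decoded PCM that must be read and thrown away."""
    return frames_to_skip(seconds) * FRAME_BYTES


if __name__ == "__main__":
    # One ;ff command, then five in a row
    print(frames_to_skip(10), bytes_to_skip(10))
    print(frames_to_skip(50), bytes_to_skip(50))
```

Under these assumptions, a single ;ff has to pull roughly 1.9 MB of decoded audio out of FFmpeg before playback resumes, and five in a row need about 9.6 MB, so the speed of seek() depends on how quickly FFmpeg can supply that much data.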
From some experimentation, the seek() and jump() functions seem to work quickly for songs that are under 10 minutes. If I try the exact same sequence of commands with a 15 minute song like https://www.youtube.com/watch?v=Ks9Ck5LfGWE or a longer one like https://www.youtube.com/watch?v=VThrx5MRJXA (10 hours of classical music), there is a noticeable slowdown when running the ;ff command. However, when I leave a few seconds of delay between firings of the ;ff command, the seeking is just as fast as described above. I'm not exactly sure what is going on with yt-dlp/FFmpeg behind the scenes when streaming, but I speculate that there is some sort of internal buffer, and that songs past a certain length threshold are processed differently.
For longer songs, jump() takes longer to reach the desired position, which makes sense, since this site specifies that -ss used as an input option seeks by skipping through keyframes (and there must be more keyframes in longer songs). However, if the following commands are run in Discord:
;p https://www.youtube.com/watch?v=VThrx5MRJXA <- 10 hour classical music
;j 09:00:00 <- jump to 9 hour mark
;j 00:03:00 <- jump to 3 minute mark
The first jump takes around 5 to 10 seconds to complete, which isn't bad, but it could be better. The second jump takes about as long as the first, which doesn't make sense to me, because I expected fewer keyframes to be skipped in order to reach the 3 minute mark.
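For reference, this is roughly how the input-seeking options used by jump() can be built. It is a sketch, not what the bot currently does: parse_timestamp and build_seek_options are hypothetical helpers, and the HH:MM:SS value is converted to whole seconds before being passed to -ss, which also keeps arbitrary user text out of the FFmpeg arguments.

```python
import re

RECONNECT_OPTS = "-reconnect 1 -reconnect_streamed 1 -reconnect_delay_max 5"
_TIMESTAMP = re.compile(r"(\d{1,2}):([0-5]\d):([0-5]\d)")


def parse_timestamp(text: str) -> int:
    """Convert 'HH:MM:SS' to a number of seconds, rejecting anything else."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"expected HH:MM:SS, got {text!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def build_seek_options(text: str) -> dict:
    """Build keyword arguments for FFmpegPCMAudio with -ss as an input option."""
    seconds = parse_timestamp(text)
    return {
        # -ss placed in before_options applies to the input (input seeking)
        "before_options": f"-ss {seconds} {RECONNECT_OPTS}",
        "options": "-vn",
    }


if __name__ == "__main__":
    print(build_seek_options("09:00:00")["before_options"])
    print(build_seek_options("00:03:00")["before_options"])
```

Both calls produce the same options apart from the offset, so any difference in how long the two jumps take comes from FFmpeg and the remote server rather than from the bot code.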
So I'm wondering what's going on, and how to potentially solve the following:
- What is actually going on with the seek() command? My implementation of seek() uses discord.py's discord.player.FFmpegPCMAudio.read() method, which apparently runs slower if the song is longer. Why?
- Why does input seeking for long YouTube videos take almost the same time no matter where I seek to?
- How do the yt-dlp and FFmpeg commands work behind the scenes to stream a video from YouTube (or any other website that yt-dlp supports)? Do yt-dlp and FFmpeg behave differently for audio streams above a certain length threshold?
- What are potential ways to speed up seek() and jump() for long songs? I recall some well-known Discord music bots were able to do this very quickly.