MoviePy audio clips are merging together in my script. How do I fix it?

I have a dataframe containing speech recordings and video files that I want to merge. For example, here is what the dataframe looks like:

speech_paths,vid_paths,start,stop,short_option, 
Recording.m4a,hr.mp4,00:11:11.520,00:11:22.800,N,
Recording2.m4a,hr.mp4,00:04:38.800,00:04:54.840,N, 
Recording3.m4a,hr.mp4,00:05:12.520,00:05:35.600,N, 
Recording4.m4a,hr.mp4,00:10:36.440,00:11:11.520,N,  

My goal is to loop through this CSV and combine each recording with its video. The start and stop stamps mark the portion of the video whose original audio I want to keep. My recording from "speech_paths" should be added before that portion: the video clip is extended backward at the start by the length of the recording, and during that extension the video keeps playing while its own audio is replaced by my voice. In other words, each clip starts with my voice over the video and then continues with the video's original audio.

Here is the code that does this:

#read the .csv containing the clip configuration
  df = pd.read_csv('./csv-data/clip-config.csv')
  speech = df['speech_paths']
  startstamps = df['start']
  endstamps = df['stop']
  videos = df['vid_paths']

  #create standard recording path
  record_path = 'C:/Users/Corey4005/Documents/Sound recordings'

  #current directory 
  cwd = os.getcwd()

  #video locations 
  videos_path = os.path.join(cwd, 'inputvideos')
  outputvideos_path = os.path.join(cwd, 'outputvideos')
  srt_path = os.path.join(cwd, 'srtfile')

  #a list of clips to concatenate into one video if len(df) > 1
  clips_list = []

  count = 0
  #get name of filepath 
  for i in range(len(df)):
    count +=1

    #adding the name of the speech file to the variable
    speech_file = speech[i]

    #selecting the start and end stamps to download from yt
    start_stamp = startstamps[i]
    end_stamp = endstamps[i]

    #selecting the video file
    video_file = videos[i]

    #getting the video file 
    path_to_video = os.path.join(videos_path, video_file)
    path_to_mp3 = os.path.join(record_path, speech_file)

    print("----------- Progress: {} / {} videos processed -----------".format(count, len(df)))
    print("----------- Combining the Following Files: ")
    print("----------- Speech: {}".format(path_to_mp3))
    print("----------- Video: {}".format(path_to_video))

    #need the audio length to get the appropriate start time for the new clip
    audio_length = get_audio_length(path_to_mp3)

    print('----------- Writing mono speech file')
    #write the speech audio out as an .mp3 file, converting it from stereo to mono
    mp.AudioFileClip(path_to_mp3).write_audiofile('mono.mp3', ffmpeg_params=["-ac", "1"])
    

    #create the overall big clip that is the size of the audio + the video in question
    big_clip = clip_video(path_to_video, start_stamp, end_stamp, audio_length)

    #create the first clip the size of the speech file, or from 0 -> end of audio_length
    first_clip = big_clip.subclip(0, audio_length)

    #set first clip audio as speech file
    audioclip = mp.AudioFileClip("mono.mp3")
    first_clip.audio=audioclip
  
    #create a second clip the size of the rest of the file or from audio_length -> end
    second_clip = big_clip.subclip(audio_length)

    # Concatenate the two subclips
    final_clip = mp.concatenate_videoclips([first_clip, second_clip])

    if len(df)>1:
      
      #for youtube
      clips_list.append(final_clip)
      
    else:
      ytoutpath = os.path.join(outputvideos_path, 'youtube.mp4')

      print('----------- Writing combined speech and videofile')
      #youtube
      final_clip.write_videofile(ytoutpath)
      #yt filepath 

      ytfilepath = os.path.abspath(ytoutpath)


      #create subtitles filepath
      print("----------- generating srt file")
      transcribefile = video_to_srt(ytfilepath, srt_path)

      #create videos that are subtitled
      print("----------- subtitling youtube video")
      subtitledyt = create_subtitles(ytfilepath, transcribefile, 'yt', outputvideos_path)

      #resize the video for tt, resized is the filename
      print('----------- generating tiktok video')
      resized = resize(final_clip, count, outputvideos_path)
      
      print('----------- subtitling tiktokvideo')
      tiktoksubtitled = create_subtitles(resized, transcribefile, 'tt', outputvideos_path)

  if len(df)>1:
    #writing the final clips list into a concatenated video
    print("----------- Concatenating all {} videos -----------".format(len(df)))
    concatinate_all = mp.concatenate_videoclips(clips_list)
    
    #creating paths to save videos to 
    ytoutpath = os.path.join(outputvideos_path, 'concat_youtube.mp4')

    #write out file for iphone
    concatinate_all.write_videofile(ytoutpath)

Here are the helper functions used in the main script, for complete context:

def get_audio_length(filepath: str)->float:
    print('----------- Retrieving audio length')
    seconds = librosa.get_duration(filename=filepath)
    print(f'----------- Seconds: {seconds}')
    return seconds

def clip_video(input_video: str, start_stamp: str, end_stamp: str, delta: float | None = None) -> mp.VideoFileClip:
    # Load the video.
    video = mp.VideoFileClip(input_video)

    # Convert the timestamps to seconds.
    if delta:
        # Extend the clip backward by delta seconds to make room for the speech.
        start_stamp = convert_timestamp(start_stamp) - delta
        end_stamp = convert_timestamp(end_stamp)
        clip = video.subclip(start_stamp, end_stamp)
    else:
        # Clip the video.
        clip = video.subclip(convert_timestamp(start_stamp), convert_timestamp(end_stamp))

    return clip


def convert_timestamp(timestamp: str) -> float:
    
    # Split the timestamp on the `:` character.
    hours, minutes, seconds = timestamp.split(":")  
    seconds, ms = seconds.split('.')
    # Convert the time string to a timedelta object.
    timedelta_object = datetime.timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds), milliseconds=int(ms))
    #convert to seconds 
    seconds = timedelta_object.total_seconds()
    return seconds
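
As a worked example of the timing these helpers produce, here is a minimal standalone sketch that uses only the standard library. The names `to_seconds` and `clip_window` are hypothetical and not part of the script, the 5-second recording length is an assumed value, and clamping the start at zero is an added assumption for clips near the beginning of a video. Note that, like `convert_timestamp`, it expects exactly three digits after the decimal point, since the fraction is read as milliseconds.

```python
import datetime


def to_seconds(timestamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' to seconds (same approach as convert_timestamp)."""
    hours, minutes, rest = timestamp.split(":")
    seconds, ms = rest.split(".")
    delta = datetime.timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        milliseconds=int(ms),
    )
    return delta.total_seconds()


def clip_window(start: str, stop: str, speech_seconds: float) -> tuple[float, float]:
    """Return (begin, end) in seconds, with the start moved back by the speech length."""
    begin = to_seconds(start) - speech_seconds
    # Assumption: never start before the beginning of the video.
    return max(begin, 0.0), to_seconds(stop)


# First row of the CSV, with an assumed 5-second recording.
window = clip_window("00:11:11.520", "00:11:22.800", 5.0)
print(window)
```

With these inputs the clip would begin about 5 seconds before 00:11:11.520 and end at 00:11:22.800. The first 5 seconds carry the recording and the remainder keeps the video's own audio.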

My problem is that Recording4.m4a is bleeding into the last part of each of the recordings above it. I am not sure why this is happening, since I write a new "mono.mp3" file on each iteration. This file is simply a mono (rather than stereo) version of the speech file that I add to the front of each video.

How do I stop the final recording from bleeding into the others? Each of my audio files starts with the correct sound, but about halfway through, the fourth recording interrupts and takes over. I feel like I am missing some understanding of how MoviePy works.

There is 1 solution below

I was able to solve this problem by writing the audio files to separate locations. Here is the code you need to change in the script above so that each clip gets its own audio and the recordings no longer bleed together:

    #write the speech audio out as its own .mp3 file, converting it from stereo to mono
    filename = str(count) + 'mono.mp3'
    mp.AudioFileClip(path_to_mp3).write_audiofile(filename, ffmpeg_params=["-ac", "1"])

    #create the overall big clip that is the size of the audio + the video in question
    big_clip = clip_video(path_to_video, start_stamp, end_stamp, audio_length)

    #create the first clip the size of the speech file, or from 0 -> end of audio_length
    first_clip = big_clip.subclip(0, audio_length)

    #set first clip audio as speech file
    audioclip = mp.AudioFileClip(filename)
    first_clip.audio=audioclip

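The difference is that each audio file is now written out as its own mono file instead of reusing the same file name on every iteration. I am not sure exactly why reusing the file name fails. I assumed it had something to do with a buffer not being completely flushed, but it is more likely that MoviePy reads audio lazily: an AudioFileClip keeps a reference to the file on disk and reads most of the samples only when the final video is written. Because the clips are collected in clips_list and rendered after the loop, every clip that points at "mono.mp3" would read whatever that file contains at render time, which is the last recording.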
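
The symptom (a correct start, then the last recording taking over) is consistent with a clip that buffers only its opening samples and reads the rest from disk at render time. The sketch below is a toy model of that idea in plain Python, not MoviePy's API: `LazyReader`, `build`, the 4-byte buffer, and the `.bin` file names are all hypothetical stand-ins chosen for illustration.

```python
import os
import tempfile


class LazyReader:
    """Toy stand-in for a lazily read media clip (not a MoviePy class)."""

    def __init__(self, path, buffer_size=4):
        self.path = path
        # Only the first few bytes are read eagerly, like an initial buffer.
        with open(path, "rb") as f:
            self.head = f.read(buffer_size)

    def render(self):
        # The remainder is read from disk only when the clip is rendered.
        with open(self.path, "rb") as f:
            f.seek(len(self.head))
            return self.head + f.read()


def build(recordings, directory, unique_names):
    """Write each recording to disk, wrap it in a reader, and render at the end."""
    clips = []
    for i, data in enumerate(recordings, start=1):
        name = f"{i}mono.bin" if unique_names else "mono.bin"
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
        clips.append(LazyReader(path))
    # Rendering happens after the loop, as when clips_list is concatenated.
    return [clip.render() for clip in clips]


recordings = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
with tempfile.TemporaryDirectory() as tmp:
    shared = build(recordings, tmp, unique_names=False)
    separate = build(recordings, tmp, unique_names=True)

print(shared)    # every clip ends with bytes from the last recording
print(separate)  # every clip keeps its own bytes
```

In the shared-name case each rendered clip starts with its own bytes and ends with the last recording's bytes, which mirrors the bleed described in the question. With one file per clip, every reader still finds its own data at render time.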