I'm working on an iOS app that uses the AVFoundation framework for real-time audio processing. The app captures audio from the microphone, passes each buffer to a function that returns a modified buffer, and then plays the result back.
However, I'm getting a significant delay, around 500 ms, between speaking into the microphone and hearing the playback. At first I thought the delay came from the processing time, but the same thing happens even if I remove the processing and just play the original buffer.
Here's my setup:
    import SwiftUI
    import AVFoundation

    class MicTestAudioKitService: MicTestRepository {
        private let engine = AVAudioEngine()
        private let playerNode = AVAudioPlayerNode()
        // ------------------
        @Injected private var pitchCorrectionService: PitchCorrectionRepository
        // ------------------
        private var isInitialized = false
        private var pitchCorrectionIntensity: Float = 0.5
        // ------------------
        // Builds the node graph once: playerNode -> mainMixerNode -> output
        private func initialize() {
            guard !isInitialized else { return }
            isInitialized.toggle()
            let inputNode = engine.inputNode
            let format = inputNode.inputFormat(forBus: 0)
            // Attach and connect the playerNode
            engine.attach(playerNode)
            engine.connect(playerNode, to: engine.mainMixerNode, format: format)
        }

        func start() {
            initialize()
            let inputNode = engine.inputNode
            let format = inputNode.inputFormat(forBus: 0)
            // Tap the microphone, process each buffer, then schedule it for playback
            inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] buffer, _ in
                guard let self else { return }
                // Fall back to the unprocessed buffer if pitch correction returns nil
                let processedBuffer = pitchCorrectionService.pitchCorrect(buffer: buffer, intensity: pitchCorrectionIntensity) ?? buffer
                outputAudioBuffer(processedBuffer)
            }
            do {
                try engine.start()
            } catch {
                print("Audio Engine failed to start: \(error)")
            }
        }

        func stop() {
            engine.stop()
        }

        func setPitchCorrectionIntensity(_ intensity: Float) {
            pitchCorrectionIntensity = intensity
        }
    }

    // Extension name matched to the class declared above
    extension MicTestAudioKitService {
        private func outputAudioBuffer(_ buffer: AVAudioPCMBuffer) {
            playerNode.scheduleBuffer(buffer, completionHandler: nil)
            if !playerNode.isPlaying {
                playerNode.play()
            }
        }
    }
The same thing happens even with wired earphones.
Any ideas?
This is what I do for a similar application. It works great. There is latency, but I think it's less than 100 ms.
To monitor the input:
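A minimal sketch of what input monitoring could look like, assuming the input node is wired straight to the main mixer so that no tap or player node sits in the monitoring path. The names `InputMonitor` and `bufferDurationMs` are hypothetical, and the 5 ms preferred I/O buffer duration is only a request that the system may not grant. The helper shows why tap-based playback tends to lag: the tap's `bufferSize` is also only a request, and if the tap actually delivers, say, 4800 frames at 48 kHz, each buffer already carries 100 ms of audio before it can be scheduled.

```swift
import AVFoundation

// Pure helper: how much audio a buffer of `frames` frames holds, in milliseconds.
func bufferDurationMs(frames: Int, sampleRate: Double) -> Double {
    Double(frames) / sampleRate * 1000
}

// Hypothetical class: monitors the microphone by connecting input directly to output.
final class InputMonitor {
    private let engine = AVAudioEngine()

    func start() throws {
        // Ask for a record-and-play session with a small hardware buffer.
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .default, options: [])
        try session.setPreferredIOBufferDuration(0.005) // a request, not a guarantee
        try session.setActive(true)

        // inputNode -> mainMixerNode -> output, with no tap in between.
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)
        engine.connect(input, to: engine.mainMixerNode, format: format)
        try engine.start()

        // Report what the session says about the hardware path.
        let totalMs = (session.inputLatency + session.outputLatency
                       + session.ioBufferDuration) * 1000
        print("Reported I/O latency: \(totalMs) ms")
    }

    func stop() {
        engine.stop()
    }
}
```

Use headphones while testing, since monitoring through the speaker feeds back into the microphone. If the buffers have to be modified on the way through, this direct connection is not enough on its own, but it is a quick way to check how low the latency can get on a given device.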
To record the audio (I use a file, you can use a buffer instead):
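A rough sketch of recording through a tap into a file, assuming the audio session has already been configured for recording elsewhere. `TapRecorder` is a hypothetical name, and the 1024-frame `bufferSize` is a request that the engine may replace with a different size.

```swift
import AVFoundation

// Hypothetical class: writes microphone audio to a file from an input tap.
final class TapRecorder {
    private let engine = AVAudioEngine()
    private var file: AVAudioFile?
    private(set) var isRecording = false

    func start(writingTo url: URL) throws {
        guard !isRecording else { return }

        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)

        // Create the file with the same settings the input node delivers.
        let file = try AVAudioFile(forWriting: url, settings: format.settings)
        self.file = file

        // The tap only observes the audio; it is not in the playback path.
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            do {
                try file.write(from: buffer)
            } catch {
                print("Write failed: \(error)")
            }
        }

        do {
            try engine.start()
            isRecording = true
        } catch {
            // Leave the engine in a clean state if it could not start.
            input.removeTap(onBus: 0)
            self.file = nil
            throw error
        }
    }

    func stop() {
        guard isRecording else { return }
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
        file = nil // drop our reference so the file can be closed
        isRecording = false
    }
}
```

To keep the audio in memory instead of a file, the body of the tap block can append each `AVAudioPCMBuffer` to an array or copy its samples into a ring buffer. Because the tap runs alongside the audio path rather than inside it, slow work in the block does not delay monitoring.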