Does aubio have a way to detect sections of a piece of audio that lack tonal elements -- rhythm only? I tested a piece of music that has 16 seconds of rhythm at the start, but all the aubiopitch and aubionotes algorithms seemed to detect tonality during the rhythmic section. Could it be tuned somehow to distinguish tonal from non-tonal onsets? Or is there a related library that can do this?
Can aubio be used to detect rhythm-only segments?
Asked by gregory michael travis. There are 2 answers below.

Been busy the past couple of days - but started looking into this today...
It'll take a while to perfect I guess but I thought I'd give you a few thoughts and some code I've started working on to attack this!
Firstly, pseudo code's a good way to design an initial method.
1/ Use import matplotlib.pyplot as plt to spectrum-analyse the audio and to plot the various FFT and audio signals.
2/ Use import numpy as np for basic array handling. (I know this is more than pseudo code, but hey :-)
3/ plt.specgram creates spectral maps of your audio. Apart from the image it creates (which can be used to start to manually deconstruct your audio file), it returns 4 structures, e.g.
ffts,freqs,times,img = plt.specgram(signal,Fs=44100)
ffts is a 2-dimensional array in which each column is the FFT (Fast Fourier Transform) of one time section, and each row corresponds to one frequency in freqs.
The plain vanilla specgram analyses time sections 256 samples long, stepping 128 samples forwards each time.
This gives a very low resolution frequency array at a pretty fast rate.
As musical notes merge into a single sound when played at more or less 10 Hz, I decided to use the specgram options to divide the audio into 4096-sample lengths (circa 10 Hz), stepping forwards every 2048 samples (ie roughly 20 times a second).
This gives a decent frequency resolution, and time sections a 20th of a second apart are faster than people can perceive individual notes.
This means calling specgram as follows:
plt.specgram(signal,Fs=44100,NFFT=4096,noverlap=2048,mode='magnitude')
(Note the mode - this seems to give me amplitudes between 0 and 0.1. I have a problem with fft not giving me amplitudes on the same scale as the audio signal; you may have seen the question I posted. But here we are...)
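For what it's worth, the numbers quoted above can be checked with a little arithmetic. This is only a sketch: specgram_resolution is a made-up helper name, not part of matplotlib, and it assumes a real-valued signal so that the returned spectrum is one-sided.

```python
def specgram_resolution(fs, nfft, noverlap):
    """Return (Hz per frequency bin, time sections per second, frequency rows).

    Hypothetical helper for illustration only; it just does the arithmetic
    behind the NFFT / noverlap choice and does not call matplotlib.
    """
    hop = nfft - noverlap            # samples advanced per time section
    hz_per_bin = fs / nfft           # spacing between neighbouring frequency rows
    sections_per_second = fs / hop   # how often a new column is produced
    n_rows = nfft // 2 + 1           # one-sided spectrum of a real signal
    return hz_per_bin, sections_per_second, n_rows


# Settings used in the call above
hz_per_bin, rate, n_rows = specgram_resolution(44100, 4096, 2048)
print(f"{hz_per_bin:.2f} Hz per bin, {rate:.2f} sections/s, {n_rows} rows")
```

With the defaults (NFFT=256, noverlap=128) the same helper gives about 172 Hz per bin and about 345 sections a second, which is why the plain vanilla settings are coarse in frequency and fast in time.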
4/ Next I decided to get rid of noise in the ffts returned. This means we can concentrate on freqs of a decent amplitude, and zero out the noise which is always present in ffts (in my experience).
Here is (are) my function(s):
    def gate(signal,minAmplitude):
        return np.array([int((((a-minAmplitude)+abs(a-minAmplitude))/2) > 0) * a for a in signal])
Looks a bit crazy - and I'm sure a proper mathematician could come up with something more efficient - but this is the best I could invent. It zeros any frequencies whose amplitude does not exceed minAmplitude.
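One possible answer to the "more efficient" question is to let numpy do the comparison on the whole array at once with np.where. This is a sketch, not a drop-in from my class: gate_vectorised is a name I've made up here, and it is meant to give the same result as gate above (values that do not exceed the threshold become zero).

```python
import numpy as np


def gate_vectorised(spectrum, min_amplitude):
    """Zero every value that does not exceed min_amplitude; keep the rest.

    Hypothetical replacement for gate(); works on a single FFT column or
    on the whole 2-D array in one go, so no Python loop is needed.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    return np.where(spectrum > min_amplitude, spectrum, 0.0)


column = np.array([0.0005, 0.02, 0.001, 0.3])
print(gate_vectorised(column, 0.001))
```

Because np.where works element by element on arrays of any shape, the whole 2-D array from plt.specgram can be passed in directly.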
This is the relevant code to call it on the ffts returned by plt.specgram. My own function is more involved, as it is part of a class and references other functions, but this should be enough:
    def fft_noise_gate(ffts, minAmplitude=0.001, check=True):
        '''
        zero the amplitudes of frequencies
        with amplitudes below minAmplitude
        across ffts (passed in here; self.ffts in my class)
        check - plot middle fft just because!
        '''
        nffts = ffts.shape[1]  # number of time sections (columns)
        gated_ffts = []
        for f in range(nffts):
            fft = ffts[..., f]  # spectrum of time section f
            # Anyone got a more efficient noise gate formula? Best I could think up!
            fft_gated = gate(fft, minAmplitude)
            gated_ffts.append(fft_gated)
        # note: the result has one row per time section (transposed relative to the input)
        ffts = np.array(gated_ffts)
        if check:
            # plot middle fft just to see!
            plt.plot(ffts[nffts // 2])
            plt.show(block=False)
        return ffts
This should give you a start. I'm still working on it and will get back to you when I've got further, but if you have any ideas, please share them.
Anyway, my strategy from here is to:
1/ Find the peaks (ie the start of any sounds).
2/ Look for ranges of frequencies which rise and fall in unison (ie make up a sound).
3/ Differentiate them into individual instruments (sound sources, more specifically), and plot the times and amplitudes thereof to create your analysis (score).
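I haven't written step 1 yet, but one simple reading of it might look like the sketch below. It is an assumption on my part rather than finished code: find_energy_peaks is a made-up name, it expects the array as plt.specgram returns it (one column per time section), and it only marks local maxima in the summed magnitude, which is cruder than a proper onset detector.

```python
import numpy as np


def find_energy_peaks(ffts, min_energy):
    """Return indices of time sections whose summed magnitude is a local peak.

    Hypothetical helper: ffts is assumed to have one row per frequency and
    one column per time section, as returned by plt.specgram.
    """
    envelope = np.asarray(ffts, dtype=float).sum(axis=0)  # energy per section
    mid = envelope[1:-1]
    is_peak = (
        (mid > envelope[:-2])     # higher than the section before
        & (mid >= envelope[2:])   # not lower than the section after
        & (mid > min_energy)      # ignore peaks in near-silence
    )
    return np.flatnonzero(is_peak) + 1  # +1 because mid starts at index 1


# Tiny made-up example: two frequency rows, seven time sections
demo = np.array([[0, 1, 5, 1, 0, 4, 0],
                 [0, 0, 0, 0, 0, 0, 0]])
print(find_energy_peaks(demo, min_energy=2))
```

Multiplying a returned index by the matching entry in times (the third value from plt.specgram) would give the position of each peak in seconds.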
Hope you're having fun with it - I know I am.
As I said, any thoughts...
Regards
Tony
Use a spectrum analyser to detect sections with high amplitude. If you program, you could take each section and make an average of the frequencies (and amplitudes) present to give you an idea of the instrument(s) involved in creating that amplitude peak.
Hope that helps - if you're using python I could give you some pointers on how to program this!
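As a rough pointer, the averaging idea could be sketched like this with numpy. The function name mean_frequency is made up for illustration, and I'm assuming the spectrum comes as a 2-D magnitude array with one row per frequency and one column per time section, plus a matching 1-D array of frequencies (the layout plt.specgram uses). The average is weighted by amplitude, so loud frequencies count for more.

```python
import numpy as np


def mean_frequency(ffts, freqs):
    """Amplitude-weighted average frequency of each time section.

    Hypothetical helper: ffts has shape (n_freqs, n_sections) and freqs has
    shape (n_freqs,). Silent sections (all zeros) come back as nan.
    """
    ffts = np.asarray(ffts, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    total = ffts.sum(axis=0)                        # total amplitude per section
    weighted = (ffts * freqs[:, None]).sum(axis=0)  # sum of amplitude * frequency
    result = np.full(total.shape, np.nan)
    # Only divide where there is some amplitude, to avoid 0/0
    np.divide(weighted, total, out=result, where=total > 0)
    return result


freqs = np.array([100.0, 200.0, 300.0])
ffts = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0]])
print(mean_frequency(ffts, freqs))
```

A section dominated by a pitched instrument would tend to give a steadier average from one section to the next than a burst of percussion, though that is something to test on real audio rather than take on trust.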
Regards
Tony